Bringing Unicode to PHP with Portable UTF-8

Feb 23, 2025, 09:29 AM

Key points

  • Although PHP can handle multi-byte variable names and Unicode strings, the language lacks comprehensive Unicode support because it treats strings as sequences of single-byte characters. This limitation affects every aspect of string handling, including substring extraction, determining string length, and string splitting.
  • Portable UTF-8 is a user-space library that brings Unicode support to PHP applications. It is built on top of mbstring and iconv, provides about 60 Unicode-aware functions for string manipulation, testing, and validation, and uses UTF-8 as its primary character encoding. The library is fully portable and works with any installation of PHP 4.2 or later.
  • The library provides functions for validating UTF-8 input, removing invalid bytes, encoding text as HTML entities to help prevent XSS attacks, trimming whitespace, removing duplicate whitespace, creating URL slugs that contain UTF-8 characters, and enforcing limits on input length in characters. In a Unicode-enabled application, the focus shifts from bytes and byte lengths to characters and character lengths.

PHP allows multi-byte variable names (e.g. $a∩b and $Δx), extensions such as mbstring can handle Unicode strings, and the utf8_encode() and utf8_decode() functions convert strings between UTF-8 and ISO-8859-1. Even so, PHP is widely regarded as lacking Unicode support. This article explains what that lack of support means and demonstrates a library that brings Unicode support to PHP applications: Portable UTF-8.

Unicode support in PHP

PHP's lack of Unicode/multi-byte support means that the standard string functions treat strings as sequences of single-byte characters. In fact, the official PHP manual defines a string as "a series of characters, where a character is the same as a byte". PHP natively supports only 8-bit characters, while Unicode (and many other character sets) may need several bytes to represent one character. This limitation affects almost every aspect of string handling, including (but not limited to) substring extraction, determining string length, splitting, and shuffling.

Work on this problem began in early 2005, but in 2010 the effort to bring native Unicode support to PHP was halted and shelved for a variety of reasons. Since native Unicode support may take years to arrive (if it arrives at all), developers must rely on available extensions such as mbstring and iconv to fill the gap. These extensions are not Unicode-centric, since they can also convert between non-Unicode encodings, but they do make Unicode string handling easier.

The extensions have drawbacks, however. They offer only limited Unicode string handling, and none of them is enabled by default: a server administrator must explicitly enable each one before PHP applications can use it. Shared hosting providers often make matters worse by installing only some of them, which makes it hard for developers to rely on an always-available API for their Unicode needs.

The good news is that PHP can still output Unicode text, because it does not care whether the bytes it sends are ASCII-encoded English or text in a language whose characters take multiple bytes. Given that, all PHP developers need is an API that makes Unicode-aware string manipulation comfortable.
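The gap between bytes and characters is easy to see with PHP's built-in functions. The following is a minimal sketch, assuming PHP 7.0 or later (for the "\u{...}" escape) and a loaded mbstring extension; it is only an illustration and does not use Portable UTF-8.

```php
<?php
// "Δx" written with an escape so the example does not depend on file encoding.
$s = "\u{0394}x"; // U+0394 (two bytes in UTF-8) followed by ASCII "x"

$byteLength = strlen($s);             // counts bytes
$charLength = mb_strlen($s, 'UTF-8'); // counts characters (needs mbstring)

$firstByte = substr($s, 0, 1);             // cuts the first character in half
$firstChar = mb_substr($s, 0, 1, 'UTF-8'); // returns the complete character

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);

// The half character produced by substr() is no longer valid UTF-8.
var_dump(mb_check_encoding($firstByte, 'UTF-8'));
var_dump($firstChar === "\u{0394}");
```

The byte-oriented functions are not wrong as such; they simply answer a different question (how many bytes) from the one a multilingual application usually asks (how many characters).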

Portable UTF-8

A more recent solution is to write user-space libraries in PHP. Even when the server or the language itself lacks support, such libraries can be bundled with the application to guarantee that Unicode support is present. Many open source applications already ship their own library of this kind, and many more use free third-party libraries; Portable UTF-8 is one of them.

Portable UTF-8 is a free, lightweight library built on top of mbstring and iconv. It extends the functionality of these two extensions with about 60 Unicode-aware functions for string manipulation, testing, and validation, and it offers UTF-8-aware counterparts for nearly all of PHP's common string functions. As the name implies, Portable UTF-8 uses UTF-8 as its primary character encoding. The library uses the available extensions (mbstring and iconv) for speed and smooths over some of the inconsistencies of using them directly, but if the extensions are not present on the server it falls back to UTF-8 routines written in pure PHP. Portable UTF-8 is fully portable and works with any installation of PHP 4.2 or later.
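The "use the extension when it is there, otherwise fall back to pure PHP" idea can be sketched as follows. This is not the library's actual code: the function names sketch_strlen() and sketch_strlen_fallback() are hypothetical, and the sketch uses modern PHP type declarations, so it assumes PHP 7.0 or later rather than the PHP 4.2 the library targets.

```php
<?php
// Hypothetical helper names; not part of Portable UTF-8.

// Pure-PHP fallback: in valid UTF-8 every character contains exactly one
// byte that is NOT a continuation byte (10xxxxxx, i.e. 0x80-0xBF), so
// stripping continuation bytes leaves one byte per character.
function sketch_strlen_fallback(string $s): int
{
    return strlen((string) preg_replace('/[\x80-\xBF]/', '', $s));
}

// Prefer the extension when it is loaded, otherwise use the fallback.
function sketch_strlen(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    return sketch_strlen_fallback($s);
}

echo sketch_strlen("\u{0394}x"), "\n";
```

The fallback only gives a meaningful count for well-formed UTF-8, which is one reason input validation comes first in the workflow described next.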

String handling with Portable UTF-8

Text editors with poor Unicode support can corrupt text as they read it, and text copied from such an editor and pasted into a web form can become a source of invalid UTF-8 for the application. When processing user-submitted input, always verify that the input is exactly what the application expects. To detect whether text is valid UTF-8, use the library's is_utf8() function.

if (is_utf8($_POST['title'])) {
    // do something...
}
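For readers who want to see what such a validity check involves without the library, here is a small stand-in built on PHP's bundled PCRE functions. The name sketch_is_valid_utf8() is hypothetical and this is not how is_utf8() is implemented; it only shows one way to perform the same kind of test.

```php
<?php
// Hypothetical helper; not the library's is_utf8().
function sketch_is_valid_utf8(string $s): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 before
    // matching; preg_match() returns false (not 0) when validation fails.
    return preg_match('//u', $s) === 1;
}

var_dump(sketch_is_valid_utf8("\u{0394}x")); // well-formed input
var_dump(sketch_is_valid_utf8("\xC3\x28"));  // 0xC3 must be followed by a continuation byte
```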

Recovering characters from invalid bytes is impossible, so removing bytes that are not recognized as valid UTF-8 characters may be your only choice. The utf8_clean() function can be used to remove invalid bytes.

$title = utf8_clean($_POST['title']);
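To make "removing invalid bytes" concrete, the sketch below keeps only well-formed UTF-8 byte sequences and drops everything else. It is an illustration under assumptions, not the implementation of utf8_clean(): the function name sketch_clean() is hypothetical, and the byte-level pattern follows the well-formed sequence ranges defined for UTF-8.

```php
<?php
// Hypothetical helper; not the library's utf8_clean().
function sketch_clean(string $s): string
{
    // Byte-level pattern (no /u modifier) matching one well-formed
    // UTF-8 sequence: ASCII, then 2-, 3- and 4-byte forms, excluding
    // overlong encodings, surrogates, and values above U+10FFFF.
    $validSequence = '/[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';

    // Collect every valid sequence; bytes that fit no sequence are skipped.
    preg_match_all($validSequence, $s, $matches);
    return implode('', $matches[0]);
}

echo bin2hex(sketch_clean("a\xFFb")), "\n"; // hex of the cleaned string
```

Dropping bytes silently changes the data, so an application may prefer to reject the input and ask the user to resubmit; which policy fits is a design decision for the application.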

Each Unicode character can be encoded as the corresponding HTML entity, and you may want to encode the text in this way to help prevent XSS attacks before outputting it to the browser.

echo utf8_html_encode($title);
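A comparable effect can be sketched with built-in functions: escape the HTML-significant characters, then turn every non-ASCII character into a numeric entity. The name sketch_html_encode() is hypothetical, the sketch assumes the mbstring extension, and it may differ in detail from what utf8_html_encode() produces.

```php
<?php
// Hypothetical helper; not the library's utf8_html_encode().
function sketch_html_encode(string $s): string
{
    // Escape <, >, &, and both quote characters first.
    $escaped = htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Then convert every code point from U+0080 upward into a decimal
    // numeric entity. Map layout: [start, end, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo sketch_html_encode("<b>\u{0394}</b>"), "\n";
```

The protection against markup injection comes from escaping the HTML-significant characters; the numeric entities mainly make the output independent of the page's declared encoding.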

Whitespace is usually trimmed from the beginning and end of a string. Unicode defines about 20 whitespace characters, and some ASCII control characters should be trimmed as well.

$title = utf8_trim($title);
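The built-in trim() only knows a handful of ASCII characters, so a Unicode-aware trim needs a wider definition of whitespace. The following sketch uses PCRE's Unicode properties for that; sketch_trim() is a hypothetical name, and the exact set of characters that utf8_trim() removes may differ.

```php
<?php
// Hypothetical helper; not the library's utf8_trim().
function sketch_trim(string $s): string
{
    // \p{Z} matches Unicode separators (all kinds of spaces),
    // \p{Cc} matches control characters such as tab and newline.
    $result = preg_replace('/\A[\p{Z}\p{Cc}]+|[\p{Z}\p{Cc}]+\z/u', '', $s);

    // preg_replace() returns null for invalid UTF-8; leave such input untouched.
    return $result === null ? $s : $result;
}

// U+3000 (ideographic space) and U+00A0 (no-break space) are not removed by trim().
var_dump(sketch_trim("\u{3000} title\u{00A0}\n"));
```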

Repeated whitespace may also appear in the middle of a string and should be collapsed. The following shows how to combine utf8_remove_duplicates() and utf8_ws():

$title = utf8_remove_duplicates($title, utf8_ws());
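As a rough stand-in for that combination, the sketch below collapses any run of one repeated whitespace character into a single occurrence. It assumes that this is the intended effect of the call above; sketch_collapse_whitespace() is a hypothetical name and not part of the library.

```php
<?php
// Hypothetical helper; an approximation of utf8_remove_duplicates($s, utf8_ws()).
function sketch_collapse_whitespace(string $s): string
{
    // Capture one whitespace character, then match further copies of the
    // same character (\1+) and replace the whole run with a single copy.
    $result = preg_replace('/([\p{Z}\p{Cc}])\1+/u', '$1', $s);

    // preg_replace() returns null for invalid UTF-8; leave such input untouched.
    return $result === null ? $s : $result;
}

var_dump(sketch_collapse_whitespace("a  b\u{3000}\u{3000}\u{3000}c"));
```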

The traditional way to create URL slugs for SEO purposes relies on transliteration and strips all non-ASCII characters from the slug, which makes the URL less valuable than it could be. Because URLs can carry UTF-8 encoded characters, we can skip the stripping and transliteration and create rich slugs that contain characters in any language:

$slug = utf8_url_slug($title, 30); // at most 30 characters
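The idea of a slug that keeps non-ASCII letters can be sketched with mbstring and PCRE. This is an illustration only: sketch_slug() is a hypothetical name, and the rules utf8_url_slug() actually applies (separator, case handling, truncation) are assumptions here.

```php
<?php
// Hypothetical helper; not the library's utf8_url_slug().
function sketch_slug(string $title, int $maxChars): string
{
    $slug = mb_strtolower($title, 'UTF-8');

    // Keep letters, combining marks, and digits from any script;
    // every other run of characters becomes a single hyphen.
    $slug = (string) preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $slug);

    // Limit the length in characters, not bytes.
    $slug = mb_substr(trim($slug, '-'), 0, $maxChars, 'UTF-8');

    // Truncation may leave a trailing hyphen behind.
    return trim($slug, '-');
}

echo sketch_slug("\u{0394}elta Force!", 30), "\n";
```

Browsers percent-encode such characters on the wire but usually display them decoded, so the readable form is what users and search results see.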

From input validation all the way to saving data in a database, a Unicode-enabled application cares about characters and character lengths, not bytes and byte lengths. This shift in focus calls for an interface that understands the difference. Limiting the length of input is a common requirement, so if the input is longer than 60 characters we take a substring:

if (utf8_strlen($title) > 60) {
    $title = utf8_substr($title, 0, 60);
}

Or:

if (!utf8_fits_inside($title, 60)) {
    $title = utf8_substr($title, 0, 60);
}

There are three different ways to access individual characters with Portable UTF-8. We can use utf8_access() to read a single character by position.

echo 'The sixth character is: ' . utf8_access($string, 5);

utf8_chr_map() iterates over the characters of a string and passes each one to a callback function.

utf8_chr_map('some_callback', $string);

We can also split the string into an array of characters with utf8_split() and process each array element as a single character.

array_map('some_callback', utf8_split($string));
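The split-then-process approach can be tried without the library by using preg_split() with the /u modifier. In this sketch sketch_split() is a hypothetical stand-in for utf8_split(), and bin2hex() plays the role of some_callback.

```php
<?php
// Hypothetical helper; a stand-in for utf8_split().
function sketch_split(string $s): array
{
    // An empty pattern with /u splits between every code point.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false for invalid UTF-8.
    return $chars === false ? [] : $chars;
}

$chars = sketch_split("a\u{0394}b");

// Process each character with a callback: here, show its bytes in hex.
$hex = array_map('bin2hex', $chars);
echo implode(' ', $hex), "\n";

// Positional access: index 5 is the sixth character.
$sixth = sketch_split("Unicode\u{2603}")[5];
echo $sixth, "\n";
```

Splitting builds an array holding the whole string, so for very long strings a callback-based iteration may use less memory.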

Working with Unicode may also require finding the minimum or maximum code point in a string, splitting strings, handling byte order marks, converting case, randomizing or shuffling, replacing, and so on. Portable UTF-8 supports all of these.
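As one example from that list, the minimum and maximum code point of a string can be found by converting each character to its code point. The sketch below assumes PHP 7.2 or later (for mb_ord()) and the mbstring extension; sketch_code_point_range() is a hypothetical name, not a library function.

```php
<?php
// Hypothetical helper; not part of Portable UTF-8.
function sketch_code_point_range(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    // Invalid UTF-8 or an empty string has no code points to compare.
    if ($chars === false || $chars === []) {
        return [null, null];
    }

    $codePoints = array_map(function ($c) {
        return mb_ord($c, 'UTF-8');
    }, $chars);

    return [min($codePoints), max($codePoints)];
}

[$min, $max] = sketch_code_point_range("a\u{0394}b");
printf("U+%04X .. U+%04X\n", $min, $max);
```

A range check like this can be used, for instance, to decide whether a string stays within ASCII and can take a faster byte-oriented code path.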

Conclusion

Development of PHP 6 has stopped, which delays the long-awaited native Unicode support that is crucial for building multilingual applications. In the meantime, server-side extensions and user-space libraries such as Portable UTF-8 play an important role in helping developers build a better, more standardized web that serves local needs.
