phpBB

Development Wiki

PhpBB3.1/RFC/Automatic UTF-8 Normalization

Currently, any call to request_var that may receive multibyte characters must be followed by a manual call to utf8_normalize_nfc, even though the caller already has to tell request_var that the content may contain UTF-8 characters. This makes the API more complicated than necessary. Performing the normalization inside request_var simplifies it.

Usage

The old syntax for reading UTF-8 data from the request parameters was:

$input = utf8_normalize_nfc(request_var('input', '', true));

The new API is:

$input = request_var('input', '', true);

Implementation

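Because Unicode normalization is idempotent (http://unicode.org/reports/tr15/#Design_Goals), normalizing an already normalized string leaves it unchanged. Existing code that still calls utf8_normalize_nfc on the result of request_var therefore keeps working, and the change is fully backwards-compatible.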

Normalization is applied from within request_var, so callers no longer have to invoke it explicitly.
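
The effect can be sketched as follows. This is a minimal illustration, not phpBB's code: example_normalize_nfc and example_request_var are hypothetical stand-ins, the request parameters are passed in as a plain array, and the sketch assumes the intl extension's Normalizer class is available (without it, text is passed through unchanged).

```php
<?php
// Hypothetical stand-in for utf8_normalize_nfc (assumption: not phpBB's implementation).
function example_normalize_nfc($text)
{
    if (!class_exists('Normalizer')) {
        // Without the intl extension this sketch leaves the text unchanged.
        return $text;
    }
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    // normalize() returns false on failure, e.g. for invalid UTF-8.
    return ($normalized === false) ? '' : $normalized;
}

// Hypothetical stand-in for request_var that reads from a plain array.
function example_request_var(array $params, $name, $default, $multibyte = false)
{
    $value = isset($params[$name]) ? (string) $params[$name] : $default;
    // Normalize inside the accessor so callers do not have to.
    return $multibyte ? example_normalize_nfc($value) : $value;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
$params = array('input' => "e\xCC\x81");

$new_style = example_request_var($params, 'input', '', true);
// Old-style call sites normalize a second time; idempotence makes this a no-op.
$old_style = example_normalize_nfc(example_request_var($params, 'input', '', true));

var_dump($new_style === $old_style); // bool(true)
```

Both call styles yield the same string, which is why existing call sites do not have to be updated at the same time as request_var.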

Notes

This page focuses mostly on automatic normalization. The patch also includes other Unicode enhancements.

PHP 5.3.0 and later bundle the intl extension, which provides a native normalizer (http://php.net/manual/en/normalizer.normalize.php). The native normalizer can be considerably faster than the pure-PHP implementation. When it is available, phpBB will use it.
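
One way such a selection could look is sketched below. How phpBB actually detects and wires in the native normalizer is not described on this page, so the function names, the detection check and the placeholder fallback are all assumptions made for illustration.

```php
<?php
// Returns true when the intl extension's native normalizer can be used.
function example_has_native_normalizer()
{
    return extension_loaded('intl') && class_exists('Normalizer');
}

// Normalizes to NFC, preferring the native normalizer.
// $fallback is a hypothetical callable wrapping a pure-PHP implementation.
function example_nfc($text, $fallback)
{
    if (example_has_native_normalizer()) {
        $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
        // normalize() returns false on failure, e.g. for invalid UTF-8.
        return ($normalized === false) ? '' : $normalized;
    }
    return call_user_func($fallback, $text);
}

// Placeholder fallback for this sketch only: it returns the text unchanged.
$fallback = function ($text) {
    return $text;
};

echo example_has_native_normalizer() ? "native\n" : "fallback\n";
var_dump(example_nfc('plain ascii', $fallback) === 'plain ascii'); // bool(true)
```

Keeping the check in one place means call sites stay the same whether or not the extension is installed.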