Andy Young's Blog


Hi, I'm Andy! I'm cofounder and CTO of GroupSpaces.com, a site that takes the pain out of managing real-world groups. I created the Selective Twitter Status app for Facebook. You can send me an email.




PHP htmlspecialchars()/htmlentities() invalid multibyte/UTF-8 gotcha with display_errors=true

Here’s a seemingly nonsensical gotcha I just discovered with error handling for htmlspecialchars() and htmlentities() in PHP. Since PHP 5.2.5, if either of those functions is passed an invalid multibyte string (invalid UTF-8, perhaps containing a truncated multibyte character after improper use of substr() instead of mb_substr()), PHP triggers the following error and returns an empty string:

PHP Warning: htmlspecialchars(): Invalid multibyte sequence in argument
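
To make the truncation case concrete, here’s a minimal sketch. It assumes the mbstring extension is loaded, and the sample string is made up for illustration. The flags and charset are passed explicitly so the result does not depend on a particular version’s defaults.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9), so a byte-based cut can split it.
$text = "caf\xC3\xA9";

$byteCut = substr($text, 0, 4);             // "caf\xC3": ends in a dangling lead byte
$charCut = mb_substr($text, 0, 4, 'UTF-8'); // "café": counts characters, not bytes

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// The truncated string is rejected and an empty string comes back.
$escaped = htmlspecialchars($byteCut, ENT_QUOTES, 'UTF-8');
var_dump($escaped === ''); // bool(true)
```

Whether a warning accompanies that empty string is exactly the part that depends on the PHP version and on display_errors.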

Now, in general the PHP ini setting display_errors controls whether errors are output to the browser, and the ini setting log_errors independently controls whether errors are written to a log file. If a custom error handler has been set with set_error_handler(), it is always called for all errors, and it can then read the values of display_errors and log_errors along with the value of error_reporting() and take the appropriate course of action, right?

Wrong! In this case, htmlspecialchars() and htmlentities() only trigger the error if the value of display_errors is false. If the value of display_errors is true then no error is triggered at all! This seemingly nonsensical behaviour makes it impossible to detect these errors during debugging with display_errors on.
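
Since the warning can’t be relied on, one way to catch the problem regardless of display_errors is to check the return value instead. The sketch below is only an illustration: escape_html_strict() is a hypothetical helper name, not part of PHP, and throwing an exception is just one possible policy.

```php
<?php
// Hypothetical helper (not a PHP built-in): escape a string for HTML output
// and fail loudly if the input is not valid UTF-8.
function escape_html_strict($value)
{
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // A non-empty input that escapes to '' means the input was rejected
    // as an invalid multibyte sequence.
    if ($escaped === '' && $value !== '') {
        throw new InvalidArgumentException('Invalid UTF-8 passed to escape_html_strict()');
    }

    return $escaped;
}

echo escape_html_strict('<b>café</b>'), "\n"; // prints &lt;b&gt;café&lt;/b&gt;

try {
    escape_html_strict("caf\xC3"); // truncated multibyte character
} catch (InvalidArgumentException $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

This doesn’t depend on any ini setting, so it behaves the same in development and production.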

Here’s the bug report (marked as closed - bogus), the original modification to the PHP source that added this behaviour, the initial fix for this behaviour, and the subsequent revert of that fix, which I believe was incorrect.

I’ve contacted the core PHP developers involved, but in the meantime this may help anyone searching Google.

