PHP htmlspecialchars()/htmlentities() invalid multibyte/UTF-8 gotcha with display_errors=true
Here’s a seemingly nonsensical gotcha I just discovered in the error handling of htmlspecialchars() and htmlentities() in PHP. Since PHP 5.2.5, if either of those functions is passed an invalid multibyte string (invalid UTF-8, perhaps containing a multibyte character truncated by improper use of substr() instead of mb_substr()), PHP triggers the following error and returns an empty string:
PHP Warning: htmlspecialchars(): Invalid multibyte sequence in argument
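As a rough illustration of how such a string can arise, here is a minimal sketch that assumes the mbstring extension is loaded. The variable names are purely illustrative, and the flags are passed explicitly because later PHP versions may default to flags that substitute invalid sequences instead of returning an empty string:

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9); written as escapes so the
// encoding of this source file does not matter.
$text = "caf\xC3\xA9";

// substr() counts bytes, so this cuts the last character in half.
$broken = substr($text, 0, 4);             // "caf\xC3"

// mb_substr() counts characters, so the sequence stays intact.
$intact = mb_substr($text, 0, 4, 'UTF-8'); // "café"

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($intact, 'UTF-8')); // bool(true)

// Without ENT_IGNORE or ENT_SUBSTITUTE, the invalid input yields an empty string.
var_dump(htmlspecialchars($broken, ENT_QUOTES, 'UTF-8')); // string(0) ""
```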
Now, in general the php.ini setting display_errors controls whether errors are output to the browser, and the setting log_errors independently controls whether errors are written to a log file. If a custom error handler has been set with set_error_handler(), it is always called for all errors and can then read the values of display_errors and log_errors, along with the value of error_reporting(), and take the appropriate course of action, right?
Wrong! In this case, htmlspecialchars() and htmlentities() only trigger the error if the value of display_errors is false. If the value of display_errors is true then no error is triggered at all! This seemingly nonsensical behaviour makes it impossible to detect these errors during debugging with display_errors on.
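One way to sidestep the problem while debugging might be to validate the encoding yourself before escaping, so the failure is reported whatever display_errors is set to. This is only a sketch under some assumptions: escape_html() is a hypothetical helper name, the input is expected to be UTF-8, the mbstring extension is available, and the helper raises its own E_USER_WARNING rather than relying on PHP's built-in one:

```php
<?php
/**
 * Hypothetical helper: escape a UTF-8 string for HTML output, raising a
 * warning ourselves if the input is not valid UTF-8.
 */
function escape_html($string)
{
    if (!mb_check_encoding($string, 'UTF-8')) {
        // trigger_error() goes through the normal error machinery, so a
        // handler set with set_error_handler() sees it regardless of
        // the display_errors setting.
        trigger_error('escape_html(): invalid UTF-8 in argument', E_USER_WARNING);
        return '';
    }
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

// Usage: collect warnings in an array to show that the handler is invoked.
$seen = array();
set_error_handler(function ($errno, $errstr) use (&$seen) {
    $seen[] = $errstr;
    return true; // handled; skip PHP's built-in handler
});

$ok  = escape_html("<b>caf\xC3\xA9</b>"); // "&lt;b&gt;café&lt;/b&gt;"
$bad = escape_html("caf\xC3");            // "" plus one warning in $seen

restore_error_handler();
```

Doing the check up front costs one extra pass over the string, but it keeps the behaviour the same in development and production.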
Here are the bug report (marked as closed, bogus), the original modification to the PHP source that added this behaviour, the initial fix for this behaviour, and the subsequent revert of that fix, which I believe was incorrect.
I’ve contacted the core PHP developers involved, but in the meantime this may help anyone searching Google.