Andy Young's Blog

me!

Hi, I'm Andy! I'm cofounder and CTO of GroupSpaces.com, a site that takes the pain out of managing real-world groups. I created the Selective Twitter Status app for Facebook. You can send me an email.




Posts about Dev

PHP Trick: Catching fatal errors (E_ERROR) with a custom error handler

Implementing a custom error handler using set_error_handler() in PHP can be a useful technique (Google search for more info/examples)

Unfortunately, set_error_handler() doesn’t catch fatal errors - as the PHP docs say:

The following error types cannot be handled with a user defined function: E_ERROR, E_PARSE, E_CORE_ERROR, E_CORE_WARNING, E_COMPILE_ERROR, E_COMPILE_WARNING, and most of E_STRICT raised in the file where set_error_handler() is called.

Handily, I just discovered it’s possible to catch fatal E_ERROR errors and direct them to your custom error handler using a combination of register_shutdown_function() and error_get_last():

set_error_handler('myErrorHandler');
register_shutdown_function('fatalErrorShutdownHandler');

function myErrorHandler($code, $message, $file, $line) {
  ...
}

function fatalErrorShutdownHandler()
{
  $last_error = error_get_last();
  if ($last_error['type'] === E_ERROR) {
    // fatal error
    myErrorHandler(E_ERROR, $last_error['message'], $last_error['file'], $last_error['line']);
  }
}

The key is that functions registered with register_shutdown_function() are called even on a fatal error - including out of memory errors. error_get_last() can then be used to detect whether we’re ending the script because of a fatal error, and pass the error info to your custom error handler if so.


PHP htmlspecialchars()/htmlentities() invalid multibyte/UTF-8 gotcha with display_errors=true

Here’s a seemingly nonsensical gotcha I just discovered with error handling for htmlspecialchars() and htmlentities() in PHP. Since PHP 5.2.5, if either of those functions are passed invalid multibyte strings (invalid UTF-8, perhaps containing a truncated multi-byte character after improper use of substr() intead of mb_substr()) then PHP triggers the following error and returns an empty string:

PHP Warning: htmlspecialchars(): Invalid multibyte sequence in argument

Now, in general the php ini setting display_errors can be used to control whether errors are output to the browser, the ini setting log_errors can be independently used to control whether errors are written to logfile, and if a custom error handler has been set with set_error_handler() then this is always called for all errors and can then read the values of display_errors and log_errors along with the value of error_reporting() and take the appropriate course of action, right?

Wrong! In this case, htmlspecialchars() and htmlentities() only trigger the error if the value of display_errors is false. If the value of display_errors is true then no error is triggered at all! This seemingly nonsensical behaviour makes it impossible to detect these errors during debugging with display_errors on.

Here’s the bug report, marked as closed - bogus, original modification to the PHP source that added this behaviour, initial fix for this behaviour and subsequent revert of that fix that I believe was incorrect.

I’ve contacted the core PHP developers involved but in the meantime this may help anyone searching Google..


Design by Contract - for efficient coding

Yesterday I picked up this tweet from Joel:

Getting into PHP Exceptions. Very useful :) Example - userExists() function should raise exception when no user passed, not return false!

Hang on, should it? While exceptions in programming languages are without doubt a useful tool, his example reminded me of a problematic style of coding I’ve experienced - something I consider an Antipattern. The style I have in mind is when each function starts with code validating the input parameters before the main work of the function is executed. Here’s an example:

function addUserToGroup($user, $group, $done_by) {
  if (!($user instanceof User)) { throw new Exception(..); }
  if (!($group instanceof Group)) { throw new Exception(..); }
  if (!($done_by instanceof User)) { throw new Exception(..); }
  // ok, we're (finally) good to go:
  $ref = new UserGroupRef();
  $ref->setUser($user);
  $ref->setGroup($group);
  $ref->setDoneBy($done_by);
  $ref->save();
}

If you’d never dream of writing code like this, move along - nothing of interest to see below! If you’re wondering why I think this is a bad approach, though - what’s the problem here?

The code is verbose - 50% is performing “meta” tasks. This takes time to write and makes it harder to maintain and interpret in the future. (I’m a big believer in self-documenting code - key to this is keeping code short and reducing the amount of unimportant information).

The issue gets stronger if we go on to include similar checks in the implementations of UserGroupRef->setUser(), setGroup() and setDoneBy(). The complete system is now “nice and robust” but has doubled in terms of lines of code, and you can imagine how when the code is executed the same checks are called repeatedly at different levels of the call stack. If those other functions are only ever called by code that has already ensured the parameters passed will be valid, surely all the time and effort to write and maintain the additional checks isn’t strictly necessary..

Design By Contract

So how do we tell when we need to do this validation? Enter the concept of “Design by Contract” - a “technique to reduce the programming effort for large projects” (source). The basic idea is that each function comes with a “contract” that should be agreed to when calling it, in the same way that two people may have a contract when agreeing to work together in the real world.

The key elements of Design by Contract are:

  • a Precondition - what must be true before the function is called - i.e. the responsibilities of the calling scope. For example, the $user parameter will contain a User object.
  • a Postcondition - what will be true after the function has executed - i.e. the responsibilities of the function itself. For example. $user will be a member of $group.
  • a Class Invariant - what the function guarantees to leave unchanged (for completeness - so we don’t get unexpected side-effects).

The part I want to focus on is the precondition - specifically what parameters the function will be given - values, datatypes etc. Design by Contract places the responsibility on the calling code to provide valid parameters, not on the called code to validate them. The core idea is “fail hard” - if any part of the contract is broken the entire thing is void - and there’s no guarantee what will happen(!)

That obviously sounds quite scary - I think this interpretation of DBC is a relevant approach only for internal functions within a module or self-contained system. DBC is not a good model for dealing with user input at run-time(!) - and for open-source code or APIs that will be used by other teams or released publicly the extra code to check for and report violation of the contract is probably worth the saving in confusion and support questions.

Using Design By Contract for efficient coding

So the point is to use DBC as the justification for not filling your classes and methods with code validating input parameters, resulting in lengthy code that’s harder to understand and takes longer to maintain. You just need to ensure that every member of the team writing code that uses a particular module works on the principle that it is the calling code’s responsibility to provide valid parameters, something that is often possible with small teams and bespoke modules. DBC also simplifies debugging and unit testing since the intended behaviour of each routine is clearly specified.

Back to Joel’s example - should userExists() throw an exception if passed invalid input? The answer really depends on how it will be used, and what the “contract” for function is understood to be. The contract could be written that userExists() will take responsibility for validating it’s input. However it’s often the case that the calling code will also need to know if it’s not working with valid data, which indicates that validation is best done at the earliest point in the call stack than repeatedly throughout.

It’s also important to keep in mind what real-world effect invalid input may have - for some functions it could result in critical data corruption. However the chances are that when testing for the existence of an invalid user, simply returning false will cause the system to function fine as expected. This leads us to one of my favourite concepts of coding - YAGNI ;)

Comments welcome.


Reliability Bug in Facebook PHP API Batching Support

I recently discovered a bug in the current Facebook API PHP Client library that affects the support for batching calls which I’ve used for the Selective Twitter Status application.

The bottom line is, sometimes the call to batch.run within execute_server_side_batch() as called by end_batch() returns an empty string instead of the expected array of results from each of the batched calls. The code has the following test for an exception but this isn’t triggered in it’s current state since no specific FB exception is being returned, just the empty result:

    if (is_array($result) && isset($result['error_code'])) {
      throw new FacebookRestClientException($result['error_msg'],
                                            $result['error_code']);
    }

The symptom is a bunch of PHP warnings similar to the following as the code attempts to iterate over the empty string as if it was the expected array of results:

PHP Notice:  Uninitialized string offset:  0 in [...]/facebook/facebookapi_php5_restlib.php on line 210
PHP Notice:  Uninitialized string offset:  1 in [...]/facebook/facebookapi_php5_restlib.php on line 210

etc

In my experience the empty result has been in situations where the requested batch operations haven’t been performed, so the fix is to add a test for the empty result and to treat this as an exception. This was my hack:

 if (!isset($result[0])) {
            throw new FacebookRestClientException('Batch call returned invalid result: ' . var_export($result, true) . '; ' . var_export($xml, true), 9999);
    }

As a side note, the batching support as implemented in the current PHP API client is rather limited since if one of the batched functions returns an exception it is thrown immediately, with the result that the return values of subsequent calls are not populated and it’s impossible to know whether they executed successfully. I’ve changed the line that threw the exception to instead set the FacebookRestClientException object as the return value for the relevant call as follows:

    for($i = 0; $i batch_queue[$i];
      $batch_item_result_xml = $result[$i];
      $batch_item_result = $this->convert_xml_to_result($batch_item_result_xml, $batch_item['m'], $batch_item['p']);
      if (is_array($batch_item_result) &&
          isset($batch_item_result['error_code'])) {
        // PATCH: modify to return the Exception object as the result for this call so we can process all results/exceptions
        $batch_item_result = new FacebookRestClientException($batch_item_result['error_msg'], $batch_item_result['error_code']);
      }
      $batch_item['r'] = $batch_item_result;
    }

Hopefully this helps anyone searching Google..


April by David. A Monthly Theme. Powered by Tumblr.