Internal structures and implementation

In this (last) section on object orientation in PHP we’ll have a look at some of the internal structures that were previously only mentioned in passing. In particular we’ll see more thoroughly the default object structure and the object store.

Object properties

The probably by far most complicated part of PHP’s object orientation system is the handling of object properties. In the following we’ll take a look at some of its parts in more detail.

Property storage

In PHP object properties can be declared, but don’t have to. How can one efficiently handle such a situation? To find out let’s recall the standard zend_object structure:

typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    zval **properties_table;
    HashTable *guards;
} zend_object;

This structure contains two fields for storing properties: The properties hash table and the properties_table array of zval pointers. Two separate fields are used to best handle both declared and dynamic properties: For the latter, i.e. properties that have not been declared in the class, there is no way around using the properties hash table (which uses a simple property name => value mapping).

For declared properties on the other hand storing them in a hashtable would be overly wasteful: PHP’s hash tables have a very high per-element overhead (of nearly one hundred bytes), but the only thing that really needs to be stored is a zval pointer for the value. For this reason PHP employs a small trick: The properties are stored in a normal C array and accessed using their offset. The offset for each property name is stored in a (global) hashtable in the class entry. Thus the property lookup happens with one additional level of indirection, i.e. rather than directly fetching the property value, first the property offset is fetched and that offset is then used to fetch the actual value.

Property information (including the storage offset) is stored in class_entry->properties_info. This hash table is a map of property names to zend_property_info structs:

typedef struct _zend_property_info {
    zend_uint flags;
    const char *name;
    int name_length;
    ulong h;                 /* hash of name */
    int offset;              /* storage offset */
    const char *doc_comment;
    int doc_comment_len;
    zend_class_entry *ce;    /* CE of declaring class */
} zend_property_info;

One remaining question is what happens when both types of properties exist. In this case both structures will be used simultaneously: All properties will be written into the properties hashtable, but properties_table will still contain pointers to them. Note though that if both are used the properties table holds zval** values rather than zval* values.

Sometimes PHP needs the properties as a hashtable even if they are all declared, e.g. when the get_properties handler is used. In this case PHP also switches to using properties (or rather the hybrid approach described above). This is done using the rebuild_object_properties function:

ZEND_API HashTable *zend_std_get_properties(zval *object TSRMLS_DC)
{
    zend_object *zobj;
    zobj = Z_OBJ_P(object);
    if (!zobj->properties) {
        rebuild_object_properties(zobj);
    }
    return zobj->properties;
}

Property name mangling

Consider the following code snippet:

class A {
    private $prop = 'A';
}

class B extends A {
    private $prop = 'B';
}

class C extends B {
    protected $prop = 'C';
}

var_dump(new C);

// Output:
object(C)#1 (3) {
  ["prop":protected]=>
  string(1) "C"
  ["prop":"B":private]=>
  string(1) "B"
  ["prop":"A":private]=>
  string(1) "A"
}

In the above example you can see the “same” property $prop being defined three times: Once as a private property of A, once as a private property of B and once as a protected property of C. Even though these three properties have the same name they are still distinct properties and require separate storage.

In order to support this situation PHP “mangles” the property name by including the type of the property and the defining class:

class Foo { private $prop;   } => "\0Foo\0prop"
class Bar { private $prop;   } => "\0Bar\0prop"
class Rab { protected $prop; } => "\0*\0prop"
class Oof { public $prop;    } => "prop"

As you can see public properties have “normal” names, protected ones get a \0*\0 prefix (where \0 are NUL bytes) and private ones start with \0ClassName\0.

Most of the time PHP does a good job hiding the mangled names from userland. You only get to see them in some rare cases, e.g. if you cast an object to array or look at serialization output. Internally you usually don’t need to care about mangled names either, e.g. when using the zend_declare_property APIs the mangling is automatically done for you.

The only places where you have to look out for mangled names is if you access the property_info->name field or if you try to directly access the zobj->properties hash. In this cases you can use the zend_(un)mangle_property_name APIs:

// Unmangling
const char *class_name, *property_name;
int property_name_len;

if (zend_unmangle_property_name_ex(
        mangled_property_name, mangled_property_name_len,
        &class_name, &property_name, &property_name_len
    ) == SUCCESS) {
    // ...
}

// Mangling
char *mangled_property_name;
int mangled_property_name_len;

zend_mangle_property_name(
    &mangled_property_name, &mangled_property_name_len,
    class_name, class_name_len, property_name, property_name_len,
    should_do_persistent_alloc ? 1 : 0
);

Property recursion guards

The last member in zend_object is the HashTable *guards field. To find out what it is used for, consider what happens in the following code using magic __set properties:

class Foo {
    public function __set($name, $value) {
        $this->$name = $value;
    }
}

$foo = new Foo;
$foo->bar = 'baz';
var_dump($foo->bar);

The $foo->bar = 'baz' assignment in the script will call $foo->__set('bar', 'baz') as the $bar property is not defined. The $this->$name = $value line in the method body in this case would become $foo->bar = 'baz'. Once again $bar is an undefined property. So, does that mean that the __set method will be (recursively) called again?

That’s not what happens. Rather PHP sees that it is already within __set and does not do a recursive call. Instead it actually creates the new $bar property. In order to implement this behavior PHP uses recursion guards which remember whether PHP is already in __set etc for a certain property. These guards are stored in the guards hash table, which maps property names to zend_guard structures:

typedef struct _zend_guard {
    zend_bool in_get;
    zend_bool in_set;
    zend_bool in_unset;
    zend_bool in_isset;
    zend_bool dummy; /* sizeof(zend_guard) must not be equal to sizeof(void*) */
} zend_guard;

Object store

We already made a lot of use of the object store, so let’s have a closer look at it now:

typedef struct _zend_objects_store {
    zend_object_store_bucket *object_buckets;
    zend_uint top;
    zend_uint size;
    int free_list_head;
} zend_objects_store;

The object store is basically a dynamically resized array of object_buckets. size specifies the size of the allocation, whereas top is the next object handle to be used. Handles are counted starting from 1, to ensure that all handles are “truthy”. Thus if top == 1 the next object will get handle = 1, but will be put at position object_buckets[0].

The free_list_head is the head of a linked list of unused buckets. Whenever an object is destroyed it leaves behind an unused bucket, which is then put in this list. If a new object is created and such a bucket exists (i.e. free_list_head is not -1), then this bucket is used instead of the top one.

To see how this linked list is maintained have a look at the zend_object_store_bucket structure:

typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;

If the bucket is in use (i.e. stores an object), then the valid member will be 1. In this case the struct _store_object part of the union will be used. If the bucket is not used, then valid will be 0 and PHP will make use of free_list.next.

This reclaiming of unused object handles can be shown with a small script:

var_dump($a = new stdClass); // object(stdClass)#1 (0) {}
var_dump($b = new stdClass); // object(stdClass)#2 (0) {}
var_dump($c = new stdClass); // object(stdClass)#3 (0) {}

unset($b); // free handle 2
unset($a); // free handle 1

var_dump($e = new stdClass); // object(stdClass)#1 (0) {}
var_dump($f = new stdClass); // object(stdClass)#2 (0) {}

As you can see the handles of $b and $a are reused in reverse order of destruction.

Apart from valid the bucket structure also contains a destructor_called flag. This flag is needed for PHP’s two-phase object destruction process: As already outlined previously PHP has distinct dtor (can run userland code, isn’t always run) and free (must not run userland code, is always executed) phases. After the dtor handler has been called, the destructor_called flag is set to 1, so that the dtor is not run again when the object is freed.

The apply_count member serves the same role as the nApplyCount member of HashTable: It protects against infinite recursion. It is used via the macros Z_OBJ_UNPROTECT_RECURSION(zval_ptr) (leave recursion) and Z_OBJ_PROTECT_RECURSION(zval_ptr) (enter recursion). The latter will throw an error if the nesting level for an object is 3 or larger. Currently this protection mechanism is only used in the object comparison handler.

The handlers member in the _store_object struct is also required for destruction. The reason for this is that the dtor handler only gets passed the stored object and its handle:

typedef void (*zend_objects_store_dtor_t)(void *object, zend_object_handle handle TSRMLS_DC);

But in order to call __destruct PHP needs a zval. Thus it creates a temporary zval using the passed object handle and the object handlers stored in bucket.obj.handlers. The issue is that this member can only be set if the object is destructed through zval_ptr_dtor or some other method where the zval (and as such the object handlers) is known.

If on the other hand the object is destroyed during shutdown (using zend_objects_store_call_destructors) the zval is not known. In this case bucket.obj.handlers will be NULL and PHP falls back to the default object handlers. Thus it can sometimes happen that overloaded object behavior is not available in __destruct. An example:

class DLL extends SplDoublyLinkedList {
    public function __destruct() {
        var_dump($this);
    }
}

$dll = new DLL;
$dll->push(1);
$dll->push(2);
$dll->push(3);

var_dump($dll);

set_error_handler(function() use ($dll) {});

This code snippet adds a __destruct method to SplDoublyLinkedList and then forces the destructor to be called during shutdown by binding it to the error handler (the error handler is one of the last things that is freed during shutdown.) This will produce the following output:

object(DLL)#1 (2) {
  ["flags":"SplDoublyLinkedList":private]=>
  int(0)
  ["dllist":"SplDoublyLinkedList":private]=>
  array(3) {
    [0]=>
    int(1)
    [1]=>
    int(2)
    [2]=>
    int(3)
  }
}
object(DLL)#1 (0) {
}

For the var_dump outside the destructor get_debug_info is invoked and you get meaningful debugging output. Inside the destructor PHP uses the default object handlers and as such you don’t get anything apart from the class name. The same also applies to other handlers, e.g. things like cloning, comparison, etc will not work properly.

This concludes the chapter on object orientation. You should now have a good understanding of how the object orientation system in PHP works and how extensions can make use of it.