Nodes have evolved remarkably over Drupal’s history. In Drupal 4.7, node types were typically created by modules that “owned” their node types. There was no way to create a node type without a module behind it. Modules creating node types would implement hook_node_info() and directly handle the main loading, saving, and editing of the node type. Drupal core handled the loading and saving of the title and body. Modules doing this were effectively subclassing a pseudo-abstract node class (a class containing title and body only) in core and adding their own fields.

Drupal 4.7 was also the dawn of the Form API and its hook_form_alter(). Combined with the ability (beginning in Drupal 5) to create node types directly in core, the dominant pattern of node type development changed. The decorator pattern emerged as the preferred approach. This allowed multiple modules to simultaneously create fields on the same node types.

Using decorators with node types is straightforward:

  • Create your node types using Drupal’s content type administration tool.
  • Create a module that implements hook_form_alter() and hook_nodeapi() to add fields to selected content types.
  • Configure the module to add its fields to the selected content types.
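
The steps above can be sketched in a language-neutral way. This is a minimal Python model, not Drupal code: LocationDecorator, form_alter(), node_event(), build_form(), and invoke() are all hypothetical names that only mirror the roles of hook_form_alter() and hook_nodeapi(), and the dictionary stands in for a per-module table keyed by nid and vid.

```python
# Hypothetical model of the decorator pattern; none of these names are Drupal APIs.

class LocationDecorator:
    """Adds a 'city' field to whichever node types it is configured for."""

    def __init__(self, node_types):
        self.node_types = set(node_types)
        self.storage = {}  # stands in for a table keyed by (nid, vid)

    def form_alter(self, node_type, form):
        # Role of hook_form_alter(): add a form element to the edit form.
        if node_type in self.node_types:
            form["city"] = {"type": "textfield", "title": "City"}

    def node_event(self, op, node):
        # Role of hook_nodeapi(): load and save this decorator's own data.
        if node["type"] not in self.node_types:
            return
        key = (node["nid"], node["vid"])
        if op == "save":
            self.storage[key] = node.get("city")
        elif op == "load":
            node["city"] = self.storage.get(key)


def build_form(node_type, decorators):
    """Core builds the base form, then every decorator may alter it."""
    form = {"title": {"type": "textfield", "title": "Title"}}
    for decorator in decorators:
        decorator.form_alter(node_type, form)
    return form


def invoke(op, node, decorators):
    """Core notifies every decorator of a node operation."""
    for decorator in decorators:
        decorator.node_event(op, node)
```

Note that the decorator never declares a node type of its own. It only reacts to types it has been configured for, which is what lets several modules add fields to the same type.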

Even when it would be equally straightforward to have a module implement its own node type, implementing the fields with a decorator is superior because the approach maintains consistency. All fields are saved and loaded through hook_nodeapi(), and all form elements are defined through hook_form_alter(). No module has any special claim to particular node types. This approach should make the implementation of hook_node_info() and other “I own this node type” hooks by modules completely obsolete.

But this model has some downsides, at least in Drupal:

  • When modules directly created node types, data tended to be better consolidated. Because we now prefer to decorate node types using the fields from multiple modules, data is more scattered. We’re typically implementing each decorator as a table with a foreign key to nid and vid, so loading a node takes an extra query or join for every decorator attached to it. This has massive performance implications.
  • Configuration for managing the mapping of decorators to node types is wildly inconsistent, and there’s no way to see, globally, which decorators apply to which node types.
  • Managing interaction between decorators is inconsistent or absent. This interaction includes namespace conflicts on $node objects.
  • The node editing form and $node object are the only places where decorators all come together in a consistent way. This makes data importing and exporting nearly impossible without custom code for each module performing decoration.
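
To make the namespace problem concrete, here is a small sketch of what happens when two decorators pick the same property name. The module names, the load_node() helper, and the field values are all made up for illustration. The point is only that a flat, shared object has no built-in way to detect the clash.

```python
def load_node(nid, decorators):
    """Merge each decorator's fields onto one flat node, as a shared $node does."""
    node = {"nid": nid, "title": "Example"}
    overwritten = []
    for module_name, fields in decorators:
        for key, value in fields.items():
            if key in node:
                # Nothing stops the clash; we only record it for the demo.
                overwritten.append((module_name, key))
            node[key] = value  # the later decorator silently wins
    return node, overwritten


# Two hypothetical modules that both chose the property name "location".
decorators = [
    ("geo_module", {"location": "30.27,-97.74"}),
    ("events_module", {"location": "Convention Center"}),
]
node, overwritten = load_node(1, decorators)
```

After this runs, the coordinates from the first module are gone, and which module wins depends only on the order in which the decorators happen to run.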

Fortunately, we don’t have to solve this problem on our own. Systems like Sun’s OpenDS have sophisticated, well-reasoned data models that allow decorators to elegantly combine to form coherent, node-like objects. OpenDS discusses its schema model on its wiki, and I’ll use it as my example.

The OpenDS schema contains three basic layers:

  • Attributes, which are fields
  • Abstract and structural classes, which contain attributes
  • Object classes, which are the set of classes assigned to an object

Objects (which are node-like) can be assigned multiple object classes, each of which functions like a decorator. The objects may contain values (often even multiple values) for the attributes provided by their object classes.

Applied to Drupal, modules would create attributes and directory classes, and Drupal core would contain a unified interface for assigning the classes to node types.
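
A rough sketch of that layering, assuming nothing about how OpenDS implements it internally: Attribute, ObjectClass, and Entry are hypothetical Python types, and the class and attribute names in the usage note are invented. The only behavior modeled is the one described above: an object may hold values only for attributes supplied by its assigned classes, and an attribute may hold several values unless it is declared single-valued.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    multi_valued: bool = True  # single-valued is the opt-in restriction


@dataclass(frozen=True)
class ObjectClass:
    name: str
    attributes: tuple  # the Attribute definitions this class contributes


@dataclass
class Entry:
    """A node-like object: a set of classes plus values for their attributes."""
    object_classes: list
    values: dict = field(default_factory=dict)

    def allowed(self):
        # Every attribute contributed by any assigned class, keyed by name.
        return {a.name: a for oc in self.object_classes for a in oc.attributes}

    def add(self, attr_name, value):
        allowed = self.allowed()
        if attr_name not in allowed:
            raise KeyError(f"no assigned class provides {attr_name!r}")
        current = self.values.setdefault(attr_name, [])
        if current and not allowed[attr_name].multi_valued:
            raise ValueError(f"{attr_name!r} is single-valued")
        current.append(value)
```

For example, an entry assigned a class providing a single-valued title and another providing a multi-valued city would accept two cities, reject a second title, and reject any attribute that neither class defines.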

What this gets us:

  • Asynchronous multi-master replication support. Right now, node data is scattered all over the database, and there’s no way to “package” it for coherent, asynchronous replication across multiple hosts without a PHP-level implementation. In OpenDS, objects are fundamentally understood and managed by the directory’s data storage layer. It’s easy for it to replicate whole nodes.
  • Similarly, this ability to “package” objects gives us importing and exporting for free. OpenDS can import and export LDIF-formatted data, and this would allow nodes to be transported to dissimilar systems, even different directory servers. You would simply need the same classes supported on the destination system.
  • The “packaged” objects make sharding and partitioning data much easier.
  • We get tools like Apache Directory Studio that give a coherent object view, including the list of classes for each object. There’s no way to view a node in MySQL without a painful number of joins.
  • Built-in protection against namespace collisions for attributes.
  • We get unified indexing across decorators. Because decorators currently store their data in multiple tables, we can’t index, say, a city and the title for a node without denormalization. In OpenDS, you can create VLV indexes that span any set of attributes and selectable subsets of nodes. It basically allows creation of a comprehensive index for anything configured in the Views module. The only comparable features in relational databases are indexed views in SQL Server and materialized views in Oracle. MySQL does not support such indexes.
  • We can change a field from being single-valued to multi-valued without a fundamental change in the way we access the data.
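
As an illustration of the “packaging” and multi-value points, here is a minimal sketch of writing one node-like object as an LDIF record. The DN layout, the drupalNode class, and the attribute names are assumptions I made up for the example, not an existing schema. The writer is also simplified: it base64-encodes values that are not plain safe ASCII but omits line folding.

```python
import base64


def ldif_line(attr, value):
    """Format one attribute line, base64-encoding values LDIF cannot hold as-is."""
    raw = value.encode("utf-8")
    safe = (
        all(b < 128 and b not in (0, 10, 13) for b in raw)
        and not raw.startswith((b" ", b":", b"<"))
        and not raw.endswith(b" ")
    )
    if safe:
        return f"{attr}: {value}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"


def to_ldif(dn, object_classes, attributes):
    """Serialize one entry; attributes maps each name to a list of values."""
    lines = [ldif_line("dn", dn)]
    lines += [ldif_line("objectClass", oc) for oc in object_classes]
    for attr, values in attributes.items():
        # A multi-valued attribute is just the same attribute repeated.
        lines += [ldif_line(attr, v) for v in values]
    return "\n".join(lines) + "\n"


record = to_ldif(
    "nid=1,ou=nodes,dc=example,dc=com",   # hypothetical DN layout
    ["top", "drupalNode", "located"],      # hypothetical object classes
    {"title": ["Hello"], "city": ["Austin", "Zürich"]},
)
```

Because every attribute is stored as a list, moving a field from one value to many changes the length of the list, not the shape of the record or the code that reads it.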

I’ll be experimenting with using OpenDS as a node back-end over the coming weeks. It would be great to have a robust, multi-master, free/open-source node storage system.