Node.js and Drupal Feb. 7th, 2012 Elliott Foster

Node.js and Drupal

February 7th, 2012

Drupal is a great platform, but it can’t do everything. As your site grows, you’ll likely encounter use cases that Drupal can’t or shouldn’t do. Some examples include interacting with third-party APIs while preserving good page loads, or performing repeated actions (polling, etc) that result in site updates. Fortunately, separating these kinds of problems from Drupal and moving them to specialized “sub-stacks” within or near your Drupal stack is easy to do.

Enter Node.js. If you’re unfamiliar with it, Node.js is described as follows:

[…] a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

– From nodejs.org

The relevant part of this description that I’ll be focusing on is node’s “event-driven, non-blocking” model.

Third-party API communication

It’s becoming more common to integrate Drupal sites with third-party APIs. API integration is a great way of making your site more powerful without having to build a lot of functionality yourself. The problem is that a lot of third-party APIs are slow. A normal Drupal session runs in a single thread and blocks until all code finishes executing. If you are communicating with a third party during a user session, you can end up degrading user experience through slow page loads, confusing error messages, etc.

Handling this problem with Drupal queues is becoming more popular, and for good reason. Queueing API requests takes the transport out of the user session and shouldn’t affect user experience, but it also means your requests have to tolerate some delay (until the next time the queue is processed). We’re starting to see this more frequently on large Drupal sites and the trade off between user experience and immediate data propagation isn’t always an easy one to balance with clients.

So in this example, Node.js’s “event-driven, non-blocking” model means that you don’t need to wait for the API to respond before you can respond to Drupal. Node.js accomplishes this using callbacks, which if you’ve used jQuery before you should be fairly familiar with. The program will make a call and continue working, and when a response comes back, the callback will be executed. When written correctly, you can achieve nearly parallel execution of tasks — something that’s very difficult, and often impossible, to do in Drupal.

So how about an example of doing this? Consider a case where you want to update a third-party API with user information when the user’s profile is saved. Normally you would wait for the API to respond during the page request, but if you use Node.js as an API relay, the node server will immediately respond that it received the request, allowing the page load to finish quickly. The node server will then relay the request to the API. When a response is received it will call back Drupal with the results. Conceptually, it looks like this:

Moving the API communication out of Drupal allows you to speed up page requests and split off functionality that’s not well suited to a blocking model. There are many places you can typically do this with Drupal, not solely limited to API communication.

Repetitive jobs in Node.js

Another area that’s a good candidate for a Node.js-Drupal marriage is in repetitive jobs that need to be run very frequently. Again, Drupal queues can fulfill some of this need, but if you need to repeat the job very frequently, Drupal-based solutions are not usually ideal due to their high overhead (having to bootstrap Drupal every time you need to repeat the job). It would be better to bootstrap Drupal only when there’s work that actually needs to be done.

An example of this could be polling an external API for data that needs to be added to your site. Thanks to node’s event driven model in relatively few lines of code it can perform this action without generating a lot of overhead on the server while it’s waiting to repeat:

This code will repeat a get request to http://localhost/poll every one second. When the response is complete you can add some logic to determine if Drupal needs to do any work, then make another call to Drupal with the payload from localhost. Any site-specific business logic still remains in Drupal, you’re just offloading the grunt work to an engine that’s more efficient at doing it.

Proof of concept

To see some real-world examples of the two use cases I’ve described here, check out the following code:

The key as a developer is to remember that Drupal is great at a lot of things — just not everything. If there’s another tool that’s better suited to the job you’re trying to do, use it! Just because Drupal is usually flexible enough for you to get it to do what you want doesn’t mean that you should bend it that way.

Comments