The fast and the curious: How exploring Amazon Neptune helped us optimize scalability for NBCUniversal

Elliott Foster

Web Chef Emeritus

Elliott Foster is a developer and standards nerd. When not crushing code, Elliott can probably be found on his bike suffering up one of Austin’s hills or on a porch drinking a local craft brew.

December 19, 2019

I just got back from AWS re:Invent in Las Vegas, where I had the opportunity to share some exciting results. Well, they’re exciting to us Web Chefs here at Four Kitchens — and if you’re looking for a fully managed, fast and reliable, easy-to-use graph database service, you’ll be excited, too.

Graph databases are designed to help navigate connected structures. Today, that covers a lot of ground, including:

Search engines
Social and professional networks
Fraud detection
Network and IT operations

With a graph database, you query data and filter results based on relationships. That approach fits the types and quantity of connected data we see in so many of these use cases.

In fact, that’s how I ended up speaking about the Amazon Neptune graph database service at re:Invent. Amazon announced Neptune at re:Invent in November 2017 and made it generally available in May 2018, but we had the opportunity to put the database service into use even earlier, as part of our work for NBCUniversal.

Amazon says it designed Neptune to be fast, reliable, open, and easy to use. The service keeps six copies of your data across three Availability Zones, and it can query billions of relationships with millisecond latency. You also get continuous backup and point-in-time restore, plus support for both Apache TinkerPop with Gremlin and W3C RDF with SPARQL.

Four Kitchens has partnered with multinational media conglomerate NBCUniversal for nearly seven years. As part of that partnership, we’ve helped the company enhance the capabilities of NBC.com, the online presence of flagship brand NBC. Huge amounts of data streaming from multiple sources and to multiple platforms are nothing new for the company.

But by early 2018, it was apparent to the NBCUniversal backend team that the company’s legacy in-house data storage and management methods could no longer keep up with its data-delivery needs. Data was stored in a document store but used in a highly relational way. Substantial year-over-year growth and a focus on content personalization made the existing caching strategies ineffective or unscalable.

Building a prototype

After evaluating several alternatives, we concluded that a graph database would align with our scalability demands and data usage. Neptune wasn’t yet available to the public, but it sounded like a potential solution to our problem. Luckily, the AWS team was willing to work with Four Kitchens and NBCUniversal to build a prototype system.

We put the system through its paces as we introduced content as part of OneApp, which lets users access content from all the company’s networks (including NBC, E!, SyFy, Oxygen, and others). The OneApp implementation produced a 10x increase in the amount of data we were responsible for managing. As you can imagine, we learned a lot about this new graph database service — both its potential and where there’s room to grow.

We originally hoped to port our existing SPARQL queries to Neptune. But we learned that Gremlin was the better option — at least in this situation. Now, almost every engineer on the team can use Gremlin to write complicated traversals.
Because we had no tolerance for a major outage, we had to build out and release components into production in parallel with the legacy system. I like to say that it was a bit like changing a flat while speeding down a highway at 60 mph.
We ended up implementing Neptune in three phases. During the first phase, we learned how the database scales and how to provision the cluster. During the second phase, we wrote a custom traversal builder on top of our JSON schema data model, so that we could reliably build traversals as we added members to our graph. During the final phase, we began to see real results: latency plummeted from 240 ms to under 50 ms, even as data requests grew from around 10k to 50k, all in just a few months.
We learned that graph databases simply aren’t great at handling bulk data operations or loosely filtered queries. Knowing this in advance lets you work around the limitations to some extent through intelligent graph design, but it is what it is.
Fail-over logic makes right-sizing clusters a bit tricky. However, the AWS engineers were more than willing to help us work out the kinks, and that was before Neptune was generally available. With more complete documentation and over a year of fine-tuning in production since then, there is probably even more support available now.
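The second phase above mentions a custom traversal builder layered on a JSON schema data model. The post doesn't show that code, so what follows is only a minimal Python sketch of the idea under assumptions of our own: the `SCHEMA` layout, the vertex and edge labels (`series`, `HAS_EPISODE`, and so on), and the `build_traversal` and `quote` helpers are all hypothetical. The sketch checks each hop against the schema and emits a Gremlin query string; it doesn't send anything to Neptune.

```python
from __future__ import annotations

import json

# Hypothetical JSON schema for a few vertex types. Each entry names the property
# used to look a vertex up and the outgoing relations it may follow, mapping a
# friendly relation name to an edge label and the vertex label it leads to.
SCHEMA = json.loads("""
{
  "series":  {"key": "slug",
              "edges": {"episodes": {"label": "HAS_EPISODE", "to": "episode"},
                        "network":  {"label": "AIRS_ON",     "to": "network"}}},
  "episode": {"key": "slug",
              "edges": {"series": {"label": "PART_OF", "to": "series"}}},
  "network": {"key": "slug", "edges": {}}
}
""")


def quote(value: str) -> str:
    """Render a Python string as a single-quoted Gremlin string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_traversal(schema: dict, start_label: str, key_value: str, path: list[str]) -> str:
    """Build a Gremlin query string that starts at one vertex and follows `path`.

    Every relation is checked against the schema, so the traversal can only
    follow edges that the data model says exist.
    """
    if start_label not in schema:
        raise ValueError(f"unknown vertex label: {start_label!r}")
    current = schema[start_label]
    steps = [f"g.V().hasLabel({quote(start_label)}).has({quote(current['key'])}, {quote(key_value)})"]
    for relation in path:
        edge = current["edges"].get(relation)
        if edge is None:
            raise ValueError(f"relation {relation!r} is not defined in the schema")
        steps.append(f"out({quote(edge['label'])})")
        current = schema[edge["to"]]  # continue from the vertex type the edge leads to
    steps.append("valueMap()")  # return the properties of each vertex reached
    return ".".join(steps)


print(build_traversal(SCHEMA, "series", "the-office", ["episodes"]))
```

Generating query text from the schema, rather than hand-writing each traversal, means a mistyped edge label fails fast in code instead of quietly returning an empty result from the graph. A production builder would more likely compose traversals through a Gremlin language variant such as gremlinpython than assemble strings.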
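Some simple arithmetic shows why fail-over complicates right-sizing. In a Neptune cluster, if the primary instance fails, a read replica is promoted to take its place. That shrinks the pool serving reads at exactly the moment the system is under stress. The sketch below sizes the reader pool with one spare replica to absorb that loss. It is a simplified model of our own, not how the NBCUniversal clusters were actually sized. The per-replica throughput, the 75% target utilization, and the `replicas_needed` helper are hypothetical, and the model assumes reads go only to replicas.

```python
import math

# Neptune allows up to 15 read replicas per cluster.
MAX_READ_REPLICAS = 15


def replicas_needed(peak_read_rps: float, per_replica_rps: float,
                    target_utilization: float = 0.75, spare: int = 1) -> int:
    """Read replicas needed so peak reads still fit after `spare` replicas are
    lost, e.g. because one was promoted to primary during a fail-over."""
    if per_replica_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    # Plan each replica to run below full capacity to leave headroom for spikes.
    usable_per_replica = per_replica_rps * target_utilization
    serving = math.ceil(peak_read_rps / usable_per_replica)
    total = serving + spare
    if total > MAX_READ_REPLICAS:
        raise ValueError(f"{total} replicas exceeds the {MAX_READ_REPLICAS}-replica limit; "
                         "consider larger instances")
    return total


# Hypothetical numbers: 1,200 reads/s at peak, 400 reads/s per replica.
print(replicas_needed(1200, 400))  # 5
```

Each spare replica costs money while it sits idle, and moving to a larger instance class changes the per-replica throughput. Right-sizing therefore becomes a balancing act between instance size, replica count, and headroom.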

Hear the details

All in all, we were pleased with the cost savings and scalability boost that we saw from Neptune. Interested in learning more about the experience? You can watch the talk here:

Next up, we’re planning additional personalization to help drive higher user engagement. We’re also digging deeper into our data and analyzing how users interact with it. And we’re developing a GraphQL interface to Neptune. Stay tuned for more as details become available.
