WBD’s AI platform makes the case that production AI is an engineering problem: Kafka backbones, Airflow DAGs, Protobuf schemas, and sandboxed pipelines embed human feedback into workflows, enabling secure, scalable adoption at petabyte scale rather than relying on novel models alone.
Nenad Mancevic, Principal Software Engineer at Warner Bros Discovery, opened his talk at the AI Infra Summit last week with a Google schematic from a decade ago to remind the audience that the faith of the day might be misplaced.
“The actual machine learning code enabling systems is really the smallest part of the entire system. Everything else is the infrastructure,” he said before unpacking it in the context of WBD’s century-old content archive and a subscriber base topping 120 million.
The platform shuttles petabytes of video data through a cloud-native, multi-provider environment spanning AWS, GCP, and some Azure, with Kafka, Postgres, and sandboxed AI pipelines securing and orchestrating the flow.
In the language of enterprise AI, everything glamorous lives at the surface: agent demos, personalization, LLMs beating a benchmark. But Mancevic thinks the numbers tell a harsher truth: “less than 5%” of AI workloads make it to production, and the problem isn’t the quality of models but the failure to do the hard work of building pipelines that last and workflows people will actually use.
So WBD built from the bottom up.

The foundation: Terraform-defined infrastructure stretched across AWS, GCP, and a little bit of Azure; a mostly serverless compute model to keep developers from sinking into provisioning hell; Google Cloud Storage buckets glued into place.
Above that, Kafka operates as the bloodstream. “The central component is the data streaming backbone, which is essentially our message bus. It's a Kafka-based message bus, and this is the communication channel between different systems.” Everything that enters the platform must conform to a schema. Only then does it land in a Postgres data lake, where it becomes fodder for Vertex AI or SageMaker, feature stores, registries, and GPU-backed inference jobs, orchestrated through Airflow.
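To make that contract concrete, here is a minimal sketch of what a schema-enforced producer on such a bus could look like, in Python with the confluent-kafka client. The CaptionEvent message, the caption_event_pb2 module, and the topic name are illustrative assumptions, not WBD’s actual schema.

```python
# Minimal sketch of a schema-enforced Kafka producer. The CaptionEvent
# Protobuf message and the topic name are hypothetical, for illustration.
from confluent_kafka import Producer
from caption_event_pb2 import CaptionEvent  # hypothetical generated class

producer = Producer({"bootstrap.servers": "kafka-broker:9092"})

def publish_caption_event(asset_id: str, transcript_uri: str) -> None:
    # Build the message through the generated Protobuf class: anything that
    # doesn't match the registered schema fails here, before it hits the bus.
    event = CaptionEvent(asset_id=asset_id, transcript_uri=transcript_uri)
    producer.produce(
        "media.caption.events",           # hypothetical topic
        key=asset_id.encode("utf-8"),
        value=event.SerializeToString(),  # the wire format consumers agree on
    )
    producer.flush()
```

Because every producer serializes through the same generated class, downstream consumers (the Postgres lake, the training jobs Airflow kicks off) can trust the bytes they pull off the bus.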
From there, outputs rise into the applications (captioning, ad slot detection, clip highlights), but the visible use cases are just the final mile of an engineering project whose bulk is hidden beneath.
Mancevic stressed that humans are not a problem to eliminate, but the only reason the system exists at all. “Ultimately, we are solving those problems for humans, we want to make humans more productive and more engaged in the solution.”
Captioning is the canonical example. A model generates the transcript, but humans correct it, and that correction is not wasted labor. “Once the humans are done with their work, this becomes the ground truth for us, and it gets sent back to the platform, which we use them for model fine tuning and model training.”
The same cycle plays out with ad placement, with recommendations, with highlights. The lesson is that an AI platform is not an island of algorithms but a loop between machine output and human validation, one that has to be encoded directly into the engineering.
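The loop itself is mechanical enough to sketch. Assuming a hypothetical captions.human.corrected topic and a CorrectedCaption Protobuf message, neither confirmed by the talk, a consumer that turns corrections into training ground truth might look like this:

```python
# Hedged sketch of the feedback loop: human-corrected captions come back over
# the bus and are stored as training ground truth. Topic name, message type,
# and sink are assumptions, not WBD's actual pipeline.
from confluent_kafka import Consumer
from corrected_caption_pb2 import CorrectedCaption  # hypothetical generated class

def store_ground_truth(asset_id: str, transcript: str) -> None:
    # Stand-in for the real data-lake write (e.g. an insert into Postgres
    # that a later fine-tuning DAG reads from).
    print(f"ground truth for {asset_id}: {len(transcript)} chars")

consumer = Consumer({
    "bootstrap.servers": "kafka-broker:9092",
    "group.id": "ground-truth-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["captions.human.corrected"])  # hypothetical topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    correction = CorrectedCaption()
    correction.ParseFromString(msg.value())
    store_ground_truth(correction.asset_id, correction.transcript)
```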
Even the apparent simplicity of building features is just engineering discipline wearing a friendly face. “They just need to focus on two files and build their features into files so they can experiment with many, many different candidate features.” But those two files are the surface of a vast apparatus of Protobuf schemas, Airflow DAGs, and Terraform provisioning scripts that enforce compatibility, ensure pipelines don’t collapse, and keep every experiment within a walled sandbox. That sandbox is a separate environment used only for AI workflows and AI features, which means it's secure by default, he says.
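As a rough illustration of how a platform-owned Airflow DAG could wrap a feature author’s files, here is a hedged sketch; the DAG id, task names, and two-file contract are guesses for illustration, not WBD’s actual interface.

```python
# Hedged Airflow sketch of a platform DAG wrapping a feature author's two
# files. DAG id, task names, and the file contract are illustrative guesses.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_feature_definition(**_):
    # The platform imports the author's first file: the feature logic itself.
    print("loading candidate feature definition")

def validate_against_contract(**_):
    # The second file (feature config) is checked against the registered
    # Protobuf contract before anything runs inside the sandbox.
    print("validating feature config against Protobuf schema")

with DAG(
    dag_id="candidate_feature_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    load = PythonOperator(task_id="load_feature",
                          python_callable=load_feature_definition)
    validate = PythonOperator(task_id="validate_contract",
                              python_callable=validate_against_contract)
    load >> validate  # the platform owns everything past these two steps
```

The design point is that the experiment surface stays tiny while the provisioning, schema enforcement, and sandboxing stay in platform code the feature author never touches.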
And then there is cost. For a company that moves petabytes of video, the topic is far from abstract. The engineering here is not only about moving data quickly but about not going broke in the process.
The refrain at the end was stripped of hype: applied AI is “really mostly about engineering,” not the ML code, especially today with LLMs. And again: “Focusing on what you want to build instead of how is critical, and being able to work with the stakeholders and your partners in the organization is even more important than tech itself.”
It would have been easier to end with a celebration of generative breakthroughs or a showcase of dazzling user experiences. Instead, the message was austere: the pipeline is the product. The Kafka topics, the Protobuf contracts, the human corrections, the Terraform scripts, the sandboxed security...
All the scaffolding that doesn’t look like AI is the only reason AI actually works.
The latest agents and LLMs may sparkle, but “they are not silver bullets. They are not going to solve all of your problems. You need to know their limits,” he concludes.
In other words, the hidden work is the real work, and the engineers know it.
