Jul 17, 2025

How GPU-Optimized AI Transformed Product Search at Walmart

Nicole Hemsoth Prickett

What goes on behind the scenes when millions of Walmart shoppers type their queries into the search bar?

The long answer unfolds in milliseconds. 

A recent paper from Walmart researchers describes the deployment of a BERT-based cross-encoder model and shows how product relevance is computed in real time, especially for the trickier long-tail queries that have historically defied simpler optimization.

Transformer-based architectures like BERT are famously powerful, capable of grasping nuance, context, and intent in ways that conventional models, especially simpler dual-encoder systems, struggle to match.

But as is ever the case, there’s a tradeoff. This power comes at a steep cost: a formidable computational weight that can crush responsiveness, especially when scaled across millions of daily queries.
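To make that tradeoff concrete, here is a minimal Python sketch of why the two architectures cost so differently at query time. It assumes the common setup in which a dual encoder's product embeddings are precomputed offline; the function names and the bracketed special tokens are illustrative and are not taken from Walmart's paper (in practice a tokenizer inserts those tokens).

```python
def cross_encoder_inputs(query, product_titles):
    """Build one joint sequence per (query, product) pair, BERT-style.

    The special tokens are written out as plain strings purely for
    illustration; a real tokenizer would add them.
    """
    return [f"[CLS] {query} [SEP] {title} [SEP]" for title in product_titles]


def online_forward_passes(num_candidates, architecture):
    """Model forward passes needed at query time for a single query."""
    if architecture == "dual":
        # Assumption: product embeddings were computed offline, so only
        # the query itself is encoded when the shopper hits enter.
        return 1
    if architecture == "cross":
        # Every candidate is scored jointly with the query, so the model
        # runs once per (query, product) pair.
        return num_candidates
    raise ValueError(f"unknown architecture: {architecture}")


candidates = ["usb c cable 6ft", "usb c wall charger", "hdmi cable 10ft"]
for sequence in cross_encoder_inputs("usb c cable", candidates):
    print(sequence)

print(online_forward_passes(len(candidates), "dual"))
print(online_forward_passes(len(candidates), "cross"))
```

The cross encoder sees query and product together, which is where its grasp of context comes from, but its online cost grows with the number of candidates instead of staying flat.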

Recognizing the gravity of latency at this scale, Walmart focused heavily on GPU-level optimizations. 

By converting the model to an intermediate representation, fusing operators to reduce redundant memory access, and carefully vectorizing to streamline I/O, Walmart's team coaxed a roughly 5x reduction in inference latency from the model. 
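Operator fusion is easier to picture with a toy example. The Python sketch below is not GPU code and is not Walmart's implementation; it only illustrates the memory-access pattern. The unfused version makes three passes over the data and materializes two intermediate buffers, while the fused version does the same arithmetic in a single pass. The bias-plus-GELU-plus-scale chain is an assumed example of operators a compiler might fuse.

```python
import math


def gelu(x):
    """tanh approximation of the GELU activation used in BERT-style models."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def unfused(xs, bias, scale):
    """Three separate 'kernels', each reading and writing a full buffer."""
    added = [x + bias for x in xs]           # pass 1: write intermediate
    activated = [gelu(a) for a in added]     # pass 2: read it back, write another
    return [a * scale for a in activated]    # pass 3: read again, write output


def fused(xs, bias, scale):
    """One 'kernel': each element is read once and written once."""
    return [gelu(x + bias) * scale for x in xs]


values = [-2.0, -0.5, 0.0, 0.5, 2.0]
print(fused(values, bias=0.1, scale=2.0) == unfused(values, bias=0.1, scale=2.0))
```

On a GPU the payoff comes from skipping the round trips to device memory between kernels, which is often a larger cost than the arithmetic itself.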

Further enhancements, like pre-tokenizing product information and compressing payloads before shuttling them between storage and GPU clusters, chipped latency overhead down dramatically, from 30% to just a fraction of its initial burden. 
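A rough sketch of the pre-tokenize-and-compress idea, using only the Python standard library, might look like the following. The tiny vocabulary, the whitespace tokenizer, and the 16-bit packing are all assumptions made for illustration; a production system would use the model's real tokenizer and whatever wire format its storage layer expects.

```python
import struct
import zlib

# Hypothetical toy vocabulary; a real pipeline would use the model's own.
VOCAB = {"[UNK]": 0, "usb": 1, "c": 2, "cable": 3, "6ft": 4, "braided": 5}


def pre_tokenize(text):
    """Map product text to token IDs once, offline, instead of per request."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]


def pack(token_ids):
    """Pack IDs as little-endian unsigned 16-bit ints, then compress."""
    raw = struct.pack(f"<{len(token_ids)}H", *token_ids)
    return zlib.compress(raw)


def unpack(payload):
    """Reverse of pack(): decompress and decode back to a list of IDs."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


ids = pre_tokenize("USB C Cable 6ft Braided Nylon")
payload = pack(ids)
print(ids, len(payload), unpack(payload) == ids)
```

Compression tends to pay off on larger or batched payloads; for a handful of tokens like this, the compressor's fixed overhead can outweigh the savings.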

Additionally, fine-tuning GPU parameters, such as batch sizes and the number of concurrent worker threads, drove further improvements, halving median latency and cutting peak latency fourfold.
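Tuning of this kind usually comes down to a sweep: try each combination, measure, and compare median and tail latency. The Python sketch below shows the shape of such a sweep. The `fake_measure` function is a hypothetical stand-in for a real load test and its numbers are invented; nothing here reflects Walmart's actual settings or results.

```python
import math
from itertools import product


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def sweep(measure, batch_sizes, worker_counts):
    """Measure every (batch size, workers) pair; pick lowest p99, then p50."""
    results = []
    for batch_size, workers in product(batch_sizes, worker_counts):
        latencies_ms = measure(batch_size, workers)
        results.append({
            "batch_size": batch_size,
            "workers": workers,
            "p50": percentile(latencies_ms, 50),
            "p99": percentile(latencies_ms, 99),
        })
    return min(results, key=lambda r: (r["p99"], r["p50"]))


def fake_measure(batch_size, workers):
    """Hypothetical stand-in for a real load test; the numbers are made up."""
    penalty = abs(batch_size - 16) + abs(workers - 4)
    return [penalty + i % 3 for i in range(50)]


best = sweep(fake_measure, batch_sizes=[4, 8, 16, 32], worker_counts=[1, 2, 4, 8])
print(best)
```

Ranking by p99 first reflects a choice to protect the slowest requests; a team more concerned with typical latency could sort on p50 instead.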

At an enterprise of Walmart’s size, every millisecond shaved matters. Real-world experiments validated these efforts, with measurable lifts in key customer engagement metrics, including increased add-to-cart events, reduced abandonment rates, and fewer clicks required to find and purchase desired products. 

Walmart’s deployment shows how AI, when thoughtfully optimized at the infrastructure level, can elevate the consumer experience at staggering scale. 


© VAST 2025. All rights reserved
