When the South Korean government decided to pursue an independent, general-purpose AI model development project, SK Telecom was handed a clear but thorny problem: build a secure, multi-tenant GPU cloud, sovereign from hardware to hypervisor.
The tricky part wasn’t just compliance or even multi-tenancy; it was delivering both without the usual performance cost. Traditional virtualization stacks choke on GPU I/O, and bare metal solves performance but wrecks tenant isolation.
SKT decided not to compromise or cobble together existing tech. Instead, it architected from scratch, starting with a telco-grade virtualization layer built around direct RDMA passthrough to GPUs.
At the heart of this new architecture sits VAST Data, integrated deeply enough to erase the usual distinctions between storage and compute. SKT’s stack leverages VAST to deliver near-bare-metal RDMA throughput, automated tenant provisioning, and secure isolation without performance degradation.
What follows is the story of how SK Telecom, alongside VAST, built a new kind of GPU cloud, one that is sovereign, multi-tenant, and fast enough to feel (and benchmark) like bare metal.
Building a Sovereign AI Cloud From the GPU Up
After watching hyperscalers and OpenAI shape global AI adoption (and control critical IP), the South Korean government drew a line. The nation’s future models, infrastructure, and intellectual property would be homegrown, built domestically from metal to models.
The flagship initiative is structured as a competitive proving ground. Korean AI companies, government-funded and fiercely competitive, race to build state-of-the-art language models on fully sovereign infrastructure.
SK Telecom found itself at this project’s epicenter, tasked with delivering GPU infrastructure that could meet intense performance, isolation, and compliance requirements simultaneously.
This meant no week-long provisioning delays or compromises on tenant security, and no distant availability-zone dependence. But it also meant completely rethinking GPU virtualization, historically seen as too costly or too slow for large-scale AI training.
With 1,000 NVIDIA Blackwell GPUs, SK Telecom took on the sovereign imperative and along the way redefined how sovereign AI infrastructure could be built.
From 5G MEC to Sovereign AI Cloud
Of course, SK Telecom’s sovereign AI stack didn’t appear from thin air. Its roots trace back to an earlier ambition to bring ultra-low-latency compute to the telco edge.
In 2020, as SKT rolled out one of the world’s first national-scale 5G networks, its engineers began experimenting with mobile edge computing (MEC), deploying small datacenters inside telecom central offices to push compute closer to user traffic.
But low latency required a virtualization layer unlike anything on the market. Off-the-shelf software from VMware or OpenStack couldn’t provide the tight performance controls or real-time responsiveness SKT required.
So SKT chose to build its own platform, Petasus, engineered to manage performance-sensitive jobs at the network edge.
When the Korean government announced the World Best LLM (WBL) initiative, SK Telecom’s work on Petasus suddenly gained a new level of relevance. The same problems of network latency, strict isolation, and instant provisioning that defined edge requirements mapped nicely to GPU-intensive AI workloads. Petasus evolved quickly from a specialized 5G edge platform into a robust virtualization stack capable of handling GPUs, secure multi-tenancy, and intense I/O demands.
And while the timing meshed well, the journey from 5G edge to sovereign AI infrastructure was not linear or planned from day one, but it did give SK Telecom a head start, allowing them to architect a sovereign AI stack without starting entirely from scratch.
The result is a platform with deep technical maturity, built specifically for sovereign-scale GPU virtualization, and ready to support Korea’s ambitious AI objectives. Reflecting this evolution and expanded mission, Petasus has been re-branded as Petasus AI Cloud, a sovereign AI cloud platform designed for secure, high-performance GPU virtualization at national scale.
Building a Virtualized AI Platform From the Ground Up
To handle GPU-driven AI, traditional virtualization models simply fall short, according to Dr. Jian Li, Principal Engineer at SK Telecom and recognized SDN and virtualization expert.
GPU infrastructure is expensive and tricky; it doesn’t take kindly to the usual overhead and abstraction layers of standard VMs and containers. Li, having already faced down latency challenges with Petasus AI Cloud, understood that from the start. His team needed an architecture that could deliver GPU resources instantly, securely, and with near-zero performance penalties.
Their solution, built with support from VAST engineers, is a unified virtualization layer supporting VMs, containers, and fully managed Kubernetes environments, capable of slicing and isolating GPUs at runtime.

Petasus AI Cloud supports multi-vendor GPU/NPU and heterogeneous user workloads.
Instead of fractionalizing single GPUs, Petasus AI Cloud partitions SXM-type 8-GPU modules into integer sub-bundles (2, 4, 6, or 8 GPUs), enabling instant provisioning of multi-vendor GPU clusters while maintaining secure isolation without the cost of a traditional hypervisor. Crucially, even when an SXM 8-GPU module is partitioned, GPUs within the same sub-bundle retain ultra-low-latency, high-bandwidth NVLink connectivity, allowing them to communicate over NVLink as if they were a single, monolithic device — preserving GPU-to-GPU performance for tightly-coupled parallel workloads.
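The integer sub-bundle scheme can be sketched in a few lines. This is an illustrative model only (not SKT’s actual scheduler); the function name and the use of contiguous GPU index ranges to stand in for NVLink-connected groups are assumptions for the example.

```python
# Illustrative sketch, not SKT's actual scheduler: partition one 8-GPU
# SXM module into integer sub-bundles of 2, 4, 6, or 8 GPUs per tenant.

VALID_BUNDLE_SIZES = (2, 4, 6, 8)

def partition_module(requests, module_size=8):
    """Assign contiguous GPU index ranges on one module to tenant requests.

    `requests` maps tenant name -> bundle size. Contiguous index ranges
    stand in for NVLink-connected groups, so every GPU in a sub-bundle
    remains a direct NVLink peer of the others.
    Returns {tenant: [gpu indices]}; raises if the module can't fit them.
    """
    for size in requests.values():
        if size not in VALID_BUNDLE_SIZES:
            raise ValueError(f"bundle size must be one of {VALID_BUNDLE_SIZES}")
    if sum(requests.values()) > module_size:
        raise ValueError("requests exceed module capacity")

    assignments, cursor = {}, 0
    # Place larger bundles first to reduce fragmentation of the module.
    for tenant, size in sorted(requests.items(), key=lambda kv: -kv[1]):
        assignments[tenant] = list(range(cursor, cursor + size))
        cursor += size
    return assignments
```

For example, `partition_module({"tenant-a": 4, "tenant-b": 2, "tenant-c": 2})` carves one module into a 4-GPU bundle and two 2-GPU bundles, each internally NVLink-connected.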

The Petasus AI Cloud GPU monitoring dashboard eases troubleshooting of GPUs and their interconnects.
Beyond single-node partitioning, Petasus AI Cloud also supports slicing a single, multi-node GPU cluster into tenant-isolated virtual GPU clusters. Each tenant receives an integer-assigned set of GPUs that may span multiple physical nodes; within a node those GPUs communicate over NVLink, while inter-node GPU communication uses RDMA over the cluster fabric (e.g., InfiniBand or Ethernet). The system applies end-to-end (E2E) virtualization optimizations — from topology-aware placement and interconnect mapping to scheduling and bandwidth steering — so that virtualized GPU clusters deliver near-native performance for both tightly-coupled and distributed workloads, while preserving strong security and isolation guarantees.
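A topology-aware placement policy of the kind described above can be sketched simply: prefer packing a tenant onto as few nodes as possible, so that as much GPU-to-GPU traffic as possible stays on intra-node NVLink rather than crossing the RDMA fabric. The greedy policy and function names here are assumptions for illustration, not SKT’s actual algorithm.

```python
# Illustrative sketch (assumed logic, not SKT's scheduler): place a tenant's
# GPU request across nodes, preferring the fewest nodes so most traffic
# stays on intra-node NVLink instead of inter-node RDMA.

def place_tenant(gpus_needed, free_gpus_by_node):
    """Greedily fill nodes with the most free GPUs first.

    `free_gpus_by_node`: {node name: count of free GPUs}.
    Returns {node: gpus taken}; raises if the cluster can't satisfy it.
    """
    placement = {}
    for node, free in sorted(free_gpus_by_node.items(), key=lambda kv: -kv[1]):
        if gpus_needed == 0:
            break
        take = min(free, gpus_needed)
        if take:
            placement[node] = take
            gpus_needed -= take
    if gpus_needed:
        raise RuntimeError("not enough free GPUs in the cluster")
    return placement

def link_type(node_a, node_b):
    # GPU pairs on the same node communicate over NVLink;
    # cross-node pairs use RDMA over the cluster fabric.
    return "NVLink" if node_a == node_b else "RDMA"
```

A 12-GPU tenant on a cluster of 8-GPU nodes would land on two nodes, with only the traffic between those two nodes falling back to RDMA.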

CSI-based NVMe over TCP vs. native NFS over TCP vs. native NFS over RDMA on Kubernetes.
And of course, secure multi-tenancy is paramount and had to be baked in from the start.
As Li explains, real multi-tenancy is harder than most realize, especially at the kernel level. Simple container-based approaches are tempting since they’re flexible and quick, but they carry unacceptable security risks: containers share the host kernel, so a single kernel compromise can expose every neighboring tenant.
Instead, SKT’s Petasus AI Cloud implements strict isolation at every layer. Tenants see dedicated GPU paths, private network segments, and isolated memory spaces. Every tenant workload is secured end-to-end, from the VM or container to the storage backend itself.
We have to go into the weeds a bit to spell out the most critical challenge (and the most compelling technical achievement): optimizing the I/O path for GPU workloads.
Early attempts at virtualized storage and networking were slow, managing only 3.6 GB/s of throughput over plain TCP connections and CSI plugins. That was roughly a quarter of bare-metal performance, nowhere near acceptable.
SK Telecom, with help from VAST engineers, deconstructed that bottleneck, eventually uncovering a breakthrough: direct RDMA passthrough.
By bypassing both host and guest kernel stacks (routing traffic directly via SR-IOV Virtual Functions), they achieved throughput approaching 14 GB/s, equal to bare metal, redefining what the team thought virtualization could achieve.

RDMA passthrough turned GPU virtualization from a compromise into a competitive edge.
VAST Data’s integration into SKT’s Petasus AI Cloud looks subtle at first glance, until you realize that, within the unified virtualization layer, VAST’s data platform provides the immediate, API-driven provisioning that makes GPU resources instantly available to tenants without manual intervention.
For multi-tenancy, VAST offloads security responsibilities from the virtualization layer, enforcing tenant-level isolation directly at the storage fabric. The net effect? Simplifying the stack while strengthening end-to-end security.
But it’s in network optimization where VAST’s role is most pronounced. SK Telecom’s RDMA breakthrough hinges on seamless, zero-copy connectivity between compute nodes and VAST storage.
By supporting RDMA passthrough via SR-IOV Virtual Function Link Aggregation (VF LAG), VAST storage became the center of SKT’s performance leap, delivering bare-metal-grade throughput and production-grade high availability inside fully virtualized environments.
Underneath those performance gains was an architectural rethinking worth visualizing clearly.
The original path from VM to storage traversed multiple layers of guest and host kernel overhead, each adding latency and complexity. By shifting to RDMA passthrough, SKT created a straight-line path from GPU workload to VAST AI data platform, eliminating unnecessary hops and buffering layers.
Benchmarking with fio confirmed what SKT suspected, Li says. Overhead was nearly eliminated, I/O latency plummeted, and throughput surged to levels indistinguishable from bare metal.

fio benchmark results on bare metal and in a virtual machine.
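SKT’s exact fio job files aren’t published, but a large-block, direct-I/O sequential-read job of the kind typically used for this comparison would look roughly like the sketch below; the job name, block size, and worker counts are assumptions.

```python
# Hypothetical sketch: build a fio command line for a direct-I/O
# sequential-read throughput test. Run the identical job on bare metal
# and inside a VM, then compare aggregate bandwidth in the JSON output.

def fio_seq_read_args(target_dir, jobs=8, block_size="1m", size_per_job="10g"):
    """Return argv for a parallel sequential-read fio run against a mount."""
    return [
        "fio",
        "--name=seq-read",
        f"--directory={target_dir}",   # e.g. the tenant's VAST mount point
        "--rw=read",                   # sequential read
        f"--bs={block_size}",          # large blocks, as in throughput tests
        f"--numjobs={jobs}",           # parallel workers
        f"--size={size_per_job}",
        "--direct=1",                  # bypass the page cache
        "--ioengine=libaio",
        "--group_reporting",           # aggregate stats across workers
        "--output-format=json",
    ]
```

Feeding the same argv to both environments keeps the comparison apples-to-apples; only the I/O path under test changes.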
Is it becoming clear yet that GPU clouds no longer have to compromise on performance, security, or agility?
VAST Data Integration: Storage as Platform
Choosing a storage provider for SK Telecom’s GPU cloud was never going to be a simple RFP.
SKT required far more than a fast, commoditized data store. They needed storage to operate as part of the platform itself, blending into the virtualization fabric rather than being stapled onto its edges.
As Li explains, at the top of their must-have list was multi-tenancy at scale, not simply permission sets, but full, fine-grained tenant isolation, secure authentication, and bandwidth control. Few options came close.
SK Telecom evaluated offerings from other vendors but found them lacking the integrated multi-tenancy or simplicity they required. VAST Data stood out, not just because it was fast (though it certainly was), but because its architecture inherently supported secure multi-tenant GPU workloads, embedded QoS controls, and a clear path to API-based provisioning.
SK Telecom implemented VAST’s storage via dedicated per-tenant networks, isolating I/O paths securely and cleanly at the network layer. Using RDMA virtualization, SKT ensured each tenant’s GPU cluster would have direct, high-performance access to its assigned storage resources, entirely bypassing traditional overhead.
And Li loves to show off the co-engineering through the automated VM provisioning flow. It’s a great demo to watch: VM golden images with pre-installed VAST client stacks auto-mount their storage shares, completely removing manual intervention at tenant instantiation.
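The shape of that flow can be sketched as follows. The endpoint path and field names here are assumptions for illustration, not VAST’s published API: the point is that at tenant instantiation an orchestrator issues one REST call to carve out a share, and the golden image mounts it on first boot.

```python
# Illustrative only: endpoint path and field names below are invented,
# not VAST's actual API. This models an orchestrator provisioning a
# per-tenant share that the VM golden image auto-mounts at boot.

import json

def build_share_request(tenant, size_gb, protocol="nfs"):
    """Build the JSON body for a hypothetical share-provisioning call."""
    return {
        "tenant": tenant,
        "capacity_gb": size_gb,
        "protocol": protocol,
        "auto_mount_path": f"/mnt/{tenant}",  # path baked into the golden image
    }

def provisioning_call(tenant, size_gb):
    """Return (method, path, body) for the orchestrator's HTTP client."""
    body = json.dumps(build_share_request(tenant, size_gb))
    return ("POST", f"/api/tenants/{tenant}/shares", body)
```

Because provisioning is one idempotent API call, tenant instantiation can be driven entirely by the virtualization layer with no storage-admin intervention.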
Storage virtualization stopped being just about consolidation and became instead a cornerstone of dynamic infrastructure delivery.
There’s also a story to be told around observability and the control offered by APIs.
SK Telecom’s engineers prize simplicity. They knew the platform wouldn’t scale if their operators had to juggle complex monitoring tools or log into multiple UIs.
VAST’s built-in Prometheus exporter simplified monitoring dramatically, Li says, funneling deep storage telemetry directly into SKT’s existing observability stack.
And because VAST, as the operating system for AI, exposes all its provisioning and performance management via straightforward REST APIs, SKT could easily integrate storage management into their own UI, further streamlining operations.
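The observability hook relies on the standard Prometheus text exposition format, which any scraper or custom UI can consume. A minimal parser for that format is sketched below; the metric names in the sample are invented for illustration, not the exporter’s actual names.

```python
# Sketch of the observability integration: metrics arrive in Prometheus
# text exposition format. Metric names below are invented examples.

def parse_prometheus_text(payload):
    """Parse simple Prometheus exposition text into {metric_name: value}.

    Handles the bare `name value` form and skips comments; production
    scrapers also handle labels, timestamps, and histogram families.
    """
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics

# A fabricated sample of what a storage exporter's scrape might contain.
SAMPLE = """\
# HELP storage_read_bytes_per_second Aggregate read throughput
storage_read_bytes_per_second 1.4e10
storage_write_latency_seconds 0.00021
"""
```

Because the format is an open standard, the same telemetry feeds Prometheus, SKT’s own UI, and any future alerting layer without bespoke adapters.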
Data infrastructure ceased to be a static commodity, becoming instead the active core of a fully orchestrated, sovereign GPU cloud platform.
From Storage to Platform: Redefining Roles
Rather than a passive, peripheral resource, storage became a deeply integrated platform component, active at every layer of the virtualization stack.
VAST’s shared-everything architecture gave SKT’s platform a single, scalable data fabric capable of handling multiple protocols and workloads simultaneously.
Object storage, shared VM boot disks, ultra-fast AI training data: all of these previously distinct use cases flowed through the same flexible storage fabric, which was a major selling point for the SKT teams, Li says.
Most crucially, storage became central to tenant isolation and security.
Instead of relying exclusively on hypervisor-level security, VAST enforced strict multi-tenancy directly at the fabric level, simplifying both architecture and operation. This wasn’t storage as SKT had ever known it; it was storage as the platform itself, smart, responsive, programmable, and importantly, fully sovereign.
Li agrees the old boundaries between compute, networking and storage are fading rapidly.
SKT’s Petasus AI Cloud, with VAST as its foundation, represents this future clearly. It’s one where data platforms enable rather than constrain, and where infrastructure is reshaped by the very workloads it hosts.
Inference, Monitoring, and Agentic Workloads Ahead
SK Telecom’s Petasus AI Cloud was born from a specific mission: to provide large-scale model training for South Korea’s national LLM initiative.
Yet SKT knows training is just the starting point. Ahead lies the more complex world of production inference, real-time workloads, and data-hungry AI agents operating at massive scale.
The architecture they’ve developed is poised to evolve smoothly into that future, and plans are already underway.
The next immediate step is dedicated inference infrastructure. SK Telecom expects inference workloads to have drastically different profiles: smaller batch sizes, latency-sensitive queries, and rapid context switching, each requiring its own performance optimizations.
VAST Data’s recent innovations around cache-offloading, data disaggregation, and fine-grained QoS are poised to play a key role as SKT expands into inference, enabling rapid responsiveness while retaining the same hard multi-tenant isolation that defined their training infrastructure.
Monitoring, too, is growing increasingly sophisticated and is an important piece of current and future visions, Li says. SK Telecom’s existing integration with VAST’s Prometheus exporter already offers deep observability into storage and network performance, but future expansions will embed these insights even deeper into SKT’s monitoring and operational stack.
The goal is unified visibility not just into infrastructure, but into the AI workloads themselves, giving operators instant understanding of workload performance, utilization, and bottlenecks, Li adds.
Further ahead, SK Telecom anticipates the arrival of agent-driven workloads, autonomous, data-hungry applications that dynamically adjust resources in real-time, pulling storage, GPU, and network bandwidth exactly as needed.
The Petasus AI Cloud SKT built with VAST is already inherently agent-ready: programmable and composable at every layer.
Lessons Learned Inside a Codesign Process
Li remarks on how SK Telecom’s collaboration with VAST Data was genuinely different. It was a co-design process from the start, structured around mutual discovery, rapid prototyping, and open problem-solving sessions.
Early performance bottlenecks forced both teams to pause, rethink, and redesign. When initial benchmarks revealed slow TCP-based throughput, SKT didn’t troubleshoot alone. VAST’s Korean and APAC engineering teams jumped in, contributing RDMA expertise, network-layer insights, and immediate responsiveness.
Weekly calls turned into working sessions, iterating together on architectures, testing scenarios, and integrating changes on-the-fly.
Li says both teams learned a core lesson early on: solving performance bottlenecks in highly virtualized, GPU-intensive environments requires joint thinking, not just problem escalation.
And the outcome speaks for itself. SK Telecom achieved its breakthrough RDMA optimization by leveraging VAST’s network expertise, while VAST learned valuable lessons in pushing storage architecture to support GPU-grade virtualization.
This co-design approach fostered lasting technical insight, laying foundations for future collaboration in inference optimization, monitoring advancements, and agent-driven architectures.
True innovation happens when teams work as one, thinking, designing, and learning together with a shared commitment to solving the same complex challenges.