Google has decided that YouTube demands such a huge transcoding workload that it needs to build its own server chips. The company detailed its new “Argos” chips in a YouTube blog post, a CNET interview, and a paper for ASPLOS, the Architectural Support for Programming Languages and Operating Systems Conference. Just as there are GPUs for graphics workloads and Google’s TPU (tensor processing unit) for AI workloads, the YouTube infrastructure team says it has created the “VCU” or “Video (trans)Coding Unit,” which helps YouTube transcode a single video into over a dozen versions that it needs to provide a smooth, bandwidth-efficient, profitable video site.
Google’s Jeff Calow said the Argos chip has brought “up to 20-33x improvements in compute efficiency compared to our previous optimized system, which was running software on traditional servers.” The VCU package is a full-length PCI-E card and looks a lot like a graphics card. A board has two Argos ASIC chips buried under a gigantic, passively cooled aluminum heat sink. There’s even what looks like an 8-pin power connector on the end because the PCI-E slot alone can’t deliver enough power.
Google provided a lovely chip diagram that lists 10 “encoder cores” on each chip, with Google’s white paper adding that “all other elements are off-the-shelf IP blocks.” Google says that “each encoder core can encode 2160p in realtime, up to 60 FPS (frames per second) using three reference frames.”
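To put those per-core figures in perspective, here is a rough back-of-the-envelope sketch of peak pixel throughput. The core count, chips per card, and 2160p60 ceiling come from the figures quoted above; the 3840×2160 frame size and the idea that every core runs flat out at the same time are assumptions, so treat the result as an upper bound rather than a measured number.

```python
# Figures quoted in the article; everything else is an illustrative assumption.
CORES_PER_CHIP = 10          # "encoder cores" per Argos chip
CHIPS_PER_CARD = 2           # Argos ASICs per VCU card
WIDTH, HEIGHT = 3840, 2160   # assumed 16:9 frame size for 2160p
FPS = 60                     # quoted per-core realtime ceiling


def peak_pixels_per_second(cores: int) -> int:
    """Upper bound on encoded pixels per second if every core runs at 2160p60."""
    return cores * WIDTH * HEIGHT * FPS


per_chip = peak_pixels_per_second(CORES_PER_CHIP)
per_card = peak_pixels_per_second(CORES_PER_CHIP * CHIPS_PER_CARD)

print(f"per chip: {per_chip:,} pixels/s")
print(f"per card: {per_card:,} pixels/s")
```

Real throughput would depend on the codec, the number of reference frames, and how the work is scheduled, none of which this sketch models.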
The cards are specifically designed to slot into Google’s warehouse-scale computing system. Each compute cluster in YouTube’s system will house a section of dedicated “VCU machines” loaded with the new cards, saving Google from having to crack open every server and load it with a new card. Google says the cards resemble GPUs because they are what fit in its existing accelerator trays. CNET reports that “thousands of the chips are running in Google data centers right now,” and thanks to the cards, individual video workloads like 4K video “can be available to watch in hours instead of the days it previously took.”
Even after factoring in the research and development on the chips, Google says this VCU plan will save the company a ton of money, as reflected in the below benchmark showing the TCO (total cost of ownership) of the setup compared to running its algorithm on Intel Skylake chips and Nvidia T4 Tensor Core GPUs.
YouTube’s unfathomably large transcoding problem
YouTube is the world’s biggest video site, and keeping it running was seen as an impossible task until Google bought the company in 2006. Since then, Google has aggressively fought to keep the site’s cost down, often reinventing Internet infrastructure and copyright in order to make it happen. Today, the primary infrastructure problem YouTube needs to solve for end users is providing video that works just right for your device and bandwidth while maintaining quality. That means using a codec that is supported by your device and picking a resolution that matches your display (and not blowing up your Internet connection with a massive file).
For Google, that means transcoding a single video into a lot of other videos. You can see part of this work yourself just by clicking on the gear for an 8K video, where you’ll see nine total resolutions created from a single upload: 144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 2160p, and 4320p. These are all different video files, and every one needs to be created from the original 8K uploaded file—and keep in mind, this is just for your specific device.
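The resolution ladder described above can be sketched in a few lines. The nine rungs are the ones listed in the gear menu; the rule that only rungs at or below the upload's height get produced is an assumption for illustration, and `output_rungs` is a hypothetical helper, not anything Google has described.

```python
# The nine resolutions visible in the player for an 8K upload.
LADDER = [144, 240, 360, 480, 720, 1080, 1440, 2160, 4320]


def output_rungs(source_height: int) -> list[int]:
    """Return the ladder rungs to produce for an upload of the given height.

    Assumption: a video is never upscaled, so rungs taller than the
    source are skipped.
    """
    return [height for height in LADDER if height <= source_height]


print(output_rungs(4320))  # an 8K upload yields all nine rungs
print(output_rungs(1080))  # a 1080p upload stops at 1080
```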
Google also needs to offer some of those nine resolutions in multiple codecs, which dictate how the video is compressed on its way over the Internet. The company wants to offer videos in the most advanced, efficient codec available to save on bandwidth, which is a massive part of YouTube’s costs. Decoding a video codec gobbles up processing power, though, and on cheaper mobile devices, decoding won’t happen smoothly and efficiently without dedicated hardware acceleration support for each new codec. That means Google only gets to use the best codecs on new devices, and it needs to keep copies of the video around in older codecs for older devices.
Today, modern devices usually get the efficient VP9 codec, while the more compatible H.264 is kept around for devices that aren’t on the cutting edge. No one truly knows the depths of YouTube’s video codec selection, but the site also generally supports devices going back almost 10 years, including “low-resolution flip phones,” according to the ASPLOS paper. So there are some pre-H.264 formats, like 3GP, for ancient devices.
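The "best codec the device can handle" idea boils down to a preference list. This is a minimal sketch, assuming a device reports which codecs it can decode in hardware; the function name, the codec labels, and the `None` fallback are all hypothetical, and YouTube's actual selection logic is not public.

```python
from typing import Optional

# Most bandwidth-efficient first, following the article's VP9-then-H.264 ordering.
PREFERENCE = ["vp9", "h264"]


def pick_codec(hardware_decoders: set) -> Optional[str]:
    """Pick the most efficient codec the device can decode in hardware.

    Returns None when nothing matches, which a caller could treat as
    "serve a legacy format instead" (an assumption, not a documented rule).
    """
    for codec in PREFERENCE:
        if codec in hardware_decoders:
            return codec
    return None


print(pick_codec({"h264", "vp9"}))  # newer device
print(pick_codec({"h264"}))         # older device
print(pick_codec(set()))            # ancient device
```

Adding a new codec to the front of the list is trivial in code; the expensive part, as the article notes, is having a transcoded copy of every video ready to serve.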
Google’s YouTube computing challenge becomes even more unfathomably large when you consider that codecs are continually being pushed forward—and again, with bandwidth being such a huge cost of running the site, it benefits Google to push for and upgrade to these new codecs as soon as possible. Upgrading to a new codec means transcoding every video (or at least a majority of them) to the hot new codec, and, oh yeah, this needs to happen every few years for each new codec.
How many videos do you think are on YouTube? Google probably only provides stats about growth (like “500 hours of video are uploaded to YouTube every minute”) because the total number of videos is so large that it’s effectively unknowable. And that’s not even counting YouTube Live (imagine all of this transcoding happening live, within a 100 ms delay) and the additional workloads from Drive and Google Photos. Google has the biggest transcoding job on earth.
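The quoted upload rate is easier to appreciate with some quick arithmetic. The 500-hours-per-minute figure is Google's; the number of output versions per video is a free parameter here, and the value of 12 used in the example is only an illustrative stand-in for "over a dozen."

```python
HOURS_UPLOADED_PER_MINUTE = 500  # Google's published growth stat

# Minutes of new video arriving per minute of wall-clock time.
realtime_multiple = HOURS_UPLOADED_PER_MINUTE * 60

# Hours of new source video arriving per day.
source_hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24


def daily_output_hours(versions_per_video: int) -> int:
    """Hours of transcoded output per day for a given number of versions.

    Assumption: every upload is transcoded into the same number of
    versions, each as long as the source.
    """
    return source_hours_per_day * versions_per_video


print(f"ingest runs at {realtime_multiple:,}x realtime")
print(f"{source_hours_per_day:,} source hours per day")
print(f"{daily_output_hours(12):,} output hours per day at 12 versions each")
```

This ignores re-encoding the back catalog for new codecs, live streams, and the Drive and Photos workloads, so the real job is larger still.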
Codecs are so important to YouTube’s success that Google actually takes a lead in developing them. In 2009, Google bought codec developer On2 Technologies (the company that provided the VP6 codec used in Flash video, which powered YouTube at the time), and the search giant has been a major codec developer since then. After pushing out and upgrading to VP8 and VP9, Google is moving on to its next codec, called “AV1,” which it hopes will someday see a wide rollout. AV1 was created through an industry coalition.
Regarding AV1, Calow told the YouTube blog, “One of the things about this is that it wasn’t a one-off program. It was always intended to have multiple generations of the chip with tuning of the systems in between. And one of the key things that we’re doing in the next-generation chip is adding in AV1, a new advanced coding standard that compresses more efficiently than VP9 and has an even higher computation load to encode.” AV1 is experimentally available on YouTube and several other video sites, but mass usage is currently held up by client support. According to CNET, these second-generation chips are already being phased into Google’s server farms.