People of ACM - Pei Cao
February 10, 2026
How did your career path lead to your current focus on web searching and information discovery?
I've always been fascinated by the interplay between CPU, memory, storage, and networks. For any given system under a specific workload, throughput is almost always bottlenecked by one of these four components. The interesting question becomes, “Can we use spare capacity in the non-bottlenecked components to relieve the pressure on the bottleneck?”
During my PhD, I worked on workloads where disk storage was the limiting factor, and found that good caching and prefetching algorithms could use RAM to offload disks and boost overall system performance.
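To make the "use RAM to offload disks" idea concrete, here is a minimal Python sketch that pairs an LRU block cache with simple one-block sequential read-ahead. It is a generic textbook illustration, not the specific caching and prefetching algorithms from the PhD work described above. The class name `PrefetchingBlockCache` and the `read_block` callable are hypothetical stand-ins for a real disk interface.

```python
from collections import OrderedDict


class PrefetchingBlockCache:
    """Toy RAM cache for disk blocks: LRU replacement plus one-block read-ahead."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity        # max number of blocks held in RAM (assumed >= 1)
        self.read_block = read_block    # hypothetical callable that reads one block from disk
        self.blocks = OrderedDict()     # block_id -> data, least recently used first
        self.disk_reads = 0             # every trip to disk, demand fetch or prefetch
        self.stalls = 0                 # demand misses: the caller had to wait on disk

    def _fetch(self, block_id):
        # Read from disk; new keys land at the most-recently-used end.
        self.disk_reads += 1
        self.blocks[block_id] = self.read_block(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # hit: served from RAM
        else:
            self.stalls += 1                    # miss: caller waits for the disk
            self._fetch(block_id)
        data = self.blocks[block_id]
        # Read-ahead: fetch the next block now (a real system would do this
        # asynchronously) so a sequential scan finds it already in RAM.
        if block_id + 1 not in self.blocks:
            self._fetch(block_id + 1)
        return data


# Sequential scan of 10 blocks: only the first access stalls.
scan = PrefetchingBlockCache(capacity=4, read_block=lambda b: f"block-{b}")
for b in range(10):
    scan.get(b)
print(scan.stalls, scan.disk_reads)  # 1 11

# Re-reading a small working set: repeats are served from RAM.
hot = PrefetchingBlockCache(capacity=4, read_block=lambda b: f"block-{b}")
for b in [0, 1, 0, 1]:
    hot.get(b)
print(hot.stalls, hot.disk_reads)  # 1 3
```

The two runs show the two different benefits. Prefetching does not reduce the number of disk reads in the scan (it even reads one extra block); it hides disk latency by issuing reads before they are needed. Caching, by contrast, avoids repeat reads entirely when the working set fits in RAM.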
I graduated right around the time the World Wide Web was exploding, but the wider Internet infrastructure had not yet been built out. People visiting websites regularly experienced delays and timeouts; the Internet as a whole was bottlenecked by network bandwidth. The solution? Use disk storage to cache web pages closer to users, essentially trading storage for network capacity. My research focused on building high-performance web caching servers.
Building web caching servers raised a specific problem: how do you summarize a dynamically changing set of URLs on one server so that another server can efficiently test set membership? Andrei Broder and I worked on this problem, and our solution, Counting Bloom Filters, became a widely used data structure in many computer systems.
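A Counting Bloom Filter keeps a small counter per slot instead of a single bit. Adding a URL increments its k slots and removing it decrements them, which matters because a cache's contents change constantly. Peers only need the plain bit vector (slot set if and only if its counter is nonzero) to test membership. The minimal Python sketch below assumes SHA-256 with double hashing for the k positions; the hash scheme, the parameter values, and names such as `summary_may_contain` are illustrative choices, not details from the original work. Real implementations use small fixed-width counters rather than unbounded Python integers.

```python
import hashlib


def counter_positions(item, m, k):
    # Derive k slot indices in [0, m) from one SHA-256 digest via double hashing.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
    return [(h1 + i * h2) % m for i in range(k)]


class CountingBloomFilter:
    """Bloom filter with a counter per slot instead of a bit, so items can be removed."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def add(self, url):
        for p in counter_positions(url, self.m, self.k):
            self.counters[p] += 1

    def remove(self, url):
        # Only valid for URLs that were previously added (e.g. evicted from this cache).
        for p in counter_positions(url, self.m, self.k):
            self.counters[p] -= 1

    def summary(self):
        # Plain bit vector sent to peer servers: a slot is set iff its counter is nonzero.
        return [c > 0 for c in self.counters]


def summary_may_contain(bits, url, k):
    # Peer-side test: False means definitely absent; True may be a false positive.
    return all(bits[p] for p in counter_positions(url, len(bits), k))


local = CountingBloomFilter(m=1024, k=4)
local.add("http://example.com/a")
local.add("http://example.com/b")
local.remove("http://example.com/a")  # page a was evicted from this cache
bits = local.summary()
print(summary_may_contain(bits, "http://example.com/b", k=4))  # True
print(sum(local.counters))  # 4: one increment per hash of the remaining URL
```

The counters stay on the server that owns the cache; only the compact bit vector needs to travel, which is what keeps the summary cheap to share.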
When I joined Google in 2004, I encountered a similar challenge on a massive scale: how do we index all web pages on the Internet and answer search queries at a very high rate of queries per second (QPS)? The standard approach was prohibitively expensive. We had to use techniques like caching, filtering, pre-computation, and compression in creative ways to maximally leverage available compute, random access memory (RAM), network, solid state drives (SSDs), and disks to lower costs. I spent years working with my team building such a system.
I enjoy solving performance optimization challenges, and luckily new problems keep surfacing, so I'm always having fun!
According to recent statistics, nearly 2.8 billion people used YouTube in Q2 of 2025, up from 234 million users in Q2 of 2010. From an infrastructure perspective, how has YouTube been able to meet this enormous growth? Looking ahead, how will YouTube keep pace with expected expansion in the coming years?
Engineers love solving challenging problems, and scaling YouTube is exactly the kind of fun challenge that pushes us to come up with new ideas and question long-held assumptions. A good example is Argos, an ASIC developed by colleagues of mine that transcodes videos at low cost.
My own work in this area focused on reducing network bandwidth consumption. When I was leading YouTube's video thumbnail team, we realized YouTube was among the largest image-serving websites in the world, so image compression was paramount for both YouTube and our users. In 2016 we collaborated with the Chrome team to adopt the WebP format for YouTube video thumbnails, cutting image size by over 15%. In 2017 we introduced algorithm-selected animated thumbnails to help people decide which videos to watch without having to watch the beginning of each one.
As YouTube continues growing, the YouTube engineering team will keep innovating to serve billions of users well.
Social media is changing search, and more people (especially Gen Z) are now using video more than text to learn about a topic. Generally speaking, how is YouTube addressing these new realities?
I don't lead the YouTube search team these days, so some of my thoughts here might be a little outdated. But generally speaking, information search over videos is harder than search over text for two reasons: videos don't contain as much text information as a web page, and finding the exact answer to a search query can be tedious in long videos.
To address the first issue, back in the pre-large language model (LLM) days, the YouTube search team employed various means to enrich the text information associated with videos, for example co-training text and video embeddings using co-watch graphs. Today, LLMs are highly capable at visual understanding, so I think this first issue will become less of a challenge.
For the second issue, the YouTube search team introduced several changes in how we present search results. In 2021, we started surfacing video chapters directly in search results, so users can jump straight to a video segment containing the answer. In 2022, we enabled “search within video”—if a query matches a particular moment in the video, a snippet appears in the search result pointing to that moment.
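One simple way to picture "search within video" is to match query terms against timestamped transcript segments and link the result to the best-matching moment. The Python sketch below uses plain keyword overlap on a hypothetical transcript; the function names and data are illustrative assumptions, and YouTube's actual matching and ranking methods are not described in this interview.

```python
import re


def tokenize(text):
    # Lowercased alphanumeric terms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def format_timestamp(seconds):
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def best_moment(query, segments):
    """Return (timestamp, text) of the segment sharing the most terms with the query."""
    query_terms = tokenize(query)
    best, best_score = None, 0
    for start, text in segments:
        score = len(query_terms & tokenize(text))
        if score > best_score:  # ties keep the earliest segment
            best, best_score = (format_timestamp(start), text), score
    return best  # None if no segment shares any term with the query


# Hypothetical transcript: (start time in seconds, spoken text).
segments = [
    (0.0, "Welcome back to the channel"),
    (95.0, "First, remove the old bike chain"),
    (240.0, "Now oil the bike chain and test the gears"),
]
print(best_moment("how to oil a bike chain", segments))
# ('4:00', 'Now oil the bike chain and test the gears')
```

The returned timestamp is what a search result snippet would link to, letting the user skip straight to the relevant part of a long video.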
I'd also like to expand your question to talk about another group YouTube search is uniquely suited to help: low literacy users. Nearly 800 million people worldwide cannot read or write, the majority of whom are women. For them, YouTube is the only venue for information on the Internet, because videos don't require literacy to consume. But in order to find information, they need to be able to search YouTube, using voice!
So, we invested heavily in improving the voice search experience. We developed specialized error correction techniques to reduce word error rate for voice recognition in search queries, utilized personalization to improve voice recognition for each user (since people talk differently), and added explicit audio confirmations of recognized queries to give users a chance to correct voice-recognition mistakes. We saw increased voice search usage in the developing world once we rolled out these improvements. This workstream has been particularly satisfying for the YouTube search team.
Part of your role as a VP of Engineering at YouTube includes leading the technical aspects of YouTube’s Trust and Safety division. What is one example of an important advance (software or hardware) that supports the secure and ethical use of the platform?
Instead of talking about advances, I'd like to discuss a particular challenge facing YouTube these days: detecting deepfake content and AI-generated media. Over the past three years, the ability of AI tools to manipulate media and generate brand new videos has advanced dramatically. Today, humans can no longer reliably tell what's real and what's generated by GenAI. As you can imagine, AI-manipulated and AI-generated media are ripe for abuse such as misinformation, harassment, and worse. It's critical to know whether an image or video is camera-captured, GenAI-generated, or GenAI-manipulated.
This classification task will be a challenge for years to come. While legitimate AI tool providers work hard to introduce technologies like SynthID and watermarking, the open availability of machine learning (ML) models means there will always be content generated by tools outside those legitimate providers. Furthermore, when we rely on ML models for this classification, adversaries can train against the detection model and develop generation models that evade it. We then need to evolve our detection model to catch the output of the adversarial generation model. This cat-and-mouse game can go on indefinitely.
YouTube's Trust and Safety teams are investing in two approaches. First, we're working with industry partners to support the C2PA industry standard. C2PA not only provides a way for legitimate GenAI providers to specify the provenance of generated content but also enables digital camera manufacturers to cryptographically attest the authenticity of an image or video. We believe this attested provenance approach complements ML approaches. Second, we're actively collaborating with academic researchers to identify and develop machine learning techniques to detect GenAI media. In August 2026, YouTube and IEEE are sponsoring the first International Symposium on Synthetic Media Attribution and Detection. We look forward to more collaborations across industry and academia.
Given your varied career, what advice would you offer a younger colleague just starting out in the field?
Computer science is a fast-changing field. Every decade brings new challenges that require communities of researchers to solve. As you advance in your career, you won't always work in the technical subareas in which you were trained. But a solid computer science foundation will enable you to learn any subfield of computer science quickly and then make contributions. So be prepared to learn continuously and don’t be afraid to venture into new areas.
Pei Cao is a Vice President of Engineering at YouTube, the largest video platform in the world. She served as the engineering leader for YouTube Search from 2016 to 2023, and has led the engineering team for Trust & Safety at YouTube since 2023. Prior to 2016 she worked at Google, where she designed a scalable system that enabled the platform to search hundreds of billions of web documents. Cao has been granted multiple patents for technological and software advancements, including co-owning a recent patent for generating moving thumbnails for videos.
An ACM member since 2013, Cao was recently named an ACM Fellow for contributions to web caching, search engine efficiency, and information quality.