Summary

Title: Reshaping Datacenter Performance and Efficiency at Meta: A HW-SW Co-design Journey

Speaker: Yifan Yuan, Meta's AI-Systems Co-design team

Date/Time: Friday, February 14th @11AM

Location: 471 Rhodes Hall (CSL Lounge) 

Host: Mohammad Alian

Abstract: In the rapidly evolving landscape of cloud computing, optimizing datacenter performance and efficiency is paramount. At Meta, we have embarked on a transformative journey to enhance our datacenters through innovative hardware and software technologies. This talk will delve into three pivotal advancements that are reshaping our infrastructure. Our story begins with DCPerf, an open-source benchmark suite meticulously crafted to represent the diverse workloads of Meta’s datacenters. As our datacenter fleet supports a wide array of applications, DCPerf provides a comprehensive set of benchmarks that simulate real-world production workloads, enabling us to make informed hardware design and procurement decisions and to collaborate with industry and academic partners to influence future architectures. Beyond benchmarking, we strive to make the most of our fleet resources. On the software side, we actively develop TPP (Transparent Page Placement) and its successor, TPPv2, a novel Linux kernel feature for managing (CXL-based) tiered memory systems. TPP optimizes memory allocation by dynamically placing pages in different tiers, with hotness, multi-tenancy, and observability in mind, and requires no application-level changes. On the hardware side, we explore the deployment and operation of datacenter-wide CPU frequency and power management. We navigate the heterogeneity of both hardware and software in the fleet, ensuring high service quality and low infrastructure reliability risk. Through these innovations, we are transforming our datacenter operations, achieving better performance, efficiency, and sustainability.

Bio: Dr. Yifan Yuan is a Senior Research Engineer on Meta’s AI-Systems Co-design team, where he drives innovation in system architecture for datacenters. His research expertise spans I/O, networking, memory, and accelerators, with a recent focus on CXL-related techniques and emerging core/accelerator architectures. He has published around 20 technical papers in top architecture, systems, and networking conferences and holds several US patents. Before joining Meta, Dr. Yuan was a research scientist at Intel Labs. He earned his PhD in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC) in 2022.

CSL Seminar: Yifan Yuan, Meta