Schedule Overview
8:30–9:15 am | Breakfast and registration
9:15–9:30 am | Welcoming Remarks: Prof. José Martínez
9:30–10:30 am | Keynote: “Designing Hybrid Skins” by Prof. Cindy Kao, DEA
10:30–10:45 am | Morning Break
10:45–11:00 am | “Ground Texture Localization for Mobile Robots” by Aaron Wilhelm
11:00–11:15 am | “Is FPGA a Fit for LLM Inference?” by Yixiao Du
11:15–11:30 am | “Recent Digital Compute-In-Memory Designs and SoC-/Chiplet-level Evaluation” by Prof. Jae-sun Seo
11:30–12:15 pm | CSL SWOT analysis
12:15–1:15 pm | Lunch + Trivia
1:15–2:00 pm | Team-building activity
2:00–3:00 pm | Keynote: “Your Host is a Distributed System” by Prof. Rachit Agarwal
3:00–3:15 pm | Afternoon Break
3:15–3:30 pm | “The Plants Have Spoken” by Cecilio Cesar Tamarit Camarero
3:30–3:45 pm | “Emergent Collective Locomotion in Physically Entangled Robots” by Danna Ma
3:45–4:00 pm | “ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models” by Yash Akhauri
4:00–4:20 pm | Poster lightning talks
4:20–6:00 pm | Wine/Cheese and Poster session
☆☆☆
Morning Keynote
☆☆☆
9.30-10.30am: Keynote by Prof. Cindy Kao, Cornell DEA
Title: Designing Hybrid Skins
Abstract: Hybrid Skins are an emerging form of conformable interface situated at all scales of the human experience. These conformable interfaces are hybrid in their integration of technological function with social and cultural perspectives, blending historical craft with miniaturized robotics, machines, and materials in their development. The resulting skins also serve social, cultural, and technological purposes while supporting the construction of individual identities. This talk examines recent work from the Hybrid Body Lab in designing Hybrid Skins through under-explored approaches of textile robotics, bio-fluid sensing, modular flexible electronics, and sustainable materials exploration. With their seamless and conformable form factor, Hybrid Skins afford unprecedented intimacy to the human experience and an opportunity for us to carefully rethink and redesign how our relationship with technology can, should (or should not) be. By blending engineering, design, and committed engagement with diverse communities, Kao and her lab’s research aims to foster inclusive design for future wearable technology that can celebrate (instead of constrict) the diversity of the human experience.
☆☆☆
Afternoon Keynote
☆☆☆
2.00-3.00pm: Keynote by Prof. Rachit Agarwal, Cornell CS
Title: Your Host is a Distributed System
Abstract: Processor, memory and peripheral interconnects have been studied for decades in the computer architecture community. The host network integrates these interconnects to enable data transfers within a host. In this talk, I will reflect on my group’s (ongoing) journey that started with a surprising phenomenon observed in a lab experiment—nanosecond-scale inefficiencies within the host network percolating through network protocols and OS to create millisecond-scale impact on distributed applications. I will discuss our work on understanding, characterizing, and resolving the above phenomenon in the lab and in production clusters. I will also discuss how this phenomenon opens up intriguing research questions at the intersection of computer networking, operating systems, and computer architecture.
☆☆☆
Invited Talks, Morning Session
☆☆☆
10.45-11.00am: Invited Talk by Yixiao Du
Title: Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Abstract: Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. While hardware accelerators for Transformer-based models have been extensively studied, the majority of existing approaches rely on temporal architectures that reuse hardware units for different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. We investigate the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs. Through our analysis, we can identify the most effective parallelization and buffering schemes for the accelerator and, crucially, determine the scenarios in which FPGA-based spatial acceleration can outperform its GPU-based counterpart. We have implemented BERT and GPT2 on an AMD Alveo U280 FPGA. Experimental results demonstrate our approach can achieve up to 13.4× speedup when compared to previous FPGA-based accelerators for the BERT model. For GPT generative inference, we attain a 2.2× speedup compared to DFX, an FPGA overlay, in the prefill stage, while achieving a 1.9× speedup and a 5.7× improvement in energy efficiency compared to the NVIDIA A100 GPU in the decode stage.
11.00-11.15am: Invited Talk by Aaron Wilhelm
Title: Ground Texture Localization for Mobile Robots
Abstract: The ability to localize is fundamental to mobile robots with applications including healthcare, security, and delivery robots. While many localization methods such as GPS, lidar, and computer vision with outward-facing cameras exist, these approaches can be cost-prohibitive or fail in certain scenarios. In this talk I will present our research on ground texture localization, a promising method that provides inexpensive yet high-accuracy localization. Our recently published work demonstrated state-of-the-art accuracy while also offering a lightweight version that localizes at 4 Hz on a Raspberry Pi 4. Additionally, I will discuss our current research on improving the bag-of-words technique for use in a simultaneous localization and mapping (SLAM) system for ground texture.
11.15-11.30am: Invited Talk by Prof. Jae-sun Seo
Title: Recent Digital Compute-In-Memory Designs and SoC-/Chiplet-level Evaluation
Abstract: In this talk, we present several of our 28nm digital CIM macro chip designs, featuring CIM designs with sparse computing using compressed weights, with configurable floating-point precision, and with a decomposed precision dataflow. While high energy-efficiencies have been reported at the CIM macro levels, it is important to evaluate the energy-efficiency of CIM based accelerators at the system-on-chip (SoC) and chiplet level for practical system considerations. We present collective benchmarking framework development efforts for 2D SoC, 2.5D chiplet, and 3D IC designs, showcasing the benefits and considerations of CIM designs across such different heterogeneous integration systems.
☆☆☆
Invited Talks, Afternoon Session
☆☆☆
3.15-3.30pm: Invited Talk by Cecilio Cesar Tamarit Camarero, PhD
Title: The Plants Have Spoken
Abstract: The Center for Research on Programmable Plant Systems (CROPPS) is now into its fourth year. Cecilio brings us some of the latest updates, including his own efforts in developing faster, more accurate tools for plant bioengineering and how this fits into the larger vision of programmable plants. Lastly, he will cover some of the budding ideas that have been taking root in the center with the hope that it may motivate the diverse CSL audience to branch out into this blooming area.
3.30-3.45pm: Invited Talk by Danna Ma
Title: Emergent Collective Locomotion in Physically Entangled Robots
Abstract: In living systems, complex behaviors such as migration, transportation, and manipulation can emerge from simple agent-to-agent interactions without a central controller. For instance, birds in a flock home in on those closest to them and act as a group without any leadership. At smaller scales, such as social insects or bacteria, these global behaviors often emerge not from explicit communication, but rather through physical perturbations or applied forces between agents. One particularly exciting example involves fire ants and army ants that can create and maintain elaborate living structures by linking their own bodies together, providing a careful balance of viscous and elastic response to external forces. Smart robotic materials which are engineered to act in swarms present the opportunity to study and possibly replicate the capabilities of such natural organisms. Using a bottom-up approach, my work focuses on the design of swarms of physically entangled, simple robots with the goal of mapping local interactions to swarm-wide emergent behaviors. Specifically, I am interested in how these swarms can achieve robust locomotion over complex terrains.
3.45-4.00pm: Invited Talk by Yash Akhauri
Title: ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
Abstract: In this talk, we look at addressing the high power consumption and latency sensitivity in large language models (LLMs) through techniques like quantization and sparsity. Specifically, we focus on contextual sparsity, which adapts pruning based on input to maintain accuracy. We introduce ShadowLLM, a novel predictor that goes beyond traditional magnitude-based criteria to evaluate the importance of attention heads and neurons. This approach results in over a 15% improvement in end-to-end accuracy without increasing latency and achieves up to a 20% speed-up compared to the DejaVu framework.
☆☆☆
Poster Session Entries
☆☆☆
Poster session entry by Sadie Cutler
Title: Leveraging Tethers for Distributed Formation Control of Simple Robots
Abstract: Tethers have great potential in multi-robot systems from enabling retrieval of deployed robots and facilitating power transfer, to use by the robots as a net or partition. In this paper, we show in simulation that tethers can also be used to do distributed formation control on very simple robots. Specifically, our simulated agents are connected in series by un-actuated, flexible, fixed-length tethers and use tether angle and strain, in conjunction with the physical constraints of the tethers, to adjust their position with respect to their neighbors. This presents a significant simplification over traditional formation control which, at a minimum, requires exteroceptive sensors to perceive bearing and/or distance to nearby agents. We present and evaluate an algorithm on a large set of transitions between formations with 5 agents and an example transition with 35 agents. The convergence time grows with the number of agents; however, the memory and computation time per agent remain constant. Future work will investigate the ability to use tethers and strain for reactive behaviors and more diverse tasks.
Poster session entry by Grace Dinh
Title: Benchmarking and Cost Estimation for Sparse Tensor Workloads
Abstract: The performance of sparse tensor programs can significantly vary based on their inputs. As a result, both estimating and bounding their costs (in terms of flop count, communication, or output size) is a nontrivial task required to answer several practical questions, e.g.:
– How much buffer space must be allocated for the output of a computation?
– What is “peak” performance for a given workload? Is a given implementation performing close to peak, or is there room for additional optimization?
– What is the cost of a given algorithm for a combination of input, mapping, and hardware (for optimization or co-design)?
This project aims to provide a suite of benchmarking tools and techniques – both theoretical and data-driven – to estimate the cost of sparse tensor operations.
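As a rough illustration of the output-size question raised in this abstract (not the authors' tooling), the sketch below bounds the cost of a sparse matrix product C = A × B from the nonzero coordinates alone. The function name `spgemm_bounds` and the coordinate-list input format are assumptions made for this example. It relies on a standard observation: column k of A and row k of B contribute nnz(A[:, k]) · nnz(B[k, :]) scalar multiplies, and every output nonzero needs at least one multiply.

```python
from collections import Counter
from typing import Iterable, Tuple

Coord = Tuple[int, int]  # (row, col) position of one nonzero entry


def spgemm_bounds(a_nnz: Iterable[Coord], b_nnz: Iterable[Coord],
                  n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Return (multiply count, upper bound on nonzeros of C) for C = A @ B.

    a_nnz / b_nnz list the nonzero coordinates of A and B (hypothetical format).
    n_rows and n_cols are the dimensions of the output C.
    """
    a_per_col = Counter(col for _, col in a_nnz)  # nonzeros in each column k of A
    b_per_row = Counter(row for row, _ in b_nnz)  # nonzeros in each row k of B

    # Every pairing of a nonzero in A[:, k] with a nonzero in B[k, :] is one multiply.
    multiplies = sum(count * b_per_row[k] for k, count in a_per_col.items())

    # C cannot have more nonzeros than multiplies, nor more than its dense size.
    output_bound = min(multiplies, n_rows * n_cols)
    return multiplies, output_bound


# Example: A is 2x2, B is 2x3, both given as coordinate lists.
A = [(0, 0), (0, 1), (1, 1)]
B = [(0, 0), (1, 0), (1, 2)]
print(spgemm_bounds(A, B, n_rows=2, n_cols=3))
```

The bound is cheap to compute but can be loose: several multiplies may land on the same output position, so the true output size is often smaller.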
Poster session entry by Hang Gao
Title: High-speed interfacial flight of an insect-scale robot
Abstract: Several insect species are capable of manipulating the air-water interface for locomotion, detection, and communication. Crawling insects like water striders or fishing spiders exploit surface tension to stay on the water surface and use water ripples to sense objects and communicate with others. The waterlily beetle flaps its wings to generate thrust along the air-water interface. Inspired by these semi-aquatic insects, we present the γ-bot, an insect-scale robot designed to operate at the air-water interface. The γ-bot features a flapping-wing mechanism that generates thrust parallel to the water surface, supported by passive legs that utilize surface tension to bear the robot’s weight. Additionally, the flapping wings can generate water ripples, and a mm-scale off-the-shelf IMU can detect the frequency of these wave patterns for sensing obstacles or communicating with another robot. We developed and validated a simple model to characterize the drag forces acting on the vehicle and estimate its velocity in open-loop operation. The 112 mg version of the robot can achieve maximum velocities of 0.9 m s⁻¹ (equivalent to 15 BL s⁻¹) and can perform both left and right turns, demonstrating high maneuverability. Moreover, the γ-bot can carry an additional payload of up to ~900 mg for onboard power, communication, and environmental detection tasks.
Poster session entry by Leo Han
Title: Understanding the Carbon Footprint of Computing
Abstract: The carbon footprint of the information and communication technology (ICT) industry is 2.1% to 3.9% of global total CO2 emissions per year. To reduce the impact of computing, we must understand both how to design more sustainable computer systems and how to use them more sustainably. To help hardware designers make more informed sustainable design decisions, we extend the existing ACT IC carbon modeling framework to account for design and environmental uncertainties. Through another project, we provide cloud users with fair carbon attributions as a first step towards incentivizing sustainable cloud usage.
Poster session entry by Anshuman Mohan
Title: Accelerated Programmable Packet Scheduling
Abstract: Managing modern server and internet traffic requires the scheduling of packets. The recipe for a good scheduler is relatively well-known: maintain line rate, prevent head-of-line blocking, minimize latency, etc. A relatively underappreciated aspect of a good scheduler is programmability, which would allow it to react to changing network conditions, or to be easily extended to support new protocols. I will present ongoing work on a programmable packet scheduler that can express a range of programming policies against a fixed hardware implementation.
Poster session entry by Derin Ozturk
Title: Entomoton Bench: Performance Evaluation of Insect-Scale Robotics Workloads
Abstract: Insect-scale robots hold the potential to revolutionize applications like reconnaissance, search-and-rescue, and environmental monitoring. However, their deployment outside of lab scenarios has been limited due to extreme size, weight, and power constraints. Over the past decade, these robots have lacked onboard processing for control and sensing tasks, with full system integration only recently being considered. Additionally, algorithmic research on these robots often fails to accurately assess energy and latency performance on real hardware, complicating fully integrated system design. With the end of Dennard scaling and the approaching end of Moore’s law, hardware specialization has become crucial for improved performance and energy efficiency in computer systems, especially for domain specific workloads. Insect-scale robots can benefit from such specialization, but the tradeoff between programmability and performance remains critical as algorithms continue to evolve. A well composed benchmark suite and evaluation tools are needed both for computer architects to explore hardware-software co-design and for roboticists to develop performant, fully integrated systems using commercial hardware. We present preliminary work on building a benchmark suite for insect-scale robotics, specifically with varying complexity workloads in monocular pose estimation and sensor fusion for state estimation.
Poster session entry by Neel Patel
Title: Unlocking the Potential of Accelerated Chip Multi-Processors
Abstract: Recent Chip Multi-Processors incorporate several on-chip accelerators, marking the beginning of the Accelerated Chip Multi-Processors (XMP) era in datacenters. However, despite the close proximity of accelerators and general-purpose cores, offloading functions to accelerators introduces several overheads that can nullify the benefits of hardware acceleration. We classify these overheads into offload tax and offload interference. In this paper, our goal is to maximize the throughput of modern XMPs without compromising programmer productivity. To achieve this goal, we develop runtime and hardware support for a high-performance preemptive execution model for applications that utilize accelerators and general purpose cores on XMPs. Our results show up to 80% higher throughput for an Intel Xeon XMP with 20 cores and 4 accelerators.
Poster session entry by Aaron Wilhelm
Title: Lightweight Ground Texture Localization
Abstract: We present a lightweight ground texture based localization algorithm (L-GROUT) that improves the state of the art in performance and can be run in real-time on single board computers without GPU acceleration. Such computers are ubiquitous on small indoor robots and thus this work enables high-precision, millimeter-level localization without instrumenting, marking, or modifying the environment. The key innovations are an improved database feature extraction algorithm, a dimensionality reduction method based on locality preserving projections (LPP) that can accommodate faster-to-compute binary features, and an improved spatial filtering step that better preserves performance when the databases are tuned for lightweight applications. We demonstrate the approach by running the whole system on a low-cost single board computer (Raspberry Pi 4) to produce global localization estimates at greater than 4Hz on an outdoor asphalt dataset.
Poster session entry by Andrew Wilhelm
Title: Constraint Programming for Component-Level Robot Design
Abstract: Effective design automation for building robots would make development faster and easier while also less prone to design errors. However, complex multi-domain constraints make creating such tools difficult. One persistent challenge in achieving this goal of design automation is the fundamental problem of component selection, an optimization problem where, given a general robot model, components must be selected from a possibly large set of catalogs to minimize design objectives while meeting target specifications. Our approach formulates the component selection problem as a combinatorial optimization problem, which does not require any system approximations, and uses constraint programming (CP) to solve this problem with a depth-first branch-and-bound algorithm. As the efficacy of CP critically depends upon the orderings of variables and their domain values, we present two heuristics specific to the problem of component selection that significantly improve solve time compared to traditional constraint satisfaction programming heuristics. We also add redundant constraints to the optimization problem to further improve run time by evaluating certain global constraints before all relevant variables are assigned. We demonstrate that our CP approach can find optimal solutions from over 20 trillion candidate solutions in only seconds, up to 48 times faster than an MCDP approach solving the same problem.
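To make the idea of depth-first branch-and-bound over component catalogs concrete, here is a minimal sketch. It is not the authors' solver or their heuristics. The catalogs, the `Component` fields, the lightest-first value ordering, and the two constraints (a minimum battery energy, and motor torque proportional to total mass) are all hypothetical, chosen only to show how a partial assignment can be pruned once its mass already matches or exceeds the best complete design found.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    mass_kg: float
    rating: float  # hypothetical: torque in N*m for motors, energy in Wh for batteries


# Hypothetical catalogs, one list of candidates per slot.
CATALOGS: Dict[str, List[Component]] = {
    "motor": [Component("m_small", 0.10, 0.5), Component("m_mid", 0.25, 1.2),
              Component("m_big", 0.60, 3.0)],
    "battery": [Component("b_small", 0.20, 10.0), Component("b_big", 0.45, 25.0)],
}


def select_components(catalogs: Dict[str, List[Component]], order: List[str],
                      torque_per_kg: float, min_energy_wh: float
                      ) -> Tuple[Optional[Dict[str, Component]], float]:
    """Minimize total mass; `order` must include the "motor" and "battery" slots."""
    best_mass = float("inf")
    best: Optional[Dict[str, Component]] = None

    def dfs(depth: int, chosen: Dict[str, Component], mass: float) -> None:
        nonlocal best_mass, best
        # Bound: masses are non-negative, so a partial design can only get heavier.
        if mass >= best_mass:
            return
        if depth == len(order):
            # Global constraint: the motor must supply torque proportional to total mass.
            if chosen["motor"].rating >= torque_per_kg * mass:
                best_mass, best = mass, dict(chosen)
            return
        slot = order[depth]
        # Value ordering (an assumption of this sketch): try lighter components first.
        for comp in sorted(catalogs[slot], key=lambda c: c.mass_kg):
            # Unary constraint checked at assignment time rather than at the leaf.
            if slot == "battery" and comp.rating < min_energy_wh:
                continue
            chosen[slot] = comp
            dfs(depth + 1, chosen, mass + comp.mass_kg)
            del chosen[slot]

    dfs(0, {}, 0.0)
    return best, best_mass


design, total_mass = select_components(CATALOGS, ["battery", "motor"], 1.5, 20.0)
print({slot: comp.name for slot, comp in design.items()}, total_mass)
```

Trying light components first tends to find a good incumbent early, which tightens the bound for the rest of the search. The variable order passed in `order` plays a similar role.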
Poster session entry by Julie Villamil
Title: Monolithic Fabrication of Crawling Microrobots with Electromagnetic Actuators
Abstract: The development of insect-scale, autonomous legged robotic platforms has been a major field of interest for inspecting human-unreachable environments, such as confined spaces or hazardous areas. We propose the design and development of a cm-scale, autonomous legged robotic platform for environmental monitoring and inspection of confined and unstructured areas. These cm-scale robots present a number of engineering challenges in the design, manufacturing, and autonomous locomotion of the vehicle. While the majority of cm-scale legged robots use high-voltage piezoelectric actuation for their high power density, they require low-mass and low-efficiency step-up converters to achieve untethered locomotion. Low-voltage electromagnetic actuators, while less efficient than their piezoelectric counterparts, eliminate the need for high-voltage step-up converters, simplifying the electrical system, and their high displacements simplify the transmission mechanisms needed to achieve large stride lengths. We designed planar voice coil actuators, also known as Lorentz force actuators, with the necessary displacement and force output for a cm-scale legged robot. These actuators were then coupled with laminate-based parallel mechanisms that can be monolithically fabricated, enabling future mass production of these devices. Finally, as tetherless operation is essential in unstructured environments, we characterized commercial off-the-shelf circuit components in terms of accuracy, bandwidth, mass, power, and compute limitations. Thus, significant results from this project will set new design standards for tetherless systems and high-performance inspection tools.
☆☆☆
Acknowledgement and Sponsorship
☆☆☆
Funding for the CSL retreat was partially provided by Intel Corporation. Thank you to Marie Roller for helping to organize the retreat.