Selected Papers

How to cite your work

For now, the collection will sit online, although our intent is to publish it through ACM or IEEE. If you would like to cite your retrospective, you can use this:

@incollection{AuthorOneISCA50Retrospective,
  author    = "Author One and Author Two and Author Three",
  editor    = "Jos{\'e} F.\ Mart{\'\i}nez and Lizy K.\ John",
  title     = "{RETROSPECTIVE}: {T}itle Goes Here",
  booktitle = "{ISCA@50 25-Year Retrospective: 1996-2020}",
  month     = jun,
  year      = 2023,
  publisher = "{ACM} {SIGARCH} and {IEEE} {TCCA}",
  url       = "https://bit.ly/isca50_retrospective",
}

If you’d like to announce your paper’s selection, you can say something along the lines of: “Selected for inclusion in ISCA@50 25-Year Retrospective: 1996-2020.”

Please remember that only authors of the original paper who contributed to (or at least reviewed) the retrospective may be listed as authors of the retrospective itself. Non-contributing authors are still honored by including the original paper in the collection. If you have any questions, please feel free to reach out to us.

José F. Martínez and Lizy K. John, Editors — June 2023

2020

Accel-Sim: an extensible simulation framework for validated GPU modeling
by Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, and Timothy G. Rogers
[Retrospective]
Printed microprocessors
by Nathaniel Bleier, Muhammad Husnain Mubarik, Farhan Rasheed, Jasmin Aghassi-Hagmann, Mehdi B. Tahoori, and Rakesh Kumar
[Retrospective]
Genesis: a hardware acceleration framework for genomic data analysis
by Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Young H. Oh, Krste Asanovic, Jae W. Lee, and Lisa Wu Wills
[Retrospective]
Hardware-software co-design for brain-computer interfaces
by Ioannis Karageorgos, Karthik Sriram, Ján Veselý, Michael Wu, Marc Powell, David Borton, Rajit Manohar, and Abhishek Bhattacharjee
[Retrospective]
MLPerf Inference Benchmark
by Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou
[Retrospective]

2019

Sparse ReRAM engine: joint exploration of activation and weight sparsity in compressed neural networks
by Tzu-Hsien Yang, Hsiang-Yun Cheng, Chia-Lin Yang, I-Ching Tseng, Han-Wen Hu, Hung-Sheng Chang, and Hsiang-Pang Li
[Retrospective]
New attacks and defense for encrypted-address cache
by Moinuddin K. Qureshi
[Retrospective]

2018

A configurable cloud-scale DNN processor for real-time AI
by Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel, Adam Sapek, Gabriel Weisz, Lisa Woods, Sitaram Lanka, Steven K. Reinhardt, Adrian M. Caulfield, Eric S. Chung, and Doug Burger
[Retrospective]
FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud
by Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanović
[Retrospective]
Neural cache: bit-serial in-cache acceleration of deep neural networks
by Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, and Reetuparna Das
[Retrospective]
GraFBoost: using accelerated flash storage for external graph analytics
by Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind
[Retrospective]
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network
by Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh
[Retrospective]

2017

HeteroOS: OS Design for Heterogeneous Memory Management in Datacenter
by Sudarsun Kannan, Ada Gavrilovska, Vishal Gupta, and Karsten Schwan
[Retrospective]
Clank: Architectural Support for Intermittent Computation
by Matthew Hicks
[Retrospective]
Plasticine: A Reconfigurable Architecture For Parallel Patterns
by Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun
[Retrospective]
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
by Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke
[Retrospective]
In-Datacenter Performance Analysis of a Tensor Processing Unit
by Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon
[Retrospective]

2016

Cnvlutin: ineffectual-neuron-free deep neural network computing
Jorge Albericio, Patrick Judd, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, and Andreas Moshovos
[Retrospective]
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie
[Retrospective]
EIE: efficient inference engine on compressed deep neural network
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally
[Retrospective]
Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory
Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay
[Retrospective]
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
Yu-Hsin Chen, Joel Emer, and Vivienne Sze
[Retrospective]
ASIC clouds: specializing the datacenter
Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor
[Retrospective]

2015

Profiling a warehouse-scale computer
Svilen Kanev, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks
[Retrospective]
ShiDianNao: shifting vision processing closer to the sensor
Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam
[Retrospective]
A scalable processing-in-memory accelerator for parallel graph processing
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi
[Retrospective]

2014

Aladdin: a Pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks
[Retrospective]
A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian Caulfield, Eric Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, Eric Peterson, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger, Jim Larus, Gopi Prashanth Gopal, and Simon Pope
[Retrospective]
Memory persistency
Steven Pelley, Peter M. Chen, and Thomas F. Wenisch
[Retrospective]
Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors
Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu
[Retrospective]
General-purpose code acceleration with limited-precision analog computation
Renée St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, and Doug Burger
[Retrospective]

2013

DNA-based molecular architecture with spatially localized components
Richard A. Muscat, Karin Strauss, Luis Ceze, and Georg Seelig
[Retrospective]
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Hailong Yang, Alex Breslow, Jason Mars, and Lingjia Tang
[Retrospective]
An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms
Jamie Liu, Ben Jaiyen, Yoongu Kim, Chris Wilkerson, and Onur Mutlu
[Retrospective]

2012

Scale-out processors
Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi
[Retrospective]
RAIDR: Retention-Aware Intelligent DRAM Refresh
Jamie Liu, Ben Jaiyen, Richard Veras, and Onur Mutlu
[Retrospective]
Scheduling heterogeneous multi-cores through performance impact estimation (PIE)
Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel Emer
[Retrospective]

2011

Dark silicon and the end of multicore scaling
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger
[Retrospective]
Power management of online data-intensive services
David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch
[Retrospective]

2010

High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP)
Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, and Joel Emer
[Retrospective]
A Dynamically Configurable Coprocessor for Convolutional Neural Networks
Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, and Srihari Cadambi
[Retrospective]
Energy Proportional Datacenter Networks
Dennis Abts, Mike Marty, Philip Wells, Peter Klausler, and Hong Liu
[Retrospective]
Use ECP, not ECC, for hard failures in resistive memories
Stuart Schechter, Gabriel H. Loh, Karin Strauss, and Doug Burger
[Retrospective]
Understanding Sources of Inefficiency in General-purpose Chips
Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz
[Retrospective]
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey
[Retrospective]

2009

An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness
Sunpyo Hong and Hyesoon Kim
[Retrospective]
Scalable High Performance Main Memory System Using Phase-Change Memory Technology
Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, and Jude A. Rivers
[Retrospective]
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors
Abhishek Bhattacharjee and Margaret Martonosi

2008

Corona: System Implications of Emerging Nanophotonic Technology
Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, and Jung Ho Ahn
[Retrospective]
Technology-Driven, Highly-Scalable Dragonfly Topology
John Kim, William J. Dally, Steve Scott, and Dennis Abts
[Retrospective]
3D-Stacked Memory Architectures for Multi-Core Processors
Gabriel H. Loh
[Retrospective]
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana
[Retrospective]

2007

Power Provisioning for a Warehouse-Sized Computer
Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso
[Retrospective]
Adaptive Insertion Policies for High Performance Caching
Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer
[Retrospective]
New Cache Designs for Thwarting Software Cache-Based Side Channel Attacks
Zhenghong Wang and Ruby B. Lee
[Retrospective]
Power model validation through thermal measurements
Francisco J. Mesa-Martínez, Joseph Nayfach-Battilana, and Jose Renau
[Retrospective]
Core Fusion: Accommodating Software Diversity in Chip Multiprocessors
Engin Ipek, Meyrem Kirman, Nevin Kirman, and José F. Martínez
[Retrospective]
Anton, a special-purpose machine for molecular dynamics simulation
David E. Shaw, Martin M. Deneroff, Ron O. Dror, Jeffrey Kuskin, Richard H. Larson, John K. Salmon, Cliff Young, Brannon Batson, Kevin J. Bowers, Jack C. Chao, Michael P. Eastwood, Joseph Gagliardo, J. P. Grossman, C. Richard Ho, Doug Ierardi, István Kolossváry, John L. Klepeis, Timothy Layman, Christine McLeavey, Mark A. Moraes, Rolf Mueller, Edward C. Priest, Yibing Shan, Jochen Spengler, Michael Theobald, Brian Towles, and Stanley C. Wang

2006

Techniques for Multicore Thermal Management: Classification and New Exploration
James Donald and Margaret Martonosi
[Retrospective]
Ensemble-Level Power Management for Dense Blade Servers
Parthasarathy Ranganathan, Phil Leech, David Irwin, and Jeffrey Chase
[Retrospective]
Bulk Disambiguation of Speculative Threads in Multiprocessors
Luis Ceze, James Tuck, Josep Torrellas, and Calin Cascaval
[Retrospective]

2005

Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Rakesh Kumar, Victor V. Zyuban, and Dean M. Tullsen
[Retrospective]
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Michael Zhang and Krste Asanovic
[Retrospective]
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Satish Narayanasamy, Gilles Pokam, and Brad Calder
[Retrospective]

2004

Transactional Memory Coherence and Consistency
Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun
[Retrospective]
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal
[Retrospective]
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas
[Retrospective]
Low-Latency Virtual-Channel Routers for On-Chip Networks
Robert Mullins, Andrew West, and Simon Moore
[Retrospective]
The Case for Lifetime Reliability-Aware Microprocessors
Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, and Jude A. Rivers
[Retrospective]
A First-Order Superscalar Processor Model
Tejas S. Karkhanis and James E. Smith
[Retrospective]

2003

Temperature-Aware Microarchitecture
Kevin Skadron, Mircea R. Stan, Karthik Sankaranarayanan, Wei Huang, Sivakumar Velusamy, and David Tarjan
[Retrospective]
Transient-fault recovery for chip multiprocessors
Mohamed Gomaa, Chad Scarbrough, T. N. Vijaykumar, and Irith Pomeranz
[Retrospective]
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore
[Retrospective]
DRPM: dynamic speed control for power management in server class disks
Sudhanva Gurumurthi, Anand Sivasubramaniam, Mahmut Kandemir, and Hubertus Franke
[Retrospective]
A “flight data recorder” for enabling full-system multiprocessor deterministic replay
Min Xu, Rastislav Bodik, and Mark D. Hill
[Retrospective]

2002

Drowsy Caches: Simple Techniques for Reducing Leakage Power
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor Mudge
[Retrospective]
Detailed design and evaluation of redundant multithreading alternatives
Shubhendu Mukherjee, Michael Kontz, and Steven K. Reinhardt
[Retrospective]
Design tradeoffs for the Alpha EV8 conditional branch predictor
André Seznec, Stephen Felix, Venkata Krishnan, and Yiannakis Sazeides
[Retrospective]

2001

Focusing processor policies via critical-path prediction
Brian Fields, Shai Rubin, and Rastislav Bodík
[Retrospective]
Cache decay: exploiting generational behavior to reduce cache leakage power
Stefanos Kaxiras, Zhigang Hu, and Margaret Martonosi
[Retrospective]
NanoFabrics: spatial computing using molecular electronics
Seth Copen Goldstein and Mihai Budiu
[Retrospective]
Dead-block prediction & dead-block correlating prefetchers
An-Chow Lai, Cem Fide, and Babak Falsafi
[Retrospective]

2000

Wattch: A Framework for Architectural-Level Power Analysis and Optimizations
David Brooks, Vivek Tiwari, and Margaret Martonosi
[Retrospective]
Memory access scheduling
Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens
[Retrospective]
Smart Memories: a modular reconfigurable architecture
Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, and Mark Horowitz
[Retrospective]

1999

PipeRench: A Coprocessor for Streaming Multimedia Acceleration
Seth Copen Goldstein, Herman Schmit, Matthew Moe, Mihai Budiu, Srihari Cadambi, R. Reed Taylor, and Ronald Laufer
[Retrospective]
Selective value prediction
Brad Calder, Glenn Reinman, and Dean M. Tullsen
[Retrospective]
Performance of image and video processing with general-purpose processors and media ISA extensions
Parthasarathy Ranganathan, Sarita Adve, and Norman P. Jouppi
[Retrospective]
A performance comparison of contemporary DRAM architectures
Vinodh Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge
[Retrospective]

1998

Pipeline Gating: Speculation Control for Energy Reduction
Srilatha Manne, Artur Klauser, and Dirk Grunwald
[Retrospective]
Memory system characterization of commercial workloads
Luiz André Barroso, Kourosh Gharachorloo, and Edouard Bugnion
[Retrospective]

1997

Complexity-Effective Superscalar Processors
Subbarao Palacharla, Norman P. Jouppi, and James E. Smith
[Retrospective]
The SGI Origin: A ccNUMA Highly Scalable Server
James Laudon and Daniel Lenoski
[Retrospective]
Prefetching Using Markov Predictors
Doug Joseph and Dirk Grunwald
[Retrospective]
DAISY: Dynamic Compilation for 100% Architectural Compatibility
Kemal Ebcioglu and Erik Altman
[Retrospective]

1996

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm
[Retrospective]
Memory Bandwidth Limitations of Future Microprocessors
Doug Burger, James R. Goodman, and Alain Kägi
[Retrospective]
Missing the Memory Wall: The Case for Processor/Memory Integration
Ashley Saulsbury, Fong Pong, and Andreas Nowatzyk