SWE Tea - Year 1
About a year ago, my buddy Casper and I started a paper reading group. We ended up covering 45 papers in the first year, which was a blast. Some papers were duds and some were absolutely brilliant, so here's a list of the ones that really made us think. You can find the full list on the history page: http://malloc.dog/swetea
1. Infra and Networking
- Jupiter Rising
Singh, Arjun, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, et al. "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network." In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, 183–97. London, United Kingdom: ACM, 2015. https://doi.org/10.1145/2785956.2787508
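- This paper was incredible: it walks through a decade of Google building its datacenter fabrics out of commodity switch silicon in Clos topologies, all driven by a centralized control plane. (The optical switching that routes traffic with actual mirrors came later, in the follow-up paper Jupiter Evolving, and is even cooler.)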
- FBOSS
Choi, Sean, Boris Burkov, Alex Eckert, Tian Fang, Saman Kazemkhani, Rob Sherwood, Ying Zhang, and Hongyi Zeng. "FBOSS: Building Switch Software at Scale," 2018.
- A peek into the internals of Facebook's in-house switch software. Probably outdated compared to what's running now, but the architecture is fascinating.
- Slicer
Adya, Atul, Daniel Myers, Jon Howell, Jeremy Elson, Colin Meek, Vishesh Khemani, Stefan Fulger, et al. "Slicer: Auto-Sharding for Datacenter Applications," n.d.
- Slicer makes the case that consistent hashing balances load poorly at scale, and shows how a central coordinator that actively moves key ranges between servers can do much better at resource utilization.
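To make the Slicer contrast concrete, here is a toy version of the central-coordinator idea. It is not Slicer's actual algorithm (the paper's assigner is considerably more involved); `ToyAssigner`, `rebalance`, the SHA-256 key hash, and the greedy one-slice-at-a-time policy are all hypothetical. Keys hash into a 64-bit space carved into contiguous slices, the coordinator owns the slice-to-task map, and it moves whichever slice best evens out the hottest and coldest tasks. With a fixed consistent-hash ring, a hot range stays wherever the ring puts it unless you change the ring itself.

```python
import bisect
import hashlib

KEYSPACE = 2**64  # keys are hashed into a 64-bit space (assumption)


def key_hash(key: str) -> int:
    """Map a key to a point in [0, 2**64) using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ToyAssigner:
    """Hypothetical central coordinator that owns the slice -> task map."""

    def __init__(self, tasks, num_slices):
        self.tasks = list(tasks)
        width = KEYSPACE // num_slices
        # Slice i starts at bounds[i]; the last slice runs to the end of the keyspace.
        self.bounds = [i * width for i in range(num_slices)]
        # Start with a round-robin assignment of slices to tasks.
        self.owner = {i: self.tasks[i % len(self.tasks)] for i in range(num_slices)}

    def slice_of(self, key: str) -> int:
        # The slice whose start is the largest bound <= hash(key).
        return bisect.bisect_right(self.bounds, key_hash(key)) - 1

    def route(self, key: str) -> str:
        return self.owner[self.slice_of(key)]

    def task_loads(self, slice_load):
        """Sum per-slice load into per-task load under the current assignment."""
        loads = {t: 0 for t in self.tasks}
        for s, load in slice_load.items():
            loads[self.owner[s]] += load
        return loads

    def rebalance(self, slice_load, max_moves=1):
        """Greedily move slices from the hottest task to the coldest one.

        slice_load maps slice index -> observed load (e.g. requests/sec).
        """
        for _ in range(max_moves):
            loads = self.task_loads(slice_load)
            hot = max(loads, key=loads.get)
            cold = min(loads, key=loads.get)
            gap = loads[hot] - loads[cold]
            candidates = [s for s, t in self.owner.items() if t == hot]
            if not candidates or gap == 0:
                break  # nothing to move or already balanced
            # Moving load L off `hot` onto `cold` turns the gap into |gap - 2L|.
            best = min(candidates, key=lambda s: abs(gap - 2 * slice_load.get(s, 0)))
            if abs(gap - 2 * slice_load.get(best, 0)) >= gap:
                break  # no single move improves balance
            self.owner[best] = cold
```

For example, with two tasks and four slices whose loads are `{0: 10, 1: 1, 2: 10, 3: 1}`, task `a` starts at 20 and task `b` at 2; one `rebalance` call moves slice 0 to `b`, leaving 10 and 12.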
2. Distributed Systems
- RocksDB
Dong, Siying, Andrew Kryczka, Yanqin Jin, and Michael Stumm. "RocksDB: Evolution of Development Priorities in a Key-Value Store Serving Large-Scale Applications." ACM Transactions on Storage 17, no. 4 (November 30, 2021): 1–32. https://doi.org/10.1145/3483840
- This paper is massive, but it's worth it. Great deep dive into why RocksDB works so well and where it fits in modern architectures.
- Ceph
Weil, Sage A, Scott A Brandt, Ethan L Miller, Darrell D E Long, and Carlos Maltzahn. "Ceph: A Scalable, High-Performance Distributed File System," n.d.
- A fantastic "big ideas" paper. CRUSH placement, the separation of metadata from object storage, and dynamic metadata management were all ideas way ahead of their time.
- DTrace
Cantrill, Bryan M, Michael W Shapiro, and Adam H Leventhal. "Dynamic Instrumentation of Production Systems," n.d.
- The OG tracing paper. Modern tools are still trying to catch up to what DTrace did years ago.
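RocksDB is built around a log-structured merge (LSM) tree, and a toy version helps make the RocksDB paper's trade-offs easier to follow. In this sketch, writes land in an in-memory memtable, full memtables are flushed as immutable sorted runs, reads check the newest data first, deletes are tombstones, and compaction merges runs and drops dead entries. `ToyLSM` and its methods are hypothetical, not RocksDB's API; real RocksDB adds a write-ahead log, on-disk SST files, bloom filters, a block cache, and leveled or universal compaction, none of which are modeled here.

```python
import bisect

TOMBSTONE = object()  # sentinel marking a deleted key


class ToyLSM:
    """In-memory sketch of an LSM tree (hypothetical, not RocksDB's API)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}          # mutable, unsorted recent writes
        self.runs = []              # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: record a tombstone that shadows older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        """Freeze the memtable into a new sorted run."""
        if not self.memtable:
            return
        items = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        """Return the newest value for key, or None if absent or deleted."""
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for keys, values in reversed(self.runs):  # newest run first
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                value = values[i]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value per key.

        Tombstones can be dropped here because no older run survives
        below the merged result for them to shadow.
        """
        merged = {}
        for keys, values in self.runs:  # oldest first, so newer runs overwrite
            merged.update(zip(keys, values))
        live = sorted((kv for kv in merged.items() if kv[1] is not TOMBSTONE),
                      key=lambda kv: kv[0])
        self.runs = [([k for k, _ in live], [v for _, v in live])] if live else []
```

The design choice to notice is that writes never modify existing runs: they are cheap appends, and the cost shows up later as read amplification (checking several runs) and compaction work, which is exactly the balance the RocksDB paper spends so much time tuning.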
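The big idea behind Ceph's CRUSH is that any client can compute where an object's replicas live from a hash and a description of the cluster, with no central lookup table. The snippet below is a heavily simplified stand-in, not CRUSH itself: it uses weighted rendezvous (highest-random-weight) hashing over a flat set of devices, which is close in spirit to CRUSH's straw2 bucket selection. The function names, the SHA-256 hash, and the flat device map are assumptions; real CRUSH walks a hierarchy of buckets (hosts, racks, rows) with placement rules that spread replicas across failure domains.

```python
import hashlib
import math


def unit_hash(*parts) -> float:
    """Deterministically map the inputs to a float in (0, 1]."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return (n + 1) / 2**64


def place(pg_id, devices, replicas=3):
    """Choose `replicas` distinct devices for a placement group.

    devices maps device name -> positive weight (e.g. capacity in TB).
    Each device scores ln(u) / weight, where u is a per-(pg, device) hash
    in (0, 1]; heavier devices score closer to 0 and so win more often.
    """
    def score(dev):
        return math.log(unit_hash(pg_id, dev)) / devices[dev]

    # Highest scores win; taking the top-k is a simplification of how
    # CRUSH retries selection for each replica.
    ranked = sorted(devices, key=score, reverse=True)
    return ranked[:replicas]
```

Because a device's score for a given placement group doesn't depend on the other devices, removing a device only changes placement groups that had it among their picks; everything else stays put, which is the kind of minimal data movement CRUSH is after.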
3. System Management
- Dynamo
Wu, Qiang, Qingyuan Deng, Lakshmi Ganesh, Chang-Hong Hsu, Yun Jin, Sanjeev Kumar, Bin Li, Justin Meza, and Yee Jiun Song. "Dynamo: Facebook's Data Center-Wide Power Management System," n.d.
- The efficiency numbers aren't mind-blowing, but it's a great look into power management at warehouse scale.
4. Virtualization and Memory
- Xen
Barham, Paul, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. "Xen and the Art of Virtualization." Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles - SOSP '03, 2003, 164. https://doi.org/10.1145/945445.945462
- Another classic "big ideas" paper that laid out what good virtualization should look like.
5. Algorithms and Theory
- ANS
https://kedartatwawadi.github.io/post--ANS/
- ANS (asymmetric numeral systems) is wild stuff, and this post shows what compression could be if we really pushed it.
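Since ANS can feel like magic, here is a minimal range-ANS (rANS) sketch. It leans on Python's arbitrary-precision integers for the state, so it skips the renormalization step a practical coder needs to keep the state in a fixed-width machine word and stream bits out; the function names and the static frequency table shared by both sides are assumptions. Each symbol s with frequency f_s (out of a total M) grows the state by roughly a factor of M / f_s, so it costs about log2(M / f_s) bits, and because decoding pops symbols in the reverse order they were pushed, the encoder walks the message backwards.

```python
from itertools import accumulate


def build_tables(freqs):
    """freqs maps symbol -> positive integer frequency.

    Returns the symbols in a fixed order, each symbol's cumulative start
    offset, and the total M (sum of all frequencies).
    """
    symbols = sorted(freqs)
    cum = [0] + list(accumulate(freqs[s] for s in symbols))
    starts = {s: cum[i] for i, s in enumerate(symbols)}
    return symbols, starts, cum[-1]


def rans_encode(message, freqs):
    """Encode the whole message into a single integer state."""
    _, starts, total = build_tables(freqs)
    x = 1  # initial state; decoding all symbols should end back here
    for s in reversed(message):  # ANS is LIFO, so encode back to front
        f = freqs[s]
        x = (x // f) * total + starts[s] + (x % f)
    return x


def rans_decode(x, n, freqs):
    """Decode n symbols from state x; returns (symbols, final_state)."""
    symbols, starts, total = build_tables(freqs)
    out = []
    for _ in range(n):
        slot = x % total
        # Linear scan for clarity; real coders use a lookup table here.
        s = next(t for t in symbols if starts[t] <= slot < starts[t] + freqs[t])
        x = freqs[s] * (x // total) + slot - starts[s]
        out.append(s)
    return out, x
```

For example, encoding a run of a symbol with frequency 15 out of 16 grows the state by only about log2(16/15) ≈ 0.09 bits per symbol, which is how ANS beats Huffman coding's one-bit-per-symbol floor on skewed data.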