Andrew A Lamb

Staff Engineer at InfluxData

Apache DataFusion PMC (Chair) | Apache Arrow PMC
Member, Apache Software Foundation

LinkedIn | GitHub
Last Update: Sep, 2024

I am a software engineer with experience in environments ranging from two developers in a VC's office to large multinational corporations and distributed open source projects (I love small companies). I focus on systems (e.g., databases) and platform engineering, and have been both an architect and a manager/VP.

I currently work in Rust on InfluxDB 3.0, focused on query processing, the Apache DataFusion query engine, and the Apache Arrow ecosystem. I am honored to serve on the Apache DataFusion PMC (2024 Chair) and the Apache Arrow PMC (2023 Chair). I actively contribute to the Apache Arrow DataFusion query engine and the Apache Arrow Rust implementation.


Highlights (full list below)

2024-09-23 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording

2024-06-19 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (alternate download) Andrew Lamb, Yijie Shen, Daniël Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Chao Sun, and Liang-Chi Hsieh, 2024 International Conference on Management of Data (SIGMOD 2024), June 9-15, 2024, Santiago, Chile

2012-08-27 The Vertica Analytic Database: C-Store 7 Years Later. Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear. 38th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 5, No. 12


Blogs

2024-11-18 [DataFusion Blog] Apache DataFusion is now the fastest single node engine for querying Apache Parquet files

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 2 - String Operations

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 1 - Reading Parquet

2024-03-18 [InfluxData Blog] Making Most Recent Value Queries Hundreds of Times Faster

2023-10-25 [InfluxData Blog] Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0

2023-08-01 [InfluxData Blog] Aggregating Millions of Groups Fast in Apache Arrow DataFusion (cross post on Arrow Blog)

2022-12-07 [InfluxData Blog] Querying Parquet with Millisecond Latency (cross post on arrow.apache.org/blog)

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 2

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1

2022-10-27 [ODBMS.org] On InfluxData's New Storage Engine. Q&A with Andrew Lamb

2022-10-17 [Apache Arrow Blog] Arrow and Parquet Part 3: Arbitrary Nesting with Lists of Structs and Structs of Lists

2022-10-08 [Apache Arrow Blog] Arrow and Parquet Part 2: Nested and Hierarchical Data using Structs and Lists

2022-10-05 [Apache Arrow Blog] Arrow and Parquet Part 1: Primitive Types and Nullability

2022-01-14 [InfluxData Blog] Rust Object Store Donation

2022-01-14 Using Rustlang's Async Tokio Runtime for CPU-Bound Tasks.


Talks and Presentations

2024-10-28 Boston University: MiDAS Fall 2024 (Data Systems Seminar) Apache DataFusion: Design Choices when Building Modern Analytic Systems slides, slides(pdf), recording

2024-09-27 Belgrade Apache DataFusion Meetup Apache DataFusion: What, Why and How slides, recording

2024-09-23 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording

2024-06-26 New York City Apache DataFusion Meetup NYC Meetup slides

2024-06-26 Microsoft Gray Systems Lab: Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion slides

2024-06-25 San Francisco Bay Area Apache DataFusion Meetup DataFusion Meetup 2.0 - San Francisco slides

2024-06-14 2024 Simplicity in Management of Data (SiMOD) DataFusion: The Case for Building Open Data Systems (keynote) slides

2024-06-13 2024 ACM SIGMOD International Conference on Management of Data Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording, paper

2023-05-09 [ODSC East 2024]: Introduction to Apache Arrow and Apache Parquet, using Python and pyarrow (updated). slides

2024-03-27 DataCouncil 2024: Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. slides, recording

2024-03-27 Apache Arrow DataFusion Meetup: Introduction, Agenda, Remarks. slides, recording

2023-09-27 MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides

2023-06-02 [Dutch Seminar on Database System Design]: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides, recording

2023-05-09 [ODSC East 2023]: Introduction to Apache Arrow and Apache Parquet, using Python and pyarrow. slides

2023-04-05 The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. slides, recording

2023-04-04 The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. slides, recording

2023-03-31 The Apache Arrow DataFusion Architecture Part 1: Query Engines. slides, recording

2023-02-15 [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust. slides

2022-06-27 [Databricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. slides, recording

2022-05-23 [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. slides, recording

2022-04-06 [EM.S20, MIT Sloan School of Management, Guest Speaker]: Managing Software Dependencies and the Supply Chain. slides

2021-10-13 [InfluxData Tech Talk]: Query Processing in InfluxDB IOx. slides, recording

2021-04-20 [USC CSE-132 Database Systems Implementation, Guest Speaker]: Apache Arrow and its impact on the database industry. slides, recording

2021-03 [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides, slides (slideshare), recording

2020-12-09 [InfluxData Tech Talk]: A Rusty Introduction to Apache Arrow and how it applies to a TimeSeries Database. slides, recording

2013-01-10 [MIT IAP Talk]: Tradeoffs in Massively Parallel Analytical Systems. slides


Journal / Conference Papers

2024-08-26 The Five-Minute Rule for the Cloud: Caching in Analytics Systems Kira Duwe (EPFL), Angelos-Christos Anadiotis (Oracle Zurich), Andrew Lamb (InfluxData), Lucas Lersch (Amazon), Boaz Leskes (MotherDuck), Daniel Ritter (SAP), Pinar Tozun (IT University of Copenhagen) The Conference on Innovative Data Systems Research (CIDR), 2025, Amsterdam, The Netherlands

2024-08-26 POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance (alternate download) David Justen, Daniel Ritter, Campbell Fraser, Andrew Lamb, Allison Lee, Thomas Bodner, Mhd Yamen Haddad, Steffen Zeuch, Volker Markl, and Matthias Boehm, Proc. VLDB Endow. 17, 6 (February 2024), 1350-1363.

2024-06-19 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (alternate download) Andrew Lamb, Yijie Shen, Daniël Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Chao Sun, and Liang-Chi Hsieh, 2024 International Conference on Management of Data (SIGMOD 2024), June 9-15, 2024, Santiago, Chile

2014-03-31 The Vertica Query Optimizer: The Case for Specialized Query Optimizers. (alternate download) Nga Tran, Andrew Lamb, L. Shrinivas, Sreenath Bodagala and Jaimin Dave, IEEE International Conference on Data Engineering (ICDE - 2014)

2012-08-27 The Vertica Analytic Database: C-Store 7 Years Later. (alternate download) Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear. 38th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 5, No. 12

2003-06-08 Linear analysis and optimization of stream programs. (alternate download) Andrew A. Lamb, William Thies and Saman Amarasinghe. ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI)

2002-08-05 A stream compiler for communication-exposed architectures. (alternate download) Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)


Really Old Content

Old Blog
Six Hertz, Six Bytes
Pre-GitHub projects
Class List
School Projects