97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

August 2021

Abstract

With this in-depth book, data engineers will learn powerful, real-world best practices for managing data—both big and small. Contributors from companies including Google, Microsoft, IBM, Facebook, Databricks, and GitHub share their experiences and lessons learned on cleaning, prepping, wrangling, and storing data.

I contributed six chapters on topics including community building, data documentation, field naming, and validation.

Type

Book section

Publication

O’Reilly Media

I contributed six chapters to the book:

Develop communities - not just code: On building communities alongside code bases and empowering, rather than patronizing, your data product’s customers
Give data products a front-end with latent documentation: On low-effort practices for improving data documentation and usability
There’s no such thing as data quality: On the value of data “fit for purpose”
The many meanings of missingness: On causes and consequences of null field encoding
Column names as contracts: On embedding metadata and performance “contracts” in column names
Data validation needs more than summary statistics: On the importance of context-informed data validation
