Is This A Lemon?

Methodology

How the data on this site gets from NHTSA to the page you’re reading.

Sources

Pipeline

  1. Parse. Each NHTSA flat file is parsed against a typed schema into a local DuckDB warehouse — roughly 8.8 million rows across recalls, complaints, investigations, bulletins, and ratings.
  2. Aggregate. The warehouse is projected into a small read-only SQLite file. Per-vehicle counts, per-component complaint tallies, and severity rollups are precomputed. Long campaign and investigation descriptions are stored once and joined, not duplicated per affected model.
  3. Render. The site is a static Next.js build that reads only from the shipped SQLite file. No data is fetched from NHTSA at request time.
  4. Refresh. A nightly job pulls deltas from NHTSA’s public APIs and rebuilds the SQLite file, after which the site redeploys automatically.
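The Parse and Aggregate steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the site's actual code. It assumes tab-delimited input and uses a hypothetical six-column recall schema (`RecallRow`). Plain Python lists stand in for the DuckDB warehouse, and the sample rows are made up. It shows the two ideas from step 2. First, each campaign description is stored once in a `campaign` table and joined back in at read time. Second, per-vehicle counts are precomputed into a `vehicle_summary` table.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass


# Hypothetical, trimmed-down recall schema. Real NHTSA flat files carry many
# more columns; these six fields are illustrative only.
@dataclass(frozen=True)
class RecallRow:
    campaign_no: str
    make: str
    model: str
    year: int
    component: str
    summary: str


def parse_rows(text: str) -> list[RecallRow]:
    """Parse: turn tab-delimited lines into typed rows.

    Raises ValueError on a wrong column count or a non-numeric year.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        campaign_no, make, model, year, component, summary = fields
        rows.append(RecallRow(campaign_no, make, model, int(year), component, summary))
    return rows


def build_sqlite(rows: list[RecallRow], conn: sqlite3.Connection) -> None:
    """Aggregate: store each description once and precompute per-vehicle counts."""
    conn.executescript("""
        CREATE TABLE campaign (campaign_no TEXT PRIMARY KEY, summary TEXT NOT NULL);
        CREATE TABLE vehicle_recall (
            make TEXT, model TEXT, year INTEGER, component TEXT,
            campaign_no TEXT REFERENCES campaign(campaign_no),
            PRIMARY KEY (make, model, year, campaign_no));
        CREATE TABLE vehicle_summary (
            make TEXT, model TEXT, year INTEGER, recall_count INTEGER,
            PRIMARY KEY (make, model, year));
    """)
    # One row per campaign, however many models it affects.
    conn.executemany("INSERT OR IGNORE INTO campaign VALUES (?, ?)",
                     [(r.campaign_no, r.summary) for r in rows])
    # Link table: which vehicles each campaign touches.
    conn.executemany("INSERT OR IGNORE INTO vehicle_recall VALUES (?, ?, ?, ?, ?)",
                     [(r.make, r.model, r.year, r.component, r.campaign_no) for r in rows])
    # Precomputed rollup so a page render needs no GROUP BY.
    conn.execute("""
        INSERT INTO vehicle_summary
        SELECT make, model, year, COUNT(DISTINCT campaign_no)
        FROM vehicle_recall GROUP BY make, model, year""")
    conn.commit()


def recalls_for(conn: sqlite3.Connection, make: str, model: str, year: int):
    """Join the shared description back in when a vehicle page is rendered."""
    return conn.execute("""
        SELECT c.campaign_no, c.summary
        FROM vehicle_recall v JOIN campaign c USING (campaign_no)
        WHERE v.make = ? AND v.model = ? AND v.year = ?
        ORDER BY c.campaign_no""", (make, model, year)).fetchall()


# Made-up sample rows for illustration only.
SAMPLE = (
    "TEST-001\tACME\tROADSTER\t2020\tBRAKES\tBrake hose may leak.\n"
    "TEST-001\tACME\tROADSTER\t2021\tBRAKES\tBrake hose may leak.\n"
    "TEST-002\tACME\tROADSTER\t2020\tAIR BAGS\tInflator may rupture.\n"
)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sqlite(parse_rows(SAMPLE), conn)
    print(recalls_for(conn, "ACME", "ROADSTER", 2020))
    # [('TEST-001', 'Brake hose may leak.'), ('TEST-002', 'Inflator may rupture.')]
```

In this sketch, campaign `TEST-001` affects two model years but its text is stored only once. With `vehicle_summary` precomputed, a vehicle page's headline count becomes a primary-key lookup rather than an aggregation, which suits a static build reading from a small file.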

Caveats

For the original data and primary documentation, see nhtsa.gov/nhtsa-datasets-and-apis.