Data Storage

Tailpipe stores data in a hive-partitioned directory structure, in which each directory level encodes one partition key, so that queries can be limited to the relevant files.

The structure has several key components:

  • Partition: Groups data by source (e.g., nginx_access_log)
  • Index: Sub-divides data by a meaningful key (e.g., server name for NGINX logs)
  • Date: Further partitions data by date
  • Files: Each date partition contains Parquet files holding the actual log data
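
To make the hierarchy concrete, here is a minimal Python sketch of how a hive-style path could be assembled from these components. It assumes each directory is named as a key=value pair using the tp_partition, tp_index, and tp_date keys; the root directory, the server name web-01, the file name part-0.parquet, and the hive_path helper are all hypothetical, and Tailpipe's actual on-disk layout may include additional levels.

```python
from datetime import date
from pathlib import PurePosixPath


def hive_path(root: str, partition: str, index: str, day: date, filename: str) -> PurePosixPath:
    """Build a hive-style path where each directory is a key=value pair.

    Hypothetical helper for illustration only; not part of Tailpipe.
    """
    return (
        PurePosixPath(root)
        / f"tp_partition={partition}"
        / f"tp_index={index}"
        / f"tp_date={day.isoformat()}"
        / filename
    )


# Illustrative values: the root, server name, and file name are assumptions.
example = hive_path("data", "nginx_access_log", "web-01", date(2024, 1, 15), "part-0.parquet")
print(example)
# data/tp_partition=nginx_access_log/tp_index=web-01/tp_date=2024-01-15/part-0.parquet
```

Because the key values live in the directory names, a reader can tell which partition, index, and date a file belongs to without opening it.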

This hierarchical structure enables efficient querying through partition pruning. When a query filters on tp_partition, tp_index, or tp_date, Tailpipe (and DuckDB) can skip irrelevant Parquet files entirely, without reading them.
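
The idea behind partition pruning can be sketched in a few lines of Python. This is only an illustration of the principle, not how Tailpipe or DuckDB implement it: the partition_keys and prune helpers and the example file paths are hypothetical, and the sketch assumes directories are named as key=value pairs.

```python
from pathlib import PurePosixPath


def partition_keys(path: str) -> dict[str, str]:
    """Extract key=value directory segments from a hive-style path."""
    keys = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep:  # skip directories that are not key=value pairs
            keys[key] = value
    return keys


def prune(paths: list[str], **conditions: str) -> list[str]:
    """Keep only files whose partition keys match every condition.

    The decision uses the path alone; no file contents are read.
    """
    return [
        p for p in paths
        if all(partition_keys(p).get(k) == v for k, v in conditions.items())
    ]


# Hypothetical file listing for illustration.
files = [
    "data/tp_partition=nginx_access_log/tp_index=web-01/tp_date=2024-01-15/part-0.parquet",
    "data/tp_partition=nginx_access_log/tp_index=web-01/tp_date=2024-01-16/part-0.parquet",
    "data/tp_partition=nginx_access_log/tp_index=web-02/tp_date=2024-01-15/part-0.parquet",
]

# Only the first file matches both conditions; the other two are skipped.
selected = prune(files, tp_index="web-01", tp_date="2024-01-15")
print(selected)
```

The narrower the conditions on the partition keys, the fewer files remain to be scanned, which is why filtering on these columns tends to pay off.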