Hive

Tailpipe uses hive partitioning to take advantage of automatic filter pushdown, and it is opinionated about the layout:

  • The data is written to Parquet files in the workspace directory, with a prescribed directory and filename structure. Other than the index, the layout is dictated by the Tailpipe core.

  • The plugin may choose the index value, but the index is not user-definable.

The standard partitioning/hive structure enables efficient queries that only need to read subsets of the hive filtered by index or date.
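To illustrate why a hive layout allows reading only a subset of files, here is a minimal Python sketch of partition pruning. The directory keys (`table`, `index`, `date`), the file names, and the helpers `parse_hive_path` and `prune` are hypothetical and chosen for illustration; they are not Tailpipe's actual layout or API. The only assumption carried over from hive partitioning is that each partition appears as a `key=value` directory segment.

```python
from pathlib import PurePosixPath


def parse_hive_path(path):
    """Extract the key=value partition segments from a hive-style path."""
    parts = {}
    for segment in PurePosixPath(path).parts:
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def prune(paths, **filters):
    """Keep only the files whose partition values match every filter."""
    return [
        p for p in paths
        if all(parse_hive_path(p).get(k) == v for k, v in filters.items())
    ]


# Hypothetical layout: key names are illustrative, not Tailpipe's real ones.
FILES = [
    "table=example_log/index=111/date=2024-01-01/data.parquet",
    "table=example_log/index=111/date=2024-01-02/data.parquet",
    "table=example_log/index=222/date=2024-01-01/data.parquet",
]

# A filter on the index (or date) is answered from the paths alone,
# so files in non-matching partitions are never opened.
selected = prune(FILES, index="111")
```

Because the partition values are encoded in the directory names, a query engine can apply the filter before reading any Parquet data, which is the effect that filter pushdown relies on.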

Index: Custom Partition Key

Each plugin chooses what the index is for a given table. Because the data is laid out in partitions, queries perform best when the partition key appears in a WHERE or JOIN clause. The index segments the data in the way that best suits lookups for the specific plugin. For example, AWS tables index on account ID, Azure tables on subscription, and GCP tables on project ID.
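As a rough sketch of how a plugin-chosen index segments data, the Python below groups rows by an index field and then answers a lookup from a single partition. The field names in `INDEX_FIELD` and the functions `partition_by_index` and `lookup` are assumptions made for illustration, loosely following the AWS, Azure, and GCP examples above; they do not reflect Tailpipe's real schema or plugin interface.

```python
from collections import defaultdict

# Hypothetical mapping from plugin to the row field used as its index.
INDEX_FIELD = {
    "aws": "account_id",
    "azure": "subscription_id",
    "gcp": "project_id",
}


def partition_by_index(plugin, rows):
    """Group rows into partitions keyed by the plugin's index value."""
    field = INDEX_FIELD[plugin]
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[field]].append(row)
    return dict(partitions)


def lookup(partitions, index_value):
    """Return one partition's rows without scanning the others."""
    return partitions.get(index_value, [])


rows = [
    {"account_id": "111", "event": "CreateBucket"},
    {"account_id": "222", "event": "DeleteBucket"},
    {"account_id": "111", "event": "PutObject"},
]
aws_partitions = partition_by_index("aws", rows)
account_111 = lookup(aws_partitions, "111")
```

A query that filters on the index value touches only the matching partition, which is why placing the index in a WHERE or JOIN clause pays off.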