Hive
Tailpipe uses hive partitioning to take advantage of automatic filter pushdown, and it is opinionated about the layout:
- The data is written to Parquet files in the workspace directory, with a prescribed directory and filename structure. Other than the index, the layout is dictated by the Tailpipe core.
- The plugin may choose the index value, but it is not user-definable.
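As a rough illustration of a prescribed hive layout, the sketch below builds the relative path of one Parquet file from its partition keys. It assumes the directory keys are named tp_table, tp_partition, tp_index and tp_date, nested in that order; the file name and the example values are hypothetical, and the real names and nesting are set by the Tailpipe core.

```python
from datetime import date
from pathlib import PurePosixPath


def hive_path(table: str, partition: str, index: str, day: date, file_name: str) -> PurePosixPath:
    """Build a hive-style relative path.

    Assumption: the key names and their order mirror Tailpipe's layout;
    only the index value would come from the plugin.
    """
    return PurePosixPath(
        f"tp_table={table}",
        f"tp_partition={partition}",
        f"tp_index={index}",           # plugin-chosen value
        f"tp_date={day.isoformat()}",  # one directory per day
        file_name,                     # hypothetical file name
    )


print(hive_path("aws_cloudtrail_log", "prod", "123456789012", date(2024, 12, 1), "data_0.parquet"))
# tp_table=aws_cloudtrail_log/tp_partition=prod/tp_index=123456789012/tp_date=2024-12-01/data_0.parquet
```

Because every key is encoded as a key=value directory name, a query engine can learn a file's index and date from its path alone, without opening the file.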
The standard hive partitioning structure enables efficient queries: a query filtered by index or date only needs to read the matching subset of the hive.
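The idea behind this kind of partition pruning can be sketched in a few lines: parse the key=value segments from each file path and skip any file whose keys cannot match the filter. This is a simplified illustration, not how DuckDB or Tailpipe actually implements filter pushdown, and it assumes the tp_index and tp_date key names.

```python
from __future__ import annotations

from datetime import date
from pathlib import PurePosixPath


def hive_keys(path: str) -> dict[str, str]:
    """Parse the key=value directory segments of a hive-style path."""
    keys: dict[str, str] = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name
        if "=" in part:
            key, value = part.split("=", 1)
            keys[key] = value
    return keys


def prune(paths: list[str], index: str | None = None,
          start: date | None = None, end: date | None = None) -> list[str]:
    """Keep only files whose directory keys can satisfy the index/date filter."""
    selected = []
    for path in paths:
        keys = hive_keys(path)
        if index is not None and keys.get("tp_index") != index:
            continue  # wrong index: the file is never opened
        day = date.fromisoformat(keys["tp_date"])
        if (start is not None and day < start) or (end is not None and day > end):
            continue  # outside the date range
        selected.append(path)
    return selected


files = [
    "tp_table=aws_cloudtrail_log/tp_partition=prod/tp_index=111111111111/tp_date=2024-12-01/a.parquet",
    "tp_table=aws_cloudtrail_log/tp_partition=prod/tp_index=111111111111/tp_date=2024-12-02/b.parquet",
    "tp_table=aws_cloudtrail_log/tp_partition=prod/tp_index=222222222222/tp_date=2024-12-01/c.parquet",
]
print(prune(files, index="111111111111", start=date(2024, 12, 2)))  # keeps only the b.parquet path
```

Only the paths are inspected; the Parquet files for other indexes and dates are never read, which is where the performance gain comes from.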
Index: Custom Partition Key
Each plugin chooses the index for each of its tables. Because the data is laid out in partitions, queries perform best when the partitioning columns, such as the index, appear in a where or join clause. The index lets each plugin segment its data in the way that best suits lookups for that particular source. For example, AWS tables index on account ID, Azure tables on subscription, and GCP tables on project ID.
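To make the plugin's role concrete, the sketch below buckets rows by a per-table index field, so that each bucket would land in its own tp_index directory. The table names and field names are hypothetical and only mirror the AWS, Azure and GCP examples above; real plugins define their own.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical tables and index fields, mirroring the examples in the text.
INDEX_FIELD: Dict[str, str] = {
    "aws_example_log": "account_id",
    "azure_example_log": "subscription_id",
    "gcp_example_log": "project_id",
}


def group_by_index(table: str, rows: List[Row]) -> Dict[str, List[Row]]:
    """Bucket rows by the plugin-chosen index value (one tp_index directory per bucket)."""
    field = INDEX_FIELD[table]
    buckets: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[row[field]].append(row)
    return dict(buckets)


rows = [
    {"account_id": "111111111111", "event": "a"},
    {"account_id": "222222222222", "event": "b"},
    {"account_id": "111111111111", "event": "c"},
]
groups = group_by_index("aws_example_log", rows)
print({k: len(v) for k, v in groups.items()})  # {'111111111111': 2, '222222222222': 1}
```

A query that filters on a single account ID would then only touch the directory holding that account's bucket.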