Tips and Tricks

Take advantage of hive partitioning

You can speed up a query by using a where or join clause to restrict the number of files Tailpipe will read to satisfy the query. This restriction operates at three levels.

Partition. When a table defines more than one partition, you can filter to include only files belonging to that partition.

Index. When a partition defines more than one index, you can filter to include all files belonging to that index.

Date. Each file contains log data for one day. You can filter to include only files for that day.

The hive directory structure enables you to exclude large numbers of Parquet files.

Use common fields

Tailpipe plugins map a subset of log-specific fields to common fields. Use them to correlate across tables, for example to join aws_cloudtrail_log and aws_alb_access_log on the tp_ip field (IP address).

Use JSON functions vs operators

In DuckDB, AND and OR have a higher precedence than -> and ->> (and other keywords), so they’re evaluated first. So this query will result in an error: Conversion Error: Failed to cast value to numerical: ….

You can work around the problem by parenthesizing the JSON expression.

But you may prefer to avoid the problem by using a JSON function.