# partition
A partition represents data gathered from a source. A given Tailpipe table, like aws_cloudtrail_log, can include multiple partitions. Partitions are defined in HCL and are required for collection.
The partition has two labels:

- The table name. The table name is meaningful and must match a table name for an installed plugin or a custom table.
- A partition name. The partition name must be unique for all partitions in a given table (though different tables may use the same partition names).
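For example, the following sketch declares a partition named dev on the aws_cloudtrail_log table (the file source body and path are illustrative):

```hcl
# First label: table name. Second label: partition name.
partition "aws_cloudtrail_log" "dev" {
  source "file" {
    paths = ["/path/to/downloaded/logs"]
  }
}
```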
## Arguments
| Argument | Type | Optional? | Description |
|---|---|---|---|
| source | Block | Required | A source from which to collect data. |
| filter | String | Optional | A SQL where clause condition to filter log entries. Supports expressions using table columns. |
| tp_index | String | Optional | The column whose value should be used as tp_index. Defaults to "default" if not specified. This is used in the hive partitioning scheme. |
## source
A partition acquires data from a source. The source block specifies the type and location of the source data, as well as the connection to use to connect to it.
The block label denotes the source type - aws_s3_bucket, file, etc. The source types are defined in plugins, and you can view them with the tailpipe source list command. The file source is provided by the core plugin, which is included in every Tailpipe installation.
### Source Arguments
The source arguments vary by source type. The Tailpipe Hub provides extended documentation and examples for plugin sources.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| connection | connection reference | Varies by source type | The connection to use to connect to the source. This is required for most sources except file. |
| file_layout | String | Optional | The Grok pattern that defines the log file structure. file_layout is optional; if not provided, all files at the specified path(s) will be collected. |
| format | format reference | Optional | The default format of the source data. This must refer to either a format block or a format preset defined by a plugin. If no format is specified, the default for the table will be used. |
| patterns | Map | Optional | A map of custom Grok patterns that can be referenced in the file_layout. This is optional, and the standard patterns are available out-of-the-box. |
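As a sketch of how these arguments fit together, the following source block defines a custom Grok pattern and references it from file_layout (the connection, bucket, and pattern names are illustrative):

```hcl
source "aws_s3_bucket" {
  connection = connection.aws.logging_account
  bucket     = "aws-cloudtrail-logs-bucket"

  # Custom pattern, referenced below as %{CT_FILE}
  patterns = {
    CT_FILE = "%{NUMBER}_CloudTrail_%{DATA}.json.gz"
  }

  file_layout = `AWSLogs/%{NUMBER:account_id}/CloudTrail/%{DATA:region}/%{YEAR:year}/%{MONTHNUM:month}/%{MONTHDAY:day}/%{CT_FILE}`
}
```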
### format
While the arguments to source vary by type, every source supports specifying a format for the source. This must refer to either a format block or a format preset defined by a plugin. If no format is specified, the default for the table will be used.
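For instance, assuming a grok format block named my_app_format has been defined elsewhere (the format type, name, and path here are illustrative), a source could reference it like this:

```hcl
partition "my_custom_log" "local" {
  source "file" {
    paths  = ["/var/log/myapp"]
    format = format.grok.my_app_format
  }
}
```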
### file_layout
The arguments to the source vary by type, but many source types include the file_layout argument. The file_layout specifies a Grok pattern that defines the log file directory structure. The file_layout is usually optional; the plugin will provide a default that works for the most common case.
The file_layout is not merely used to locate log files but also to describe any fields that appear in the path that are meaningful to the collection process. Log files are often stored in a predictable, meaningful path structure that can be used to identify the log file dates, accounts, etc. Date and time fields are particularly important, as Tailpipe uses them to maintain collection state so that it can resume collection from its previous checkpoint. When setting file_layout, make sure you identify any date or time fields (year, month, day, hour, minute, second) that appear in the path, for example:
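The following sketch captures the account, region, and date fields from a CloudTrail-style prefix (the bucket name and path layout are illustrative):

```hcl
source "aws_s3_bucket" {
  connection  = connection.aws.logging_account
  bucket      = "aws-cloudtrail-logs-bucket"
  file_layout = `AWSLogs/%{NUMBER:account_id}/CloudTrail/%{DATA:region}/%{YEAR:year}/%{MONTHNUM:month}/%{MONTHDAY:day}/%{DATA}.json.gz`
}
```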
Refer to the documentation for your table on the Tailpipe Hub for examples.
Tip: Use backticks (`) to delimit the file_layout. Tailpipe treats anything in backticks as a non-interpolated string, so you don't have to escape quotes, backslashes, etc.
## file source
The file source enables you to collect log files from your local filesystem.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| paths | List | Required | A list of paths to the files to collect. |
| file_layout | String | Optional | The Grok pattern that defines the log file structure. file_layout is optional; if not provided, all files at the path(s) from paths will be collected. |
| format | format reference | Optional | The default format of the source data. This must refer to either a format block or a format preset defined by a plugin. If no format is specified, the default for the table will be used. |
| patterns | Map | Optional | A map of custom Grok patterns that can be referenced in the file_layout. This is optional, and the standard patterns are available out-of-the-box. |
## Examples
You can define a partition that uses the aws_s3_bucket type to collect all the CloudTrail log files from an S3 bucket:
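A sketch, where the connection and bucket names are placeholders for your own:

```hcl
partition "aws_cloudtrail_log" "s3_bucket_all" {
  source "aws_s3_bucket" {
    connection = connection.aws.logging_account
    bucket     = "aws-cloudtrail-logs-bucket"
  }
}
```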
You can use the filter argument to exclude specific log entries with expressions using table columns:
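For instance, this sketch keeps only write events from the EC2 service (the expression assumes the aws_cloudtrail_log columns read_only and event_source; connection and bucket names are placeholders):

```hcl
partition "aws_cloudtrail_log" "ec2_write_events" {
  # WHERE-style condition applied to each log entry during collection
  filter = "not read_only and event_source = 'ec2.amazonaws.com'"

  source "aws_s3_bucket" {
    connection = connection.aws.logging_account
    bucket     = "aws-cloudtrail-logs-bucket"
  }
}
```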
You can use the file_layout argument to scope the set of collected log files using Grok patterns. This source block collects only the us-east-1 log files.
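A sketch pinning the region segment of the path to us-east-1 (the bucket name and path layout are illustrative):

```hcl
partition "aws_cloudtrail_log" "us_east_1" {
  source "aws_s3_bucket" {
    connection  = connection.aws.logging_account
    bucket      = "aws-cloudtrail-logs-bucket"
    file_layout = `AWSLogs/%{NUMBER:account_id}/CloudTrail/us-east-1/%{YEAR:year}/%{MONTHNUM:month}/%{MONTHDAY:day}/%{DATA}.json.gz`
  }
}
```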
You can configure the tp_index to use a specific column as the partition index:
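For example, this sketch indexes rows by the account that received the event (assuming the aws_cloudtrail_log column recipient_account_id; connection and bucket names are placeholders):

```hcl
partition "aws_cloudtrail_log" "prod" {
  tp_index = "recipient_account_id"

  source "aws_s3_bucket" {
    connection = connection.aws.logging_account
    bucket     = "aws-cloudtrail-logs-bucket"
  }
}
```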
Another source type, file, enables you to collect from local log files that you've downloaded. This partition collects the flaws.cloud files.
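A sketch, where the local path is a placeholder for wherever you downloaded the flaws.cloud logs:

```hcl
partition "aws_cloudtrail_log" "flaws" {
  source "file" {
    paths       = ["/path/to/flaws_cloudtrail_logs"]
    file_layout = `%{DATA}.json.gz`
  }
}
```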