format
The format block enables you to define source formats for tables and sources. Formats describe the layout of the source data so that it can be collected into a table.
Tip: Use backticks (`) to delimit the layout. Tailpipe treats anything in backticks as a non-interpolated string, so you don't have to escape quotes, backslashes, etc.
Formats
You can define a format with the format block.
Format blocks have two labels:
- The format type. This can be a core format type or any format type in any installed plugin
- A name for the format
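As a minimal sketch of the two-label form, the block below defines a regex format. The name custom_log, the description, and the layout are illustrative assumptions, and the layout assumes Go-style `(?P<name>...)` named capture groups.

```hcl
# First label: the format type ("regex" is a core format type).
# Second label: a name of your choosing ("custom_log" is hypothetical).
format "regex" "custom_log" {
  description = "Hypothetical log lines such as: ERROR: disk full"
  layout      = `^(?P<level>\w+): (?P<message>.*)$`
}
```

Following the format.jsonl.default naming used for default formats, a format defined this way would presumably be referenced as format.regex.custom_log.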
Plugins may also export preset formats, which you can reference by name. For example, the Nginx plugin provides the nginx_access_log.combined format, which defines the default Nginx combined log format.
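The following sketch shows how a preset might be referenced from a source. It assumes that presets use the format.<type>.<name> reference form and that a file source accepts a format argument. The partition name, path, and file_layout value are hypothetical.

```hcl
partition "nginx_access_log" "my_logs" {
  source "file" {
    # Preset format exported by the Nginx plugin, referenced by name.
    format      = format.nginx_access_log.combined
    paths       = ["/var/log/nginx"] # hypothetical directory
    file_layout = `%{DATA}.log`      # hypothetical file name pattern
  }
}
```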
You can list and view details of both your custom formats and the plugin preset formats with the tailpipe format list and tailpipe format show introspection commands.
Format Types
The format type defines the parsing mechanism used to read the source data. The properties of a format are specific to its format type.
Format types are implemented by plugins. A number of "generic" format types are provided by the core plugin, which is included in every Tailpipe installation. These core format types provide a mechanism for describing file layouts using general-purpose syntax such as regular expressions, Grok, and JSONL.
Any plugin may include a format type that lets you describe the layout of its log files in the plugin's "native" syntax. For example, the Nginx plugin provides the nginx_access_log format type. When using the nginx_access_log format type, you can specify the layout with the same Nginx log_format string that you use in your Nginx configuration files.
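As an illustration, the sketch below uses the standard Nginx combined log_format string as the layout. The format name custom is a hypothetical label.

```hcl
format "nginx_access_log" "custom" {
  # Same variables as the Nginx "combined" log_format directive.
  layout = `$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"`
}
```

Because the layout is delimited with backticks, the double quotes inside it need no escaping.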
You can discover the installed format types with the tailpipe format list introspection command.
Core Plugin Formats
Grok Format
The grok format parses log lines using Grok patterns, which are named, reusable regular-expression fragments that extract fields from a line into structured data.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| layout | String | Required | The Grok pattern that defines how to parse the log line |
| patterns | Map | Optional | A map of custom Grok patterns that can be referenced in the layout. The standard patterns are available out of the box. |
| description | String | Optional | A description of the format |
Tip: Use the Grok Debugger to help create and test your grok expressions.
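A minimal sketch of a grok format that combines standard patterns with one custom pattern. The format name app_log, the REQUEST_ID pattern, and the captured field names are hypothetical. TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA are standard Grok patterns.

```hcl
format "grok" "app_log" {
  description = "Hypothetical application log"

  # Intended to match lines such as:
  #   2024-05-01T12:00:00Z [ERROR] 1a2b3c4d disk full
  layout = `%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{REQUEST_ID:request_id} %{GREEDYDATA:message}`

  # Custom pattern, referenced in the layout as %{REQUEST_ID:...}.
  patterns = {
    REQUEST_ID = "[0-9a-f]{8}"
  }
}
```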
Regex Format
The regex format is used to parse log lines using regular expressions with named capture groups.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| layout | String | Required | The regular expression pattern with named capture groups |
| description | String | Optional | A description of the format |
Tip: Use RegEx 101 to help create and test your regular expressions.
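A sketch of a regex format in which each named capture group is meant to become a field. The format name, the group names, and the sample line are hypothetical, and the layout assumes Go-style `(?P<name>...)` named groups.

```hcl
format "regex" "service_log" {
  description = "Hypothetical service log"

  # Intended to match lines such as:
  #   2024-05-01T12:00:00Z [ERROR] disk full
  layout = `^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) \[(?P<level>[A-Z]+)\] (?P<message>.*)$`
}
```

The backtick delimiters keep sequences such as \d and \[ intact without double escaping.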
Delimited Format
The delimited format is used for parsing CSV, TSV, and other delimited file formats. The properties are passed directly to DuckDB, which implements the delimited data parsing.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| delimiter | String | Optional | The character that separates columns |
| header | Boolean | Optional | Whether the file contains a header row |
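For instance, a pipe-separated file with a header row might be described as follows. This is a minimal sketch, and the format name pipe_separated is hypothetical.

```hcl
format "delimited" "pipe_separated" {
  delimiter = "|"  # the character that separates columns
  header    = true # the first row contains column names
}
```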
JSONL Format
The jsonl (JSON Lines) format is used to parse JSON data where each line is a valid JSON object.
| Argument | Type | Optional? | Description |
|---|---|---|---|
| description | String | Optional | A description of the format |
Tip: Since the jsonl format type has no arguments other than the description, you may want to use the default format (format.jsonl.default) instead of defining your own.
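If you do want a named jsonl format, for example to document what the data contains, a minimal sketch looks like this. The format name app_events and the sample line are hypothetical.

```hcl
format "jsonl" "app_events" {
  # Each input line is expected to be one JSON object, such as:
  #   {"timestamp": "2024-05-01T12:00:00Z", "level": "ERROR", "message": "disk full"}
  description = "Hypothetical application events, one JSON object per line"
}
```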