Learn Tailpipe

Tailpipe is a high-performance data collection and querying tool that makes it easy to collect, store, and analyze log data. With Tailpipe, you can:

  • Collect logs from various sources and store them efficiently
  • Query your data with familiar SQL syntax using Tailpipe (or DuckDB!)
  • Use Powerpipe to visualize your logs and run detections

Prerequisites

To get started, you will need to download and install Tailpipe.
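For example, on macOS you can install it with Homebrew (the tap name below follows Turbot's standard distribution; see the Tailpipe downloads page for other platforms):

```shell
brew install turbot/tap/tailpipe
```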

This tutorial uses the AWS plugin to demonstrate the collection and analysis of CloudTrail logs. Let's install the plugin:
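```shell
tailpipe plugin install aws
```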

Configure data collection

Tailpipe uses HCL configuration files to define what data to collect. Tailpipe will read all files with a .tpc extension from the configuration directory (~/.tailpipe/config by default), so you are free to arrange your config files as you please. A common convention, however, is to create a file per plugin for all the resources that pertain to it. Create a file called ~/.tailpipe/config/aws.tpc for this tutorial.

We will configure Tailpipe to download logs from an S3 bucket, so you will need to define a connection to provide credentials. The AWS plugin documentation on the Tailpipe Hub provides examples. Add a connection to your aws.tpc file, for example:
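A minimal sketch, assuming you have an AWS profile configured locally (both the connection name and the profile name are placeholders to adapt):

```hcl
# Credentials for the account that owns the CloudTrail bucket
connection "aws" "cloudtrail_account" {
  profile = "my-profile"
}
```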

Now let's add a partition to our aws.tpc file. A partition is used to collect data from a source and import it into a table.
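A sketch of such a partition, assuming a connection named `aws.cloudtrail_account` and a placeholder bucket name:

```hcl
partition "aws_cloudtrail_log" "prod" {
  # Download CloudTrail log files from an S3 bucket
  source "aws_s3_bucket" {
    connection = connection.aws.cloudtrail_account
    bucket     = "my-cloudtrail-bucket"
  }
}
```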

Note that the partition has two labels. The first one (aws_cloudtrail_log) is the name of a table to import the data into, and the second is a unique name for the partition. This table is defined by the AWS plugin that we installed earlier. You can view the available tables with the tailpipe table list command.

The partition also includes a source that uses the connection that we created earlier to download the logs from a bucket. The block label denotes the source type (aws_s3_bucket). The source types are defined in plugins, and you can view them with the tailpipe source list command:
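```shell
tailpipe source list
```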

The Tailpipe Hub provides documentation and examples of how to configure the source.

Note

If you don't have access to live CloudTrail logs, you can use the flaws.cloud sample logs instead. To get them:
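One way to fetch and extract them (the download URL comes from the flaws.cloud project and may change):

```shell
mkdir -p ~/flaws && cd ~/flaws
curl -LO https://summitroute.com/downloads/flaws_cloudtrail_logs.tar
tar -xf flaws_cloudtrail_logs.tar
```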

Then add a partition with a file source to your aws.tpc file that points to the extracted files:
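A sketch, assuming the files were extracted to the directory below (adjust the path to wherever you extracted them):

```hcl
partition "aws_cloudtrail_log" "flaws" {
  # Read CloudTrail log files from the local filesystem
  source "file" {
    paths = ["/Users/me/flaws/flaws_cloudtrail_logs"]
  }
}
```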

Collect log data

Now, let's collect the logs:
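```shell
tailpipe collect aws_cloudtrail_log.prod
```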

Tailpipe will download the files from the source, decompress and parse them, and add the data to the Tailpipe database in the standard hive file structure.
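The resulting layout looks roughly like this (the directory names are illustrative; the `tp_*` keys are Tailpipe's standard hive partitioning keys):

```
~/.tailpipe/data/default/
└── tp_table=aws_cloudtrail_log/
    └── tp_partition=prod/
        └── tp_index=123456789012/
            └── tp_date=2025-01-15/
                └── data.parquet
```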

Note

By default, Tailpipe will only collect the last 7 days of logs during the initial collection. You can override this behavior by passing the --from argument, e.g. tailpipe collect aws_cloudtrail_log.prod --from T-180d

Query your logs

Tailpipe provides an interactive SQL shell for analyzing your collected data. Run tailpipe query to start the query shell. To see the table that was created:
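Since the shell is backed by DuckDB, a plain DuckDB statement works here:

```sql
show tables;
```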

You can query the logs using SQL. And you don't have to start from scratch: the Tailpipe Hub provides hundreds of sample queries!

You can get statistics, like the count of records in the table:
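```sql
select count(*) from aws_cloudtrail_log;
```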

or the most frequent error codes:
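A sketch, assuming the plugin exposes CloudTrail's errorCode field as an `error_code` column:

```sql
select error_code, count(*) as total
from aws_cloudtrail_log
where error_code is not null
group by error_code
order by total desc
limit 10;
```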

or the activity by day:
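For example, bucketing on Tailpipe's standard `tp_timestamp` column with DuckDB's strftime:

```sql
select strftime(tp_timestamp, '%Y-%m-%d') as day, count(*) as events
from aws_cloudtrail_log
group by day
order by day;
```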

Or look for suspicious activity, like root events:
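A sketch, assuming CloudTrail's userIdentity element is exposed as a `user_identity` struct column:

```sql
select event_name, source_ip_address, tp_timestamp
from aws_cloudtrail_log
where user_identity.type = 'Root';
```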

or unsuccessful AWS console login attempts:
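A sketch, assuming CloudTrail's responseElements field is exposed as JSON in a `response_elements` column (failed console logins record ConsoleLogin = "Failure" there):

```sql
select tp_timestamp, source_ip_address
from aws_cloudtrail_log
where event_name = 'ConsoleLogin'
  and (response_elements ->> 'ConsoleLogin') = 'Failure';
```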

What's next?

We've demonstrated basic log collection and analysis with Tailpipe. Here's what to explore next: