Learn Tailpipe
Tailpipe is a high-performance data collection and querying tool that makes it easy to collect, store, and analyze log data. With Tailpipe you can:
- Collect logs from various sources and store them efficiently
- Query your data with familiar SQL syntax using Tailpipe (or DuckDB!)
- Use Powerpipe to visualize your logs and run detections
Prerequisites
This tutorial uses the AWS plugin to demonstrate collection and analysis of CloudTrail logs. First, download and install Tailpipe.
Then install the plugin:
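The install step is a single command. This is a minimal sketch, assuming the tailpipe binary is already on your PATH and that the plugin is published under the short name aws:

```shell
# Install the AWS plugin.
# The fallback message only prints if the command fails or tailpipe is missing.
PLUGIN="aws"
tailpipe plugin install "$PLUGIN" || echo "Could not install '$PLUGIN': is tailpipe installed and on your PATH?" >&2
```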
Configure data collection
Tailpipe uses HCL configuration files to define what data to collect. You will need to define a connection that governs how Tailpipe accesses logs. For example:
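A connection of this kind might look like the sketch below, written from the shell so it can be pasted directly. The connection name logs and the profile name my-logs-profile are placeholders. The profile argument is assumed to be supported by the AWS plugin's connection block, so check the plugin documentation for the exact argument list. The file path assumes the config lives in ~/.tailpipe/config/aws.tpc.

```shell
CONFIG_FILE="$HOME/.tailpipe/config/aws.tpc"
mkdir -p "$(dirname "$CONFIG_FILE")"

# Append the connection block only if it is not already present,
# so running this twice does not create a duplicate definition.
# "logs" and "my-logs-profile" are placeholder names.
grep -q 'connection "aws" "logs"' "$CONFIG_FILE" 2>/dev/null || cat >> "$CONFIG_FILE" <<'EOF'
connection "aws" "logs" {
  profile = "my-logs-profile"
}
EOF
```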
Tailpipe can use the default AWS credentials from your credentials file and/or environment variables; if you can run aws s3 ls, for example, then you should be able to collect CloudTrail logs. The AWS plugin documentation describes other access patterns.
You will also need to define a partition and a source. The partition refers to a plugin-defined table (aws_cloudtrail_log) that describes the data found in each line of a CloudTrail log; the source governs how Tailpipe acquires the data that populates the partition.
Create a file, e.g. ~/.tailpipe/config/aws.tpc, with a connection and partition block similar to these examples.
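A partition block for this setup might look like the sketch below. The partition name my_logs, the bucket name, and the connection reference connection.aws.logs are placeholders. The aws_s3_bucket source type and its connection and bucket arguments are assumptions about the AWS plugin's S3 source and should be checked against the plugin documentation.

```shell
CONFIG_FILE="$HOME/.tailpipe/config/aws.tpc"
mkdir -p "$(dirname "$CONFIG_FILE")"

# Append the partition block only if it is not already present.
# It refers to a connection named "logs", assumed to be defined in the same file.
grep -q 'partition "aws_cloudtrail_log" "my_logs"' "$CONFIG_FILE" 2>/dev/null || cat >> "$CONFIG_FILE" <<'EOF'
partition "aws_cloudtrail_log" "my_logs" {
  source "aws_s3_bucket" {
    connection = connection.aws.logs
    bucket     = "aws-cloudtrail-logs-bucket"
  }
}
EOF
```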
Note: If you don't have access to live CloudTrail logs, you can use the flaws.cloud sample logs. To get them:
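Fetching the sample logs can be sketched as a small helper. The download URL is recalled from Summit Route's public flaws.cloud dataset and should be verified before use. The function name fetch_flaws_logs is hypothetical and exists only for this example.

```shell
# Assumed location of the flaws.cloud CloudTrail archive; verify before relying on it.
FLAWS_URL="http://summitroute.com/downloads/flaws_cloudtrail_logs.tar"

# Hypothetical helper: download the archive and unpack it in the current directory.
fetch_flaws_logs() {
  # -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
  curl -fLO "$FLAWS_URL" || return 1
  # Strip everything up to the last slash to get the archive's file name.
  tar -xvf "${FLAWS_URL##*/}"
}
```

Call fetch_flaws_logs from the directory where the extracted logs should land.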
Because the log data then comes from the local .gz file extracted from the tar file rather than from AWS, your aws.tpc file won't include a connection block. Its partition block will follow this format:
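A file-based partition might look like the sketch below. The partition name flaws and the path /path/to/flaws_cloudtrail_logs are placeholders to replace with your own. The file source's paths and file_layout arguments, and the backtick-quoted pattern, are assumptions about the source's syntax, so confirm them in the Tailpipe documentation.

```shell
CONFIG_FILE="$HOME/.tailpipe/config/aws.tpc"
mkdir -p "$(dirname "$CONFIG_FILE")"

# Append the partition only if it is not already present.
# The quoted 'EOF' keeps the shell from interpreting the backticks below.
grep -q 'partition "aws_cloudtrail_log" "flaws"' "$CONFIG_FILE" 2>/dev/null || cat >> "$CONFIG_FILE" <<'EOF'
partition "aws_cloudtrail_log" "flaws" {
  source "file" {
    paths       = ["/path/to/flaws_cloudtrail_logs"]
    file_layout = `%{DATA}.json.gz`
  }
}
EOF
```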
Collect log data
Now let's collect the logs:
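Collection is started by naming the table whose partitions should be gathered. This is a minimal sketch, assuming tailpipe is on your PATH and a partition for aws_cloudtrail_log is already configured:

```shell
# Collect every configured partition of the aws_cloudtrail_log table.
TABLE="aws_cloudtrail_log"
tailpipe collect "$TABLE" || echo "Collection failed: check your config file and credentials." >&2
```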
Tailpipe will download the files from the source, decompress and parse them, and add the data to the Tailpipe database in the standard Hive file structure.
Query your logs
Tailpipe provides an interactive SQL shell for analyzing your collected data. Run tailpipe query to start the query shell. To see the table that was created:
You can count the records in the table:
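A count query could be run as sketched here. Inside the interactive shell you would type only the SQL; this version passes it to tailpipe query as an argument, assuming you prefer to run it from a regular shell. The alias record_count is illustrative.

```shell
# Count all rows collected into the table.
QUERY='select count(*) as record_count from aws_cloudtrail_log;'
tailpipe query "$QUERY" || echo "Query failed: has any data been collected yet?" >&2
```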
or find the oldest and newest records:
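The time range can be found with min and max over the event timestamp. This sketch assumes the table exposes CloudTrail's event time as a column named event_time; the aliases oldest and newest are illustrative.

```shell
# Earliest and latest event timestamps in the collected data.
QUERY='select min(event_time) as oldest, max(event_time) as newest from aws_cloudtrail_log;'
tailpipe query "$QUERY" || echo "Query failed: check that the table and column exist." >&2
```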
This query finds the top 10 IPs:
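A top-10 query groups by source address and sorts by count. This sketch assumes the source IP is stored in a column named source_ip_address; the alias event_count is illustrative.

```shell
# Rank source IP addresses by number of events and keep the ten busiest.
QUERY='
select source_ip_address, count(*) as event_count
from aws_cloudtrail_log
group by source_ip_address
order by event_count desc
limit 10;'
tailpipe query "$QUERY" || echo "Query failed: check that the table and column exist." >&2
```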
This query lists CloudTrail event types for a specified day:
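A per-day breakdown might look like the sketch below. It filters on tp_date and assumes the event type is described by columns named event_source and event_name. The date is only an example and the alias event_count is illustrative.

```shell
# Count events by service and API call for a single day.
# The double quotes let the SQL keep its own single-quoted date literal.
QUERY="
select event_source, event_name, count(*) as event_count
from aws_cloudtrail_log
where tp_date = '2024-11-07'
group by event_source, event_name
order by event_count desc;"
tailpipe query "$QUERY" || echo "Query failed: check that the table and columns exist." >&2
```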
Because we specified tp_date = '2024-11-07', Tailpipe only needs to read one of many files created by the collection process.
What's next?
We've demonstrated basic log collection and analysis with Tailpipe. Here's what to explore next: