Open
27 commits
72f32df: New executor documentation (#192) [mavam, Feb 9, 2026]
1c9f466: Add MySQL integration documentation and from_mysql operator reference… [mavam, Feb 9, 2026]
3de8ccc: Add from_tcp operator documentation (#199) [mavam, Feb 10, 2026]
d9b7347: Add live streaming mode docs for from_mysql (#198) [mavam, Feb 12, 2026]
8ae462b: Document prepared statements security improvement in from_mysql (#215) [mavam, Feb 24, 2026]
17b8b2b: Document `route_by` parameter and source mode for `parallel` [jachris, Feb 24, 2026]
ee80884: Add read_tql operator documentation (#224) [mavam, Feb 25, 2026]
907e95c: Doc batch operator timeout and multiple schemas (#238) [aljazerzen, Mar 9, 2026]
3ed77e5: Remove measure real_time argument (#244) [aljazerzen, Mar 12, 2026]
b43bfda: Merge branch 'main' into topic/new-executor [mavam, Mar 15, 2026]
2621f79: Use semantic components in branch docs [mavam, Mar 15, 2026]
6956c8c: Document from_tcp, accept_tcp, and serve_tcp operators (#214) [mavam, Mar 16, 2026]
b5e2142: Document `$file` let binding for file-based operators (#257) [raxyte, Apr 13, 2026]
c499f79: Fix duplicate TLS option docs [mavam, Apr 14, 2026]
2fa6758: Fix remaining doc review issues [mavam, Apr 14, 2026]
9d44519: Merge remote-tracking branch 'origin/main' into topic/new-executor [mavam, Apr 15, 2026]
eb1b645: Document accept_http and update HTTP operators (#212) [mavam, Apr 15, 2026]
b88c467: Document hostname resolution for accept_tcp (#258) [mavam, Apr 16, 2026]
8ee3905: Document from_nic, read_pcap, and write_pcap (#259) [mavam, Apr 16, 2026]
2600eb4: Document dns_lookup cache behavior (#262) [mavam, Apr 16, 2026]
37fcc73: Add `mmap` option to `from_file` docs (#263) [raxyte, Apr 16, 2026]
e79ec7e: Document `yara` whole-stream semantics (#261) [mavam, Apr 16, 2026]
4c0b134: Document Feather compression options (#264) [mavam, Apr 17, 2026]
50e2dd9: Merge remote-tracking branch 'origin/main' into topic/new-executor [mavam, Apr 17, 2026]
ace1b16: Clarify BITZ operator docs (#270) [mavam, Apr 17, 2026]
aaf6bcd: Update opensearch operators (#272) [aljazerzen, Apr 22, 2026]
3bc10df: Document serve_http operator (#276) [aljazerzen, Apr 24, 2026]
11 changes: 6 additions & 5 deletions src/content/docs/explanations/configuration.mdx
@@ -141,8 +141,8 @@ Tenzir provides node-level TLS configuration that applies to all operators and
connectors using TLS/HTTPS connections. These settings are used by operators
that make outbound connections (e.g., <Op>to_opensearch</Op>,
<Op>to_splunk</Op>, <Op>save_email</Op>)
and those that accept inbound connections (e.g., <Op>load_tcp</Op>,
<Op>save_tcp</Op>).
and those that accept inbound connections (e.g., <Op>accept_tcp</Op>,
<Op>serve_tcp</Op>).

:::note[Use Only When Required]
We do not recommend manually configuring TLS settings unless required for
@@ -192,9 +192,10 @@ configuration:
- <Op>to_opensearch</Op>: Applies min version and ciphers to HTTPS connections
- <Op>to_splunk</Op>: Applies min version and ciphers to Splunk HEC connections
- <Op>save_email</Op>: Applies min version and ciphers to SMTP connections
- <Op>load_tcp</Op>: Applies min version and ciphers to TLS server mode
- <Op>save_tcp</Op>: Applies min version and ciphers to TLS client and server modes
- <Op>from_opensearch</Op>: Applies min version and ciphers to HTTPS connections
- <Op>accept_tcp</Op>: Applies min version and ciphers to TLS server mode
- <Op>from_tcp</Op>: Applies min version and ciphers to TLS client mode
- <Op>serve_tcp</Op>: Applies min version and ciphers to TLS server mode
- <Op>accept_opensearch</Op>: Applies min version and ciphers to HTTPS connections

## Plugins

178 changes: 101 additions & 77 deletions src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx
@@ -7,32 +7,33 @@ This guide shows you how to fetch data from HTTP APIs using the
<Op>http</Op> operators. You'll learn to make GET
requests, handle authentication, and implement pagination for large result sets.

## Choosing the Right Operator

Tenzir has two HTTP client operators that share nearly identical options:

- [`from_http`](/reference/operators/from_http) is a **source** operator that
starts a pipeline with an HTTP request. Use it for standalone API calls.
- [`http`](/reference/operators/http) is a **transformation** operator that
enriches events flowing through a pipeline with HTTP responses. Use it when
you have existing data and want to make per-event API lookups.

Most examples in this guide use `from_http`. Unless noted otherwise, the same
options work with `http` as well.

## Basic API Requests

Start with these fundamental patterns for making HTTP requests to APIs.

### Simple GET Requests

To fetch data from an API endpoint, pass the URL as the first parameter to the
`from_http` operator:
To fetch data from an API endpoint, pass the URL as the first parameter:

```tql
from_http "https://api.example.com/data"
```

The operator makes a GET request by default and forwards the response as an
event. The `from_http` operator is an input operator, i.e., it starts a
pipeline. The companion operator `http` is a transformation, allowing you to
specify the URL as a field by referencing an event field that contains the URL:

```tql
from {url: "https://api.example.com/data"}
http url
```

This pattern is useful when processing multiple URLs or when URLs are generated
dynamically. Most of our subsequent examples use `from_http`, as the operator
options are very similar.
event.

### Parsing the HTTP Response Body

@@ -150,27 +151,35 @@ API tokens, as in the above example.

### TLS and Security

Enable TLS by setting the `tls` parameter to `true` and configure client
certificates using the `certfile` and `keyfile` parameters:
Configure TLS by passing a record to the `tls` parameter with certificate
paths:

```tql
from_http "https://secure-api.example.com/data",
tls=true,
certfile="/path/to/client.crt",
keyfile="/path/to/client.key"
tls={
certfile: "/path/to/client.crt",
keyfile: "/path/to/client.key",
}
```

Use these options when APIs require client certificate authentication.

To skip peer verification (e.g., for self-signed certificates in development):

```tql
from_http "https://dev-api.example.com/data",
tls={skip_peer_verification: true}
```

### Timeout and Retry Configuration

Configure timeouts and retry behavior by setting the `connection_timeout`,
`max_retry_count`, and `retry_delay` parameters:

```tql
from_http "https://api.example.com/data",
timeout=10s,
max_retries=3,
connection_timeout=10s,
max_retry_count=3,
retry_delay=2s
```

@@ -183,62 +192,45 @@ Use HTTP requests to enrich existing data with information from external APIs.
### Preserving Input Context

Keep original event data while adding API responses by specifying the
`response_field` parameter to control where the response is stored:
`response_field` parameter on the [`http`](/reference/operators/http) operator to
control where the response is stored:

```tql
from {
domain: "example.com",
severity: "HIGH",
api_url: "https://threat-intel.example.com/lookup",
response_field: "threat_data",
}
http f"{api_url}?domain={domain}", response_field=response_field
http f"https://threat-intel.example.com/lookup?domain={domain}",
response_field=threat_data
```

This approach preserves your original data and adds API responses in a specific
field.

### Adding Metadata
### Accessing Response Metadata

Capture HTTP response metadata by specifying the `metadata_field` parameter to
store status codes and headers separately from the response body:
With `from_http`, use the `$response` variable inside a parsing pipeline to
access HTTP status codes and headers:

```tql
from_http "https://api.example.com/status", metadata_field=http_meta
from_http "https://api.example.com/status" {
read_json
status_code = $response.code
server = $response.headers.Server
}
```

The metadata includes status codes and response headers for debugging and
monitoring.

## Pagination and Bulk Processing

Handle APIs that return large datasets across multiple pages.

### Lambda-Based Pagination

Implement automatic pagination by providing a lambda function to the `paginate`
parameter that extracts the next page URL from the response:
With the `http` operator, use the `metadata_field` parameter instead:

```tql
from_http "https://api.example.com/search?q=query",
paginate=(response => "next_page_url" if response.has_more)
from {url: "https://api.example.com/status"}
http url, metadata_field=http_meta
where http_meta.code >= 200 and http_meta.code < 300
```

The operator continues making requests as long as the pagination lambda function
returns a valid URL.

### Complex Pagination Logic

Handle APIs with custom pagination schemes by building pagination URLs
dynamically using expressions that reference response data:

```tql
let $base_url = "https://api.example.com/items"
from_http f"{$base_url}?page=1",
paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages),
```
## Pagination and Bulk Processing

This example builds pagination URLs dynamically based on response data.
Handle APIs that return large datasets across multiple pages.

### Link Header Pagination

@@ -271,22 +263,43 @@ from {url: "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10"}
http url, paginate="link"
```

### Lambda-Based Pagination

The [`http`](/reference/operators/http) operator additionally supports
lambda-based pagination for APIs with custom pagination schemes. Provide a
lambda function to the `paginate` parameter that extracts the next page URL from
the response:

```tql
from {query: "tenzir"}
http f"https://api.example.com/search?q={query}",
paginate=(x => x.next_url if x.has_more)
```

The operator continues making requests as long as the pagination lambda returns
a valid URL.

You can also build pagination URLs dynamically:

```tql
let $base = "https://api.example.com/items"
from {category: "security"}
http f"{$base}?category={category}&page=1",
paginate=(x => f"{$base}?category={category}&page={x.page + 1}" if x.page < x.total_pages)
```

### Rate Limiting

Control request frequency by configuring the `paginate_delay` parameter to add
delays between requests and the `parallel` parameter to limit concurrent
requests:

```tql
from {
url: "https://api.example.com/data",
paginate_delay: 500ms,
parallel: 2
}
http url,
paginate="next_url" if has_next,
paginate_delay=paginate_delay,
parallel=parallel
from {domain: "example.com"}
http f"https://api.example.com/scan?q={domain}",
paginate=(x => x.next_url if x.has_next),
paginate_delay=500ms,
parallel=2
```

Use `paginate_delay` and `parallel` to manage request rates appropriately.
@@ -310,18 +323,18 @@ scenarios.
Monitor API health and response times:

```tql
from_http "https://api.example.com/health", metadata_field=metadata
select date=metadata.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z")
latency = now() - date
from_http "https://api.example.com/health" {
read_json
date = $response.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z")
latency = now() - date
}
```

The above example parses the `Date` header from the HTTP response via
<Fn>parse_time</Fn> into a timestamp and then
compares it to the current wallclock time using the
<Fn>now</Fn> function.

Nit: `%T` is a shortcut for `%H:%M:%S`.

## Error Handling

Handle API errors and failures gracefully in your data pipelines.
@@ -334,18 +347,28 @@ between retries:

```tql
from_http "https://unreliable-api.example.com/data",
max_retries=5,
max_retry_count=5,
retry_delay=2s
```

### Status Code Handling

Check HTTP status codes by capturing metadata and filtering based on the
`code` field to handle different response types:
Check HTTP status codes using the `$response` variable to handle different
response types:

```tql
from_http "https://api.example.com/data", metadata_field=metadata
where metadata.code >= 200 and metadata.code < 300
from_http "https://api.example.com/data" {
read_json
where $response.code >= 200 and $response.code < 300
}
```

With the `http` operator, use `metadata_field` instead:

```tql
from {url: "https://api.example.com/data"}
http url, metadata_field=meta
where meta.code >= 200 and meta.code < 300
```
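
Retries and status checks can be combined in a single enrichment pipeline. The
following is a minimal sketch, not a confirmed recipe: it assumes that `http`
accepts the same `max_retry_count` and `retry_delay` options shown for
`from_http` above, which this guide suggests but does not spell out. The URL
and the `meta` field name are placeholders.

```tql
// Sketch only: assumes `http` shares the retry options of `from_http`.
from {url: "https://api.example.com/data"}
http url,
  metadata_field=meta,
  max_retry_count=3,
  retry_delay=2s
// Keep only events whose request ended with a 2xx status code.
where meta.code >= 200 and meta.code < 300
```

If the retry options turn out not to apply to `http`, the `metadata_field`
check on its own still filters out unsuccessful responses.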

## Best Practices
@@ -358,8 +381,9 @@ Follow these practices for reliable and efficient API integration:
handling transient failures.
3. **Respect rate limits**. Use `parallel` and `paginate_delay` to control
request rates.
4. **Handle errors gracefully**. Check status codes in metadata
(`metadata_field`) and implement fallback logic.
4. **Handle errors gracefully**. Use `$response` in `from_http` parsing
pipelines or `metadata_field` with `http` to check status codes and implement
fallback logic.
5. **Secure credentials**. Access API keys and tokens via
[secrets](/explanations/secrets), not in code.
6. **Monitor API usage**. Track response times and error rates for