diff --git a/src/content/docs/explanations/configuration.mdx b/src/content/docs/explanations/configuration.mdx index 21a32fb14..8edecbc93 100644 --- a/src/content/docs/explanations/configuration.mdx +++ b/src/content/docs/explanations/configuration.mdx @@ -141,8 +141,8 @@ Tenzir provides node-level TLS configuration that applies to all operators and connectors using TLS/HTTPS connections. These settings are used by operators that make outbound connections (e.g., to_opensearch, to_splunk, save_email) -and those that accept inbound connections (e.g., load_tcp, -save_tcp). +and those that accept inbound connections (e.g., accept_tcp, +serve_tcp). :::note[Use Only When Required] We do not recommend manually configuring TLS settings unless required for @@ -192,9 +192,10 @@ configuration: - to_opensearch: Applies min version and ciphers to HTTPS connections - to_splunk: Applies min version and ciphers to Splunk HEC connections - save_email: Applies min version and ciphers to SMTP connections -- load_tcp: Applies min version and ciphers to TLS server mode -- save_tcp: Applies min version and ciphers to TLS client and server modes -- from_opensearch: Applies min version and ciphers to HTTPS connections +- accept_tcp: Applies min version and ciphers to TLS server mode +- from_tcp: Applies min version and ciphers to TLS client mode +- serve_tcp: Applies min version and ciphers to TLS server mode +- accept_opensearch: Applies min version and ciphers to HTTPS connections ## Plugins diff --git a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx index d413f055a..312e3ff69 100644 --- a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx +++ b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx @@ -7,32 +7,33 @@ This guide shows you how to fetch data from HTTP APIs using the http operators. 
You'll learn to make GET requests, handle authentication, and implement pagination for large result sets. +## Choosing the Right Operator + +Tenzir has two HTTP client operators that share nearly identical options: + +- [`from_http`](/reference/operators/from_http) is a **source** operator that + starts a pipeline with an HTTP request. Use it for standalone API calls. +- [`http`](/reference/operators/http) is a **transformation** operator that + enriches events flowing through a pipeline with HTTP responses. Use it when + you have existing data and want to make per-event API lookups. + +Most examples in this guide use `from_http`. Unless noted otherwise, the same +options work with `http` as well. + ## Basic API Requests Start with these fundamental patterns for making HTTP requests to APIs. ### Simple GET Requests -To fetch data from an API endpoint, pass the URL as the first parameter to the -`from_http` operator: +To fetch data from an API endpoint, pass the URL as the first parameter: ```tql from_http "https://api.example.com/data" ``` The operator makes a GET request by default and forwards the response as an -event. The `from_http` operator is an input operator, i.e., it starts a -pipeline. The companion operator `http` is a transformation, allowing you to -specify the URL as a field by referencing an event field that contains the URL: - -```tql -from {url: "https://api.example.com/data"} -http url -``` - -This pattern is useful when processing multiple URLs or when URLs are generated -dynamically. Most of our subsequent examples use `from_http`, as the operator -options are very similar. +event. ### Parsing the HTTP Response Body @@ -150,18 +151,26 @@ API tokens, as in the above example. 
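+
+If an API expects an authorization header, you can combine this with
+[secrets](/explanations/secrets) instead of hard-coding the token. A minimal
+sketch, assuming a secret named `api-token` has been configured on the node:
+
+```tql
+from_http "https://api.example.com/data",
+  headers={Authorization: "Bearer " + secret("api-token")}
+```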
### TLS and Security -Enable TLS by setting the `tls` parameter to `true` and configure client -certificates using the `certfile` and `keyfile` parameters: +Configure TLS by passing a record to the `tls` parameter with certificate +paths: ```tql from_http "https://secure-api.example.com/data", - tls=true, - certfile="/path/to/client.crt", - keyfile="/path/to/client.key" + tls={ + certfile: "/path/to/client.crt", + keyfile: "/path/to/client.key", + } ``` Use these options when APIs require client certificate authentication. +To skip peer verification (e.g., for self-signed certificates in development): + +```tql +from_http "https://dev-api.example.com/data", + tls={skip_peer_verification: true} +``` + ### Timeout and Retry Configuration Configure timeouts and retry behavior by setting the `connection_timeout`, @@ -169,8 +178,8 @@ Configure timeouts and retry behavior by setting the `connection_timeout`, ```tql from_http "https://api.example.com/data", - timeout=10s, - max_retries=3, + connection_timeout=10s, + max_retry_count=3, retry_delay=2s ``` @@ -183,62 +192,45 @@ Use HTTP requests to enrich existing data with information from external APIs. ### Preserving Input Context Keep original event data while adding API responses by specifying the -`response_field` parameter to control where the response is stored: +`response_field` parameter on the [`http`](/reference/operators/http) operator to +control where the response is stored: ```tql from { domain: "example.com", severity: "HIGH", - api_url: "https://threat-intel.example.com/lookup", - response_field: "threat_data", } -http f"{api_url}?domain={domain}", response_field=response_field +http f"https://threat-intel.example.com/lookup?domain={domain}", + response_field=threat_data ``` This approach preserves your original data and adds API responses in a specific field. 
-### Adding Metadata +### Accessing Response Metadata -Capture HTTP response metadata by specifying the `metadata_field` parameter to -store status codes and headers separately from the response body: +With `from_http`, use the `$response` variable inside a parsing pipeline to +access HTTP status codes and headers: ```tql -from_http "https://api.example.com/status", metadata_field=http_meta +from_http "https://api.example.com/status" { + read_json + status_code = $response.code + server = $response.headers.Server +} ``` -The metadata includes status codes and response headers for debugging and -monitoring. - -## Pagination and Bulk Processing - -Handle APIs that return large datasets across multiple pages. - -### Lambda-Based Pagination - -Implement automatic pagination by providing a lambda function to the `paginate` -parameter that extracts the next page URL from the response: +With the `http` operator, use the `metadata_field` parameter instead: ```tql -from_http "https://api.example.com/search?q=query", - paginate=(response => "next_page_url" if response.has_more) +from {url: "https://api.example.com/status"} +http url, metadata_field=http_meta +where http_meta.code >= 200 and http_meta.code < 300 ``` -The operator continues making requests as long as the pagination lambda function -returns a valid URL. - -### Complex Pagination Logic - -Handle APIs with custom pagination schemes by building pagination URLs -dynamically using expressions that reference response data: - -```tql -let $base_url = "https://api.example.com/items" -from_http f"{$base_url}?page=1", - paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages), -``` +## Pagination and Bulk Processing -This example builds pagination URLs dynamically based on response data. +Handle APIs that return large datasets across multiple pages. 
### Link Header Pagination @@ -271,6 +263,31 @@ from {url: "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10"} http url, paginate="link" ``` +### Lambda-Based Pagination + +The [`http`](/reference/operators/http) operator additionally supports +lambda-based pagination for APIs with custom pagination schemes. Provide a +lambda function to the `paginate` parameter that extracts the next page URL from +the response: + +```tql +from {query: "tenzir"} +http f"https://api.example.com/search?q={query}", + paginate=(x => x.next_url if x.has_more) +``` + +The operator continues making requests as long as the pagination lambda returns +a valid URL. + +You can also build pagination URLs dynamically: + +```tql +let $base = "https://api.example.com/items" +from {category: "security"} +http f"{$base}?category={category}&page=1", + paginate=(x => f"{$base}?category={category}&page={x.page + 1}" if x.page < x.total_pages) +``` + ### Rate Limiting Control request frequency by configuring the `paginate_delay` parameter to add @@ -278,15 +295,11 @@ delays between requests and the `parallel` parameter to limit concurrent requests: ```tql -from { - url: "https://api.example.com/data", - paginate_delay: 500ms, - parallel: 2 -} -http url, - paginate="next_url" if has_next, - paginate_delay=paginate_delay, - parallel=parallel +from {domain: "example.com"} +http f"https://api.example.com/scan?q={domain}", + paginate=(x => x.next_url if x.has_next), + paginate_delay=500ms, + parallel=2 ``` Use `paginate_delay` and `parallel` to manage request rates appropriately. @@ -310,9 +323,11 @@ scenarios. 
Monitor API health and response times: ```tql -from_http "https://api.example.com/health", metadata_field=metadata -select date=metadata.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z") -latency = now() - date +from_http "https://api.example.com/health" { + read_json + date = $response.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z") + latency = now() - date +} ``` The above example parses the `Date` header from the HTTP response via @@ -320,8 +335,6 @@ The above example parses the `Date` header from the HTTP response via compares it to the current wallclock time using the now function. -Nit: `%T` is a shortcut for `%H:%M:%S`. - ## Error Handling Handle API errors and failures gracefully in your data pipelines. @@ -334,18 +347,28 @@ between retries: ```tql from_http "https://unreliable-api.example.com/data", - max_retries=5, + max_retry_count=5, retry_delay=2s ``` ### Status Code Handling -Check HTTP status codes by capturing metadata and filtering based on the -`code` field to handle different response types: +Check HTTP status codes using the `$response` variable to handle different +response types: ```tql -from_http "https://api.example.com/data", metadata_field=metadata -where metadata.code >= 200 and metadata.code < 300 +from_http "https://api.example.com/data" { + read_json + where $response.code >= 200 and $response.code < 300 +} +``` + +With the `http` operator, use `metadata_field` instead: + +```tql +from {url: "https://api.example.com/data"} +http url, metadata_field=meta +where meta.code >= 200 and meta.code < 300 ``` ## Best Practices @@ -358,8 +381,9 @@ Follow these practices for reliable and efficient API integration: handling transient failures. 3. **Respect rate limits**. Use `parallel` and `paginate_delay` to control request rates. -4. **Handle errors gracefully**. Check status codes in metadata - (`metadata_field`) and implement fallback logic. +4. **Handle errors gracefully**. 
Use `$response` in `from_http` parsing + pipelines or `metadata_field` with `http` to check status codes and implement + fallback logic. 5. **Secure credentials**. Access API keys and tokens via [secrets](/explanations/secrets), not in code. 6. **Monitor API usage**. Track response times and error rates for diff --git a/src/content/docs/guides/collecting/get-data-from-the-network.mdx b/src/content/docs/guides/collecting/get-data-from-the-network.mdx index f20c9a073..473229bec 100644 --- a/src/content/docs/guides/collecting/get-data-from-the-network.mdx +++ b/src/content/docs/guides/collecting/get-data-from-the-network.mdx @@ -8,29 +8,31 @@ capture raw packets from network interfaces. ## TCP sockets -The [Transmission Control Protocol (TCP)](/integrations/tcp) provides reliable, -ordered byte streams. Use TCP when you need guaranteed delivery and message -ordering. +tcp provides reliable, ordered byte streams. Use TCP +when you need guaranteed delivery and message ordering. ### Listen for connections -Start a TCP server that accepts incoming connections: +Use accept_tcp to start a TCP server that accepts incoming +connections: ```tql -from "tcp://0.0.0.0:9000" { +accept_tcp "0.0.0.0:9000" { read_json } ``` This listens on all interfaces (`0.0.0.0`) on port 9000. Specify a parsing -pipeline to convert incoming bytes to events. +pipeline to convert incoming bytes to events. Inside the nested pipeline, +`$peer.ip` and `$peer.port` identify the connecting client. Set +`resolve_hostnames=true` to also expose `$peer.hostname` from reverse DNS. 
### Connect to a remote server -Act as a TCP client by connecting to an existing server: +Use from_tcp to connect to an existing server: ```tql -from "tcp://192.168.1.100:9000", connect=true { +from_tcp "192.168.1.100:9000" { read_json } ``` @@ -40,7 +42,7 @@ from "tcp://192.168.1.100:9000", connect=true { Secure your TCP connections with TLS by passing a `tls` record: ```tql -from "tcp://0.0.0.0:9443", tls={certfile: "cert.pem", keyfile: "key.pem"} { +accept_tcp "0.0.0.0:9443", tls={certfile: "cert.pem", keyfile: "key.pem"} { read_json } ``` @@ -56,9 +58,8 @@ For production TLS configuration, including mutual TLS and cipher settings, see ## UDP sockets -The [User Datagram Protocol (UDP)](/integrations/udp) is a connectionless -protocol ideal for high-volume, loss-tolerant data like syslog messages or -metrics. +udp is a connectionless protocol ideal for +high-volume, loss-tolerant data like syslog messages or metrics. ### Receive UDP datagrams @@ -99,8 +100,8 @@ this = { ## Packet capture -Capture raw network packets from a [network interface card (NIC)](/integrations/nic) -for deep packet inspection or network forensics. +Capture raw network packets with nic for deep +packet inspection or network forensics. ### List available interfaces @@ -169,13 +170,10 @@ correlating network flows across different tools. 
### Filter by BPF expression -Apply Berkeley Packet Filter (BPF) expressions to capture only specific -traffic: +Use a Berkeley Packet Filter (BPF) expression to drop unwanted traffic before Tenzir parses packets: ```tql -from_nic filter="tcp port 443", "eth0" { - read_pcap -} +from_nic "eth0", filter="tcp port 443" ``` ### Read PCAP files diff --git a/src/content/docs/guides/node-setup/configure-tls.mdx b/src/content/docs/guides/node-setup/configure-tls.mdx index dde429be4..90774dc47 100644 --- a/src/content/docs/guides/node-setup/configure-tls.mdx +++ b/src/content/docs/guides/node-setup/configure-tls.mdx @@ -21,9 +21,9 @@ tenzir: ``` These settings apply automatically to operators like from_http, -load_tcp, save_tcp, -to_opensearch, from_opensearch, -to_splunk, save_email, +accept_tcp, from_tcp, +serve_tcp, to_opensearch, +accept_opensearch, to_splunk, save_email, and to_fluent_bit. ### Available options diff --git a/src/content/docs/guides/parsing/parse-binary-data.mdx b/src/content/docs/guides/parsing/parse-binary-data.mdx index 270a4fb7f..522e51c45 100644 --- a/src/content/docs/guides/parsing/parse-binary-data.mdx +++ b/src/content/docs/guides/parsing/parse-binary-data.mdx @@ -4,7 +4,7 @@ title: Parse binary data This guide shows you how to parse binary data formats into structured events. You'll learn to work with columnar formats like Parquet and Feather, packet -captures in PCAP format, Tenzir's native Bitz format, and compressed data. +captures in PCAP format, Tenzir's native BITZ format, and compressed data. The examples use from_file with a [parsing subpipeline](/reference/programs#parsing-subpipelines) to illustrate @@ -66,9 +66,9 @@ from_file "capture.pcap" { {linktype: 1, timestamp: 2024-01-15T10:30:45.123456Z, captured_packet_length: 74, original_packet_length: 74, data: "ABY88f1tZJ7zvttmCABFAAA8..."} ``` -Use `from_nic` to parse directly from a live interface. TQL furhter comes with light-weight packet processing functions. 
For -example, you can extract protocol headers from raw packet data using the -decapsulate function: +Use from_nic to parse directly from a live interface. TQL also +includes lightweight packet processing functions. For example, you can extract +protocol headers from raw packet data using the decapsulate function: ```tql from_file "capture.pcap" { @@ -81,10 +81,12 @@ packet = decapsulate(this) {packet: {ether: {src: "64-9E-F3-BE-DB-66", dst: "00-16-3C-F1-FD-6D", type: 2048}, ip: {src: "192.168.1.100", dst: "10.0.0.1", type: 6}, tcp: {src_port: 54321, dst_port: 443}, community_id: "1:YXWfTYEyYLKVv5Ge4WqijUnKTrM="}} ``` -## Bitz +## BITZ -Bitz is Tenzir's native columnar format, optimized for schema-rich security -data. Use read_bitz to parse it: +BITZ, short for **Bi**nary **T**en**z**ir, is Tenzir's native columnar format, +optimized for schema-rich security data. Use read_bitz to parse it. +Use write_bitz to serialize events into the same format for later +reuse: ```tql from_file "archive.bitz" { diff --git a/src/content/docs/guides/routing/expose-data-as-server.mdx b/src/content/docs/guides/routing/expose-data-as-server.mdx new file mode 100644 index 000000000..e307b5c2f --- /dev/null +++ b/src/content/docs/guides/routing/expose-data-as-server.mdx @@ -0,0 +1,101 @@ +--- +title: Expose data as a server +--- + +import Op from '@components/see-also/Op.astro'; + +This guide shows you how to make pipeline data available to external consumers +by starting an HTTP server. You'll learn how to stream serialized pipeline +output to HTTP clients, pick a wire format, and configure connection limits and +TLS. + +## Spin up an HTTP server + +Use serve_http at the end of a pipeline to start an HTTP server. The +nested pipeline chooses how to serialize your events: + +```tql +from_file "example.yaml" +serve_http "0.0.0.0:8080" { + write_ndjson +} +``` + +Any HTTP client connecting to `http://host:8080/` receives a continuous NDJSON +stream. 
Each event is JSON-encoded on a single line, separated by newlines: + +```bash +curl http://localhost:8080/ +``` + +```json +{"timestamp":"2025-01-15T10:30:00Z","src_ip":"192.168.1.100","event":"login"} +{"timestamp":"2025-01-15T10:30:01Z","src_ip":"10.0.0.50","event":"file_access"} +``` + +Multiple clients can connect simultaneously. Each connected client receives a +copy of the bytes produced after it connects. + +### Choose a wire format + +Use the nested pipeline to control the response body format and content type. +For example, use write_lines to stream plain text instead of NDJSON: + +```tql +from_file "alerts.txt" +serve_http "0.0.0.0:8080" { + write_lines +} +``` + +You can also add filters or transforms before the printer: + +```tql +from_file "alerts.json" +where severity == "high" +serve_http "0.0.0.0:8080" { + write_ndjson +} +``` + +### Understand delivery semantics + +`serve_http` does not buffer output for future clients. A client only receives +bytes produced after it connects. + +If a client cannot keep up with the producer, Tenzir may disconnect it to keep +memory usage bounded. + +### Connection limits + +Control the maximum number of simultaneous client connections: + +```tql +from_file "data.csv" +serve_http "0.0.0.0:8080", max_connections=10 { + write_ndjson +} +``` + +Additional clients wait until a connection slot becomes available. 
+ +### TLS encryption + +Serve data over HTTPS by providing TLS certificates: + +```tql +from_file "secret.json" +serve_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } { + write_ndjson +} +``` + +## See Also + +- serve_http +- serve_tcp +- to_http diff --git a/src/content/docs/guides/routing/load-balance-pipelines.mdx b/src/content/docs/guides/routing/load-balance-pipelines.mdx index 094fc4004..1b859a988 100644 --- a/src/content/docs/guides/routing/load-balance-pipelines.mdx +++ b/src/content/docs/guides/routing/load-balance-pipelines.mdx @@ -17,8 +17,7 @@ nested pipelines, enabling you to spread load across multiple destinations. let $endpoints = ["host1:8080", "host2:8080", "host3:8080"] subscribe "events" load_balance $endpoints { - write_json - save_tcp $endpoints + to_tcp $endpoints { write_json } } ``` @@ -36,8 +35,7 @@ let $cfg = ["192.168.0.30:8080", "192.168.0.31:8080"] subscribe "input" load_balance $cfg { - write_json - save_tcp $cfg + to_tcp $cfg { write_json } } ``` diff --git a/src/content/docs/guides/routing/send-to-destinations.mdx b/src/content/docs/guides/routing/send-to-destinations.mdx index 171f54727..5e0deb9f4 100644 --- a/src/content/docs/guides/routing/send-to-destinations.mdx +++ b/src/content/docs/guides/routing/send-to-destinations.mdx @@ -57,8 +57,8 @@ to_opensearch "https://opensearch.example.com:9200", ### Cloud services -Route events to cloud destinations like [Amazon SQS](/integrations/amazon/sqs) -and [Google Cloud Pub/Sub](/integrations/google/cloud-pubsub). +Route events to cloud destinations like amazon/sqs +and google/cloud-pubsub. 
Send to SQS: @@ -105,8 +105,7 @@ save_file "s3://bucket/logs/events.jsonl" Send NDJSON over tcp: ```tql -write_json -save_tcp "collector.example.com:5044" +to_tcp "collector.example.com:5044" { write_json } ``` ## Expression-based serialization diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 18bf50774..9af581c6d 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -4,47 +4,55 @@ sidebar: label: HTTP(S) --- -Tenzir supports HTTP and HTTPS, both as sender and receiver. +[HTTP](https://en.wikipedia.org/wiki/HTTP) is the foundation of data exchange +on the web. Tenzir provides operators for all sides of an HTTP conversation: +fetching data from APIs, sending events to webhooks, streaming pipeline output +to clients, and accepting incoming requests. + +## Fetching data from APIs When retrieving data from an API or website, you prepare your HTTP request and get back the HTTP response body as your pipeline data: ![HTTP from](http-from.svg) -When sending data from a pipeline to an API or website, the events in the -pipeline make up the HTTP request body. If the HTTP status code is not 2\*\*, -you will get a warning. +Use [`from_http`](/reference/operators/from_http) to issue a one-shot HTTP +request, or [`http`](/reference/operators/http) to enrich events flowing through +a pipeline with HTTP responses. Both operators automatically infer the response +format from the URL extension or `Content-Type` header. + +See the [Fetch via HTTP and APIs](/guides/collecting/fetch-via-http-and-apis) +guide for practical examples covering authentication, pagination, error +handling, and data enrichment. -![HTTP from](http-to.svg) +## Sending data to webhooks and APIs -In both cases, you can only provide static header data. +Use [`to_http`](/reference/operators/to_http) to send events as HTTP requests to +a webhook or API endpoint. 
Each input event is sent as a separate request, with +the event JSON-encoded as the body by default. This is useful for pushing alerts +to webhooks, forwarding events to SIEMs, or calling external APIs for each +event. -Use from_http to perform HTTP requests or -run an HTTP server. This operator automatically tries to infer the format from the -`Content-Type` header. For sending, use -save_http with a write operator. +![HTTP to](http-to.svg) -## Examples +## Streaming data to HTTP clients -### Perform a GET request with URL parameters +Use serve_http to start an HTTP server that streams the bytes produced +by a nested pipeline to connected clients. For example, use +write_ndjson when you want NDJSON over HTTP or write_lines +when you want plain text. -```tql -from_http "http://example.com:8888/api?query=tenzir" -``` +See the routing/expose-data-as-server guide for practical +examples covering serialization, connection limits, and TLS. -### Perform a POST request with JSON body +## Accepting incoming requests -```tql -from_http "http://example.com:8888/api", method="post", body={query: "tenzir"} -``` +Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server +that turns incoming requests into pipeline events. This is useful for receiving +webhooks, building custom API endpoints, or ingesting data pushed by external +systems. -### Call a webhook API with pipeline data +## SSL/TLS -```tql -from { - x: 42, - y: "foo", -} -write_json -save_http "http://example.com:8888/api", method="POST" -``` +All HTTP operators support TLS. Pass `tls={}` to enable TLS with defaults, or +provide a record with specific options like `certfile` and `keyfile`. diff --git a/src/content/docs/integrations/index.mdx b/src/content/docs/integrations/index.mdx index 80d263c57..da5001fed 100644 --- a/src/content/docs/integrations/index.mdx +++ b/src/content/docs/integrations/index.mdx @@ -9,43 +9,33 @@ packages at the top to native protocol connectors at the core. 
## Packages -packages are 1-click deployable integrations that deliver instant value. -They bundle pipelines, [enrichment contexts](/explanations/enrichment/), and -configurations for common security tools like Splunk, CrowdStrike, Elastic, -SentinelOne, Palo Alto, and many more. +packages are 1-click deployable integrations that +deliver instant value. They bundle pipelines, +enrichment, and configurations for common security +tools like Splunk, CrowdStrike, Elastic, SentinelOne, Palo Alto, and many more. -Browse our freely available [package library on -GitHub](https://github.com/tenzir/library). +Browse our freely available [package library on GitHub](https://github.com/tenzir/library). You can also use ai-workbench/use-agent-skills to generate custom packages with AI assistance. ## Core Integrations Core integrations are native connectors to the ecosystem, enabling communication over numerous protocols and APIs: -- **Cloud storage**: amazon/s3, - [GCS](/integrations/google/cloud-storage), - microsoft/azure-blob-storage -- **Message queues**: kafka, - amazon/sqs, amqp -- **Databases**: snowflake, - clickhouse -- **Network protocols**: tcp, udp, - http, syslog +- **Cloud storage**: amazon/s3, google/cloud-storage, microsoft/azure-blob-storage +- **Message queues**: kafka, amazon/sqs, amqp +- **Databases**: snowflake, clickhouse, mysql +- **Network protocols**: tcp, udp, http, syslog Under the hood, core integrations use a C++ plugin abstraction to provide an [operator](/reference/operators/), [function](/reference/functions/), or [context](/explanations/enrichment/) that you can use in TQL to directly interface with the respective resource, such as a TCP socket or cloud storage -bucket. We typically implement this functionality using the respective SDK, such as the -[AWS SDK](https://aws.amazon.com/sdk-for-cpp/), [Google Cloud +bucket. 
We typically implement this functionality using the respective SDK, such +as the [AWS SDK](https://aws.amazon.com/sdk-for-cpp/), [Google Cloud SDK](https://cloud.google.com/cpp), or [librdkafka](https://github.com/confluentinc/librdkafka), though some integrations require a custom implementation. -:::note[Dedicated Operators] -For some applications, we provide a **dedicated operator** that dramatically -simplifies the user experience. For example, -to_splunk and -from_opensearch offer a -streamlined interface compared to composing generic HTTP or protocol operators. +:::note[Dedicated operators] +For some applications, we provide a **dedicated operator** that dramatically simplifies the user experience. For example, to_splunk and accept_opensearch offer a streamlined interface compared to composing generic HTTP or protocol operators. ::: diff --git a/src/content/docs/integrations/microsoft/windows-event-logs.mdx b/src/content/docs/integrations/microsoft/windows-event-logs.mdx index 4fedc2268..83101e2a8 100644 --- a/src/content/docs/integrations/microsoft/windows-event-logs.mdx +++ b/src/content/docs/integrations/microsoft/windows-event-logs.mdx @@ -221,10 +221,8 @@ configuration: Import the logs via TCP: ```tql -load_tcp "127.0.0.1:4000", - tls=true, - certfile="key_and_cert.pem", - keyfile="key_and_cert.pem" { +accept_tcp "127.0.0.1:4000", + tls={certfile: "key_and_cert.pem", keyfile: "key_and_cert.pem"} { read_json } import @@ -251,7 +249,7 @@ configuration to publish to the `nxlog` topic: ``` -Then use our [Kafka integration](/integrations/kafka) to read from the topic: +Then use kafka to read from the topic: ```tql from_kafka "nxlog" @@ -589,7 +587,7 @@ Accept the logs sent with the configuration above into Tenzir via tcp: ```tql -load_tcp "10.0.0.1:1514" { +accept_tcp "10.0.0.1:1514" { read_json } publish "wec" @@ -618,7 +616,7 @@ Security monitoring often focuses on specific event types. 
Filter for logon events (Event ID 4624) and failed logon attempts (Event ID 4625): ```tql -load_tcp "10.0.0.1:1514" { +accept_tcp "10.0.0.1:1514" { read_delimited "\n", include_separator=true } this = data.parse_winlog() @@ -631,7 +629,7 @@ The `EventData` section contains event-specific fields. For a successful logon event, extract the relevant information: ```tql -load_tcp "10.0.0.1:1514" { +accept_tcp "10.0.0.1:1514" { read_delimited "\n", include_separator=true } this = data.parse_winlog() diff --git a/src/content/docs/integrations/mysql.excalidraw b/src/content/docs/integrations/mysql.excalidraw new file mode 100644 index 000000000..2beae0e56 --- /dev/null +++ b/src/content/docs/integrations/mysql.excalidraw @@ -0,0 +1,947 @@ +{ + "type": "excalidraw", + "version": 2, + "source": "https://app.excalidraw.com", + "elements": [ + { + "id": "8MN5I4hT3wVen_QUHAUnu", + "type": "rectangle", + "x": 1029.4206230693742, + "y": 462.25159339855855, + "width": 150.57937693062584, + "height": 143.23101048605469, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "Zy", + "roundness": { + "type": 3 + }, + "seed": 239993862, + "version": 526, + "versionNonce": 175282956, + "isDeleted": false, + "boundElements": [ + { + "type": "text", + "id": "Q90BHXvcB9LwBRiLiSjA9" + } + ], + "updated": 1770653676903, + "link": null, + "locked": false + }, + { + "id": "Q90BHXvcB9LwBRiLiSjA9", + "type": "text", + "x": 1067.1423107412302, + "y": 467.25159339855855, + "width": 75.13600158691406, + "height": 20, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "Zz", + "roundness": null, + "seed": 1597739846, + "version": 506, + 
"versionNonce": 1537026956, + "isDeleted": false, + "boundElements": [], + "updated": 1770653655303, + "link": null, + "locked": false, + "text": "Database", + "fontSize": 16, + "fontFamily": 5, + "textAlign": "center", + "verticalAlign": "top", + "containerId": "8MN5I4hT3wVen_QUHAUnu", + "originalText": "Database", + "autoResize": true, + "lineHeight": 1.25 + }, + { + "type": "rectangle", + "version": 481, + "versionNonce": 1842653364, + "index": "aU", + "isDeleted": false, + "id": "MdBLs5DrRJv7pPr1ECBdP", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "dotted", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 860, + "y": 380, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "width": 480.0000000000001, + "height": 260, + "seed": 1012386970, + "groupIds": [], + "frameId": null, + "roundness": null, + "boundElements": [ + { + "id": "anAP1e8hQ4MPrGd5mJIPw", + "type": "arrow" + } + ], + "updated": 1770653666972, + "link": null, + "locked": true + }, + { + "type": "arrow", + "version": 1418, + "versionNonce": 1665469708, + "isDeleted": true, + "id": "g_xmr_rAUoQx04wSxq2yQ", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 764.4167779398937, + "y": 525.2324875938532, + "strokeColor": "#1e1e1e", + "backgroundColor": "#ffec99", + "width": 231.80542997696136, + "height": 0, + "seed": 61286746, + "groupIds": [], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770653628213, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "triangle", + "points": [ + [ + 0, + 0 + ], + [ + 231.80542997696136, + 0 + ] + ], + "index": "aV" + }, + { + "type": "line", + "version": 1403, + "versionNonce": 690428340, + "isDeleted": true, + "id": "K1eRli9Ss4OCth8Rxisx0", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + 
"roughness": 1, + "opacity": 100, + "angle": 0, + "x": 584.4167779398937, + "y": 485.22529758183066, + "strokeColor": "#000000", + "backgroundColor": "#e9ecef", + "width": 160, + "height": 80, + "seed": 174911002, + "groupIds": [ + "nS55pT0iEmYN13lrATL_r" + ], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770653628213, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": null, + "points": [ + [ + 0, + 0 + ], + [ + 40, + 39.67197250080198 + ], + [ + -1.4210854715202004e-14, + 80 + ], + [ + 160, + 80 + ], + [ + 160, + 0 + ], + [ + 0, + 0 + ] + ], + "index": "aW", + "polygon": false + }, + { + "type": "image", + "version": 3810, + "versionNonce": 867666828, + "index": "aX", + "isDeleted": true, + "id": "Ob5zHrhyWqtZkihkGCreM", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 650.5291677541359, + "y": 499.0482924112763, + "strokeColor": "transparent", + "backgroundColor": "#ffc9c9", + "width": 61.54214099463984, + "height": 50.681763172056336, + "seed": 1704729306, + "groupIds": [ + "fCtmhUGKWexr3OXqoPqQN", + "nS55pT0iEmYN13lrATL_r" + ], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770653628213, + "link": null, + "locked": false, + "status": "saved", + "fileId": "d1ee05fd41e4fad60a7372901fb1a4d206070a057f9b3a48768235e2c5775230a90d16ba6804ee1fbd79befd1ec8cbbf", + "scale": [ + 1, + 1 + ], + "crop": null + }, + { + "id": "tYpVzQAJgHPTVmA_sLOkS", + "type": "text", + "x": 886.072345512997, + "y": 537.0310114160328, + "width": 108.73332977294922, + "height": 20, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#ffec99", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "ae", + "roundness": null, + "seed": 1953084442, 
+ "version": 241, + "versionNonce": 771867444, + "isDeleted": true, + "boundElements": [], + "updated": 1770653628213, + "link": null, + "locked": false, + "text": "clickhouse-cpp", + "fontSize": 16, + "fontFamily": 5, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "clickhouse-cpp", + "autoResize": true, + "lineHeight": 1.25 + }, + { + "id": "NsxTI00fJ7Tk4JFzEOGL_", + "type": "image", + "x": 994.4490034404632, + "y": 385.84069071345743, + "width": 221.3677416168746, + "height": 67.5359211712499, + "angle": 0, + "strokeColor": "transparent", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 4, + "strokeStyle": "solid", + "roughness": 0, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "ah", + "roundness": null, + "seed": 135755354, + "version": 509, + "versionNonce": 1203241908, + "isDeleted": true, + "boundElements": [], + "updated": 1770653606682, + "link": null, + "locked": false, + "status": "saved", + "fileId": "74235aaa80d5377e7af618a88eaf3165fae35889019cba2a4a62571f5ce6a7f9ad2989f927cd6093f58b899f2e11d669", + "scale": [ + 1, + 1 + ], + "crop": null + }, + { + "id": "grimbjvvQaXcX0fZ1QGkH", + "type": "rectangle", + "x": 1065.2209485133449, + "y": 508.5832717947994, + "width": 79.72278435602915, + "height": 56.34336263774654, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "amd", + "roundness": null, + "seed": 1503711494, + "version": 762, + "versionNonce": 932232454, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false + }, + { + "id": "jDCSnbgRZUSNDrfjEd8Fc", + "type": "line", + "x": 1065.581736075349, + "y": 533.2097230148814, + "width": 78.67259902626967, + "height": 0, + "angle": 0, + "strokeColor": 
"#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "aml", + "roundness": { + "type": 2 + }, + "seed": 816627782, + "version": 333, + "versionNonce": 566176838, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + 78.67259902626967, + 0 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "polygon": false + }, + { + "id": "x1F877POVQOiJKca5RDRq", + "type": "line", + "x": 1081.316255880603, + "y": 533.2097230148814, + "width": 0.12437256409457782, + "height": 31.7169114176644, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "amt", + "roundness": { + "type": 2 + }, + "seed": 1996015494, + "version": 330, + "versionNonce": 1927132038, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + -0.12437256409457782, + 31.7169114176644 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "polygon": false + }, + { + "id": "okS3KPOvVBZwY3ofsj0Uc", + "type": "line", + "x": 1097.3058925944924, + "y": 533.52609872761, + "width": 0.17604688976762406, + "height": 31.400535704935898, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "an", + "roundness": { + 
"type": 2 + }, + "seed": 2093082310, + "version": 341, + "versionNonce": 86594246, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + -0.17604688976762406, + 31.400535704935898 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "polygon": false + }, + { + "id": "ZJrOj4tP-Rs6DlBgjxBwu", + "type": "line", + "x": 1113.4949144809275, + "y": 533.5112964970239, + "width": 0.42710638798645123, + "height": 31.415337935521947, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "anV", + "roundness": { + "type": 2 + }, + "seed": 44375558, + "version": 337, + "versionNonce": 1665075718, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + -0.42710638798645123, + 31.415337935521947 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "polygon": false + }, + { + "id": "vYTHyZXj8YpcxC4gKL6cW", + "type": "line", + "x": 1128.9366108631093, + "y": 533.4277221087583, + "width": 0.06915961804813092, + "height": 31.498912323787575, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "ao", + "roundness": { + "type": 2 + }, + "seed": 765422918, + "version": 337, + "versionNonce": 1008458054, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ 
+ 0.06915961804813092, + 31.498912323787575 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "polygon": false + }, + { + "id": "AL2_d8YDszRMsO8BuMuM5", + "type": "text", + "x": 1084.4878148765415, + "y": 510.21968757478163, + "width": 42.39994812011719, + "height": 20, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [ + "3KLhhHW84bohbevIB1Hzc" + ], + "frameId": null, + "index": "ap", + "roundness": null, + "seed": 2064503942, + "version": 222, + "versionNonce": 1461502086, + "isDeleted": false, + "boundElements": [], + "updated": 1741098374272, + "link": null, + "locked": false, + "text": "Table", + "fontSize": 16, + "fontFamily": 5, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "Table", + "autoResize": true, + "lineHeight": 1.25 + }, + { + "id": "sr_2aHlCdgC88c4nWzhF_", + "type": "image", + "x": 1040, + "y": 390, + "width": 140, + "height": 70, + "angle": 0, + "strokeColor": "transparent", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "aq", + "roundness": null, + "seed": 1850443956, + "version": 24, + "versionNonce": 500443700, + "isDeleted": false, + "boundElements": [], + "updated": 1770653635369, + "link": null, + "locked": false, + "status": "saved", + "fileId": "2e07883d1d9f64a8e2b93ea39a266e26f42f707d5579bf9a1348070b734c295d8c0dbca08e635fd04d72dbf979c6b05b", + "scale": [ + 1, + 1 + ], + "crop": null + }, + { + "type": "arrow", + "version": 542, + "versionNonce": 407558580, + "isDeleted": false, + "id": "anAP1e8hQ4MPrGd5mJIPw", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + 
"angle": 0, + "x": 1224.5442617137614, + "y": 539.1944379296762, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "width": 205.2903283354567, + "height": 0, + "seed": 1781230860, + "groupIds": [], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770654115841, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "triangle", + "points": [ + [ + 0, + 0 + ], + [ + 205.2903283354567, + 0 + ] + ], + "index": "ar", + "moveMidPointsWithElement": false + }, + { + "type": "line", + "version": 1174, + "versionNonce": 1814268980, + "isDeleted": false, + "id": "kVLOfrXYUm_bhK00KWUdN", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1441.5929310543029, + "y": 500.72686469852636, + "strokeColor": "#000000", + "backgroundColor": "#e9ecef", + "width": 160, + "height": 80, + "seed": 1322380172, + "groupIds": [ + "FqtwZvej4SaxiNmq7rWnL" + ], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770653674836, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": null, + "points": [ + [ + 0, + 0 + ], + [ + -1.4896611218806584, + 80 + ], + [ + 118.51033887811936, + 79.99999999999999 + ], + [ + 158.51033887811934, + 40.000000000000014 + ], + [ + 118.51033887811934, + -7.105427357601002e-15 + ], + [ + 0, + 0 + ] + ], + "index": "as", + "polygon": false + }, + { + "type": "image", + "version": 3781, + "versionNonce": 1842079156, + "index": "at", + "isDeleted": false, + "id": "qcU_lCbEEJfrEK51E_Uwk", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1480.3186257099073, + "y": 513.6204399910713, + "strokeColor": "transparent", + "backgroundColor": "#ffc9c9", + "width": 
61.54214099463984, + "height": 50.681763172056336, + "seed": 119839244, + "groupIds": [ + "SH9Y4XZl0F0a3RqnagE-E", + "FqtwZvej4SaxiNmq7rWnL" + ], + "frameId": null, + "roundness": null, + "boundElements": [], + "updated": 1770653674836, + "link": null, + "locked": false, + "status": "saved", + "fileId": "d1ee05fd41e4fad60a7372901fb1a4d206070a057f9b3a48768235e2c5775230a90d16ba6804ee1fbd79befd1ec8cbbf", + "scale": [ + 1, + 1 + ], + "crop": null + }, + { + "id": "udf33bdWTCHF0aOMjJPnP", + "type": "arrow", + "x": 1199.9999997629093, + "y": 539.6403573296411, + "width": 20, + "height": 0.46871015741385236, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "solid", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "au", + "roundness": null, + "seed": 1735537332, + "version": 414, + "versionNonce": 1476357516, + "isDeleted": false, + "boundElements": [], + "updated": 1770654102092, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + -20, + -0.46871015741385236 + ] + ], + "lastCommittedPoint": null, + "startBinding": null, + "endBinding": null, + "startArrowhead": null, + "endArrowhead": null, + "elbowed": false + }, + { + "id": "2YeGA667pdqopGszKYHHk", + "type": "ellipse", + "x": 1201.3800898153793, + "y": 531.4518874011399, + "width": 13.508293047904658, + "height": 13.508293047904658, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "#e9ecef", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "av", + "roundness": { + "type": 2 + }, + "seed": 92388404, + "version": 423, + "versionNonce": 1397873676, + "isDeleted": false, + "boundElements": [], + "updated": 1770654102092, + "link": null, + "locked": false + }, + { + "id": "Y6dgqoPj8ThOFAFcVcc8l", + "type": "text", + "x": 1194.0364892454452, + "y": 549.4106528039305, 
+ "width": 28.195493698120117, + "height": 21.829895655804975, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "dashed", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "aw", + "roundness": null, + "seed": 467932596, + "version": 369, + "versionNonce": 1155449268, + "isDeleted": false, + "boundElements": [], + "updated": 1770654104858, + "link": null, + "locked": false, + "text": "MySQL\nAPI", + "fontSize": 8.73195826232199, + "fontFamily": 5, + "textAlign": "center", + "verticalAlign": "top", + "containerId": null, + "originalText": "MySQL\nAPI", + "autoResize": true, + "lineHeight": 1.25 + } + ], + "appState": { + "gridSize": 20, + "gridStep": 5, + "gridModeEnabled": false, + "viewBackgroundColor": "#ffffff", + "lockedMultiSelections": {} + }, + "files": { + "2e07883d1d9f64a8e2b93ea39a266e26f42f707d5579bf9a1348070b734c295d8c0dbca08e635fd04d72dbf979c6b05b": { + "mimeType": "image/svg+xml", + "id": "2e07883d1d9f64a8e2b93ea39a266e26f42f707d5579bf9a1348070b734c295d8c0dbca08e635fd04d72dbf979c6b05b", + "dataURL": 
"data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxMjAiIGhlaWdodD0iNjAiIHZpZXdCb3g9IjAgMCA5LjI1MiA0LjYyNiI+PGcgdHJhbnNmb3JtPSJtYXRyaXgoLjAzNzM3NiAwIDAgLjAzNzM3NiAxLjA2OTk5NCAtMS4zMTkzMzkpIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Ik04LjUwNCAxMjguMjE1aDUuOHYtMjIuOTc3bDkuMDU4IDIwLjAzM2MxLjAyNiAyLjQwOCAyLjUgMy4zIDUuMzU0IDMuM3M0LjI0LS44OTMgNS4zLTMuM2w5LjAxMy0yMC4wMzN2MjIuOTc3aDUuODQ1di0yMi45NzdjMC0yLjIzLS44OTMtMy4zMDMtMi43NjctMy44ODMtNC40MTctMS4zMzgtNy4zNjItLjE3OC04LjcgMi44MWwtOC44NzggMTkuODEtOC41NjctMTkuODFjLTEuMjk0LTIuOTg4LTQuMjg0LTQuMTQ4LTguNzQ1LTIuODEtMS44My41OC0yLjcyMiAxLjY1Mi0yLjcyMiAzLjg4M2wtLjAwMSAyMi45Nzd6bTQ1LjE5OC0xOC42OTRoNS44NDV2MTIuNjI3Yy0uMDQ0LjcxMy4yMjMgMi4zMiAzLjQgMi4zNjMgMS42NS4wNDUgMTIuNTgyIDAgMTIuNjcgMHYtMTUuMDhoNS44NDV2MjAuNjU4YzAgNS4wODYtNi4zIDYuMi05LjIzNiA2LjI0NmgtMTguMzh2LTMuODhoMTguNDI3YzMuNzQ4LS40MDIgMy4zMDItMi4yNzUgMy4zMDItMi45di0xLjUxOGgtMTIuMzZjLTUuNzU2LS4wNDUtOS40Ni0yLjU4OC05LjUwMy01LjQ4OHYtMTMuMDN6bTEyNS4zNzQtMTQuNjM1Yy0zLjU2OC0uMDktNi4zMzYuMjY4LTguNjU2IDEuMjUtLjY2OC4yNy0xLjc0LjI3LTEuODI4IDEuMTE2LjM1Ny4zNTUuNC45MzYuNzEzIDEuNDI4LjUzNS44OTMgMS40NzMgMi4wOTYgMi4zMiAyLjcyLjkzOC43MTUgMS44NzUgMS40MjggMi44NTUgMi4wNTMgMS43NCAxLjA3IDMuNzAzIDEuNjk1IDUuMzk4IDIuNzY2Ljk4Mi42MjUgMS45NjMgMS40MjggMi45NDUgMi4wOTguNS4zNTcuODAzLjkzOCAxLjQyOCAxLjE2di0uMTM1Yy0uMzEyLS40LS40MDItLjk4LS43MTMtMS40MjgtLjQ0Ny0uNDQ1LS44OTMtLjg0OC0xLjM0LTEuMjkzLTEuMjkzLTEuNzQtMi45LTMuMjU4LTQuNjQtNC41MDYtMS40MjgtLjk4Mi00LjU1LTIuMzItNS4xMy0zLjk3bC0uMDg4LS4wOWMuOTgtLjA5IDIuMTQtLjQ0NyAzLjA3OC0uNzE1IDEuNTE4LS40IDIuOS0uMzEyIDQuNDYtLjcxMy43MTUtLjE4IDEuNDI4LS40MDIgMi4xNDMtLjYyNXYtLjRjLS44MDMtLjgwMy0xLjM4My0xLjg3NC0yLjIzLTIuNjMyLTIuMjc1LTEuOTYzLTQuNzc1LTMuODgyLTcuMzYzLTUuNDg4LTEuMzgzLS44OTItMy4xNjgtMS40NzMtNC42NC0yLjIzLS41MzctLjI2OC0xLjQyOC0uNDAyLTEuNzQtLjg0OC0uODA1LS45OC0xLjI1LTIuMjc1LTEuODMtMy40MzYtMS4yOTMtMi40NTQtMi41NDMtNS4xNzUtMy42NTgtNy43NjMtLjgwMy0xLjc0LTEuMjk1LTMuNDgtMi4yNzUtNS4wODYtNC41OTYtNy41ODUtOS41OTQtMTIuMTgtMTcuMjY4LTE2LjY4Ny0xLjY1LS45MzctMy42M
TMtMS4zNC01LjctMS44M2wtMy4zNDYtLjE4Yy0uNzE1LS4zMTItMS40MjgtMS4xNi0yLjA1My0xLjU2Mi0yLjU0My0xLjYwNi05LjEwMi01LjA4Ni0xMC45NzctLjQ5LTEuMjA1IDIuOSAxLjc4NSA1Ljc1NSAyLjggNy4yMjguNzYgMS4wMjYgMS43NCAyLjE4NiAyLjI3NyAzLjM0Ni4zLjc1OC40IDEuNTYyLjcxMyAyLjM2NS43MTMgMS45NjMgMS4zODMgNC4xNSAyLjMyIDUuOTguNS45MzcgMS4wMjUgMS45MiAxLjY1IDIuNzY3LjM1Ny40OS45ODIuNzE0IDEuMTE1IDEuNTE3LS42MjUuODkzLS42NjggMi4yMy0xLjAyNSAzLjM0Ny0xLjYwNyA1LjA0Mi0uOTgyIDExLjI4OCAxLjI5MyAxNC45OS43MTUgMS4xMTUgMi40IDMuNTcgNC42ODYgMi42MzIgMi4wMDgtLjgwMyAxLjU2LTMuMzQ2IDIuMTQtNS41NzcuMTM1LS41MzUuMDQ1LS44OTIuMzEyLTEuMjV2LjA5bDEuODMgMy43MDNjMS4zODMgMi4xODYgMy43OTMgNC40NjIgNS44IDUuOTggMS4wNy44MDMgMS45MTggMi4xODcgMy4yNTYgMi42Nzd2LS4xMzVoLS4wODhjLS4yNjgtLjQtLjY3LS41OC0xLjAyNy0uODkyLS44MDMtLjgwMy0xLjY5NS0xLjc4NS0yLjMyLTIuNjc3LTEuODczLTIuNDk4LTMuNTIzLTUuMjY1LTQuOTk2LTguMTItLjcxNS0xLjM4My0xLjM0LTIuOS0xLjkxOC00LjI4My0uMjctLjUzNi0uMjctMS4zNC0uNzE1LTEuNjA2LS42Ny45OC0xLjY1IDEuODMtMi4xNDMgMy4wMzQtLjg0OCAxLjkxOC0uOTM2IDQuMjgzLTEuMjQ4IDYuNzM3LS4xOC4wNDUtLjEgMC0uMTguMDktMS40MjYtLjM1Ni0xLjkxOC0xLjgzLTIuNDUzLTMuMDc4LTEuMzM4LTMuMTY4LTEuNTYyLTguMjU0LS40MDItMTEuOTEzLjMxMi0uOTM3IDEuNjUyLTMuODgyIDEuMTE3LTQuNzc0LS4yNy0uODQ4LTEuMTYtMS4zMzgtMS42NTItMi4wMDgtLjU4LS44NDgtMS4yMDMtMS45MTgtMS42MDUtMi44NTUtMS4wNy0yLjUtMS42MDUtNS4yNjUtMi43NjYtNy43NjQtLjUzNy0xLjE2LTEuNDczLTIuMzY1LTIuMjMyLTMuNDM1LS44NDgtMS4yMDUtMS43ODMtMi4wNTMtMi40NTMtMy40OC0uMjIzLS40OS0uNTM1LTEuMjk0LS4xNzgtMS44My4wODgtLjM1Ny4yNjgtLjQ5LjYyMy0uNTguNTgtLjQ5IDIuMjMyLjEzNCAyLjgxMi40IDEuNjUuNjcgMy4wMzMgMS4yOTQgNC40MTYgMi4yMy42MjUuNDQ2IDEuMjk1IDEuMjk0IDIuMDk4IDEuNTE4aC45MzhjMS40MjguMzEyIDMuMDMzLjA5IDQuMzcuNDkgMi4zNjUuNzYgNC41MDYgMS44NzQgNi40MjYgMy4wOCA1Ljg0NCAzLjcwMyAxMC42NjQgOC45NjggMTMuOTIgMTUuMjYuNTM1IDEuMDI2Ljc1OCAxLjk2MyAxLjI1IDMuMDM0LjkzOCAyLjE4NyAyLjA5OCA0LjQxNyAzLjAzMyA2LjU2LjkzOCAyLjA5NyAxLjgzIDQuMjQgMy4xNjggNS45OC42Ny45MzcgMy4zNDYgMS40MjcgNC41NSAxLjkxOC44OTMuNCAyLjI3NS43NiAzLjA4IDEuMjUgMS41MTYuOTM3IDMuMDMzIDIuMDA4IDQuNDYgMy4wMzQuNzEzLjUzNCAyLjk0NSAxLjY1IDMuMDc4IDIuNTR6bS00NS41LTM4Ljc3M
mE3LjA5IDcuMDkgMCAwIDAtMS44MjguMjIzdi4wOWguMDg4Yy4zNTcuNzE0Ljk4MiAxLjIwNSAxLjQyOCAxLjgzbDEuMDI3IDIuMTQyLjA4OC0uMDljLjYyNS0uNDQ2LjkzOC0xLjE2LjkzOC0yLjIzLS4yNjgtLjMxMi0uMzEyLS42MjUtLjUzNS0uOTM3LS4yNjgtLjQ0Ni0uODQ4LS42Ny0xLjIwNi0xLjAyNnoiIGZpbGw9IiMwMDY3OGMiLz48cGF0aCBkPSJNODUuOTE2IDEyOC4yMTVoMTYuNzc2YzEuOTYzIDAgMy44MzgtLjQgNS4zNTQtMS4xMTUgMi41NDMtMS4xNiAzLjc0OC0yLjcyIDMuNzQ4LTQuNzczdi00LjI4M2MwLTEuNjUtMS4zODMtMy4yMTMtNC4xNDgtNC4yODMtMS40MjgtLjUzNS0zLjIxMy0uODQ4LTQuOTUzLS44NDhoLTcuMDVjLTIuMzY1IDAtMy40OC0uNzE1LTMuNzkzLTIuMjc1LS4wNDQtLjE3OC0uMDQ0LS4zNTctLjA0NC0uNTM1di0yLjYzM2MwLS4xMzUgMC0uMzEyLjA0NC0uNDkuMzEyLTEuMjA1LjkzNy0xLjUxOCAzLTEuNzRoMTcuMTc3di0zLjg4M2gtMTYuMzNjLTIuMzY1IDAtMy42MTQuMTM1LTQuNzMuNDkyLTMuNDM2IDEuMDctNC45NTMgMi43NjYtNC45NTMgNS43NTR2My4zOTNjMCAyLjYzIDIuOTQ1IDQuODYzIDcuOTQyIDUuMzk4LjUzNS4wNDUgMS4xMTUuMDQ1IDEuNjk1LjA0NWg2LjAyNGMuMjIzIDAgLjQ0NSAwIC42MjMuMDQ1IDEuODMuMTc4IDIuNjMzLjQ5IDMuMTY4IDEuMTU4LjM1Ny4zNTcuNDQ3LjY3LjQ0NyAxLjA3MnYzLjM5YzAgLjQtLjI2OC45MzgtLjgwMyAxLjM4M3MtMS4zODUuNzU4LTIuNS44MDNjLS4yMjMgMC0uMzU1LjA0NS0uNTguMDQ1SDg1LjkxNnptNjIuMTk1LTYuNzM2YzAgMy45NyAzIDYuMiA4Ljk3IDYuNjQ4LjU4LjA0NSAxLjExNS4wODggMS42OTUuMDg4aDE1LjE3di0zLjg4aC0xNS4zMDNjLTMuMzkzIDAtNC42ODYtLjg0OC00LjY4Ni0yLjl2LTIwLjA3OEgxNDguMXYyMC4xMjN6bS0zMi42MTUuMTc3di0xMy44M2MwLTMuNTI1IDIuNDk4LTUuNjY4IDcuMzYzLTYuMzM2LjUzNS0uMDQ1IDEuMDctLjA5IDEuNTYtLjA5aDExLjA2NGMuNTggMCAxLjA3Mi4wNDUgMS42NTIuMDkgNC44NjMuNjY4IDcuMzE2IDIuODEgNy4zMTYgNi4zMzZ2MTMuODNjMCAyLjg1NS0xLjAyNSA0LjM3My0zLjQzNiA1LjRsNS43MSA1LjE3NGgtNi43MzZsLTQuNjQtNC4xOTMtNC42ODYuMjY4aC02LjI0NmExMy42NiAxMy42NiAwIDAgMS0zLjM5MS0uNDQ1Yy0zLjctMS4wMjgtNS41My0yLjk5LTUuNTMtNi4yMDR6bTYuMjktLjMxYzAgLjE3OC4xLjM1NS4xMzUuNTguMzEyIDEuNjA1IDEuODI4IDIuNDk4IDQuMTQ4IDIuNDk4aDUuMjY2bC00LjgxOC00LjM3M2g2LjczNmw0LjIzOCAzLjgzOGMuODA1LS40NDcgMS4yOTUtMS4wNzIgMS40NzMtMS44NzUuMDQ1LS4xNzguMDQ1LS40LjA0NS0uNTh2LTEzLjI1MmMwLS4xNzggMC0uMzU1LS4wNDUtLjUzNS0uMzEyLTEuNTE2LTEuODI4LTIuMzYzLTQuMTA0LTIuMzYzaC04Ljc5Yy0yLjU4OCAwLTQuMjgzIDEuMTE1LTQuMjgzIDIuODk4eiIgZmlsbD0iI2NlOGIyY
yIvPjwvZz48L3N2Zz4=", + "created": 1770653604842 + }, + "d1ee05fd41e4fad60a7372901fb1a4d206070a057f9b3a48768235e2c5775230a90d16ba6804ee1fbd79befd1ec8cbbf": { + "mimeType": "image/svg+xml", + "id": "d1ee05fd41e4fad60a7372901fb1a4d206070a057f9b3a48768235e2c5775230a90d16ba6804ee1fbd79befd1ec8cbbf", + "dataURL": "data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4NSIgaGVpZ2h0PSI3MCIgdmlld0JveD0iMCAwIDg1IDcwIiBmaWxsPSJub25lIj4KPHBhdGggZD0iTTI5LjQ1IDQ0LjQyTDQyLjc3IDEyLjkzQzQyLjgyNjkgMTIuODA1OCA0Mi44NTc4IDEyLjY3MTMgNDIuODYwNyAxMi41MzQ4QzQyLjg2MzYgMTIuMzk4MiA0Mi44Mzg2IDEyLjI2MjUgNDIuNzg3IDEyLjEzNkM0Mi43MzU1IDEyLjAwOTUgNDIuNjU4NiAxMS44OTQ5IDQyLjU2MTEgMTEuNzk5M0M0Mi40NjM1IDExLjcwMzcgNDIuMzQ3NSAxMS42MjkgNDIuMjIgMTEuNThDNDIuMDkyOCAxMS41Mjg2IDQxLjk1NzEgMTEuNTAxNSA0MS44MiAxMS41SDkuMjc5OTlDOS4wNzYzMiAxMS40OTU4IDguODc2MjEgMTEuNTUzOCA4LjcwNjQzIDExLjY2NjRDOC41MzY2NiAxMS43NzkgOC40MDUzMiAxMS45NDA3IDguMzI5OTkgMTIuMTNMMy4xMDk5OSAyNC42M0MzLjA1IDI0Ljc1NTggMy4wMTY5NSAyNC44OTI3IDMuMDEyOTggMjUuMDMyQzMuMDA5MDEgMjUuMTcxMyAzLjAzNDE5IDI1LjMwOTkgMy4wODY5MiAyNS40Mzg5QzMuMTM5NjUgMjUuNTY3OSAzLjIxODc1IDI1LjY4NDQgMy4zMTkxNiAyNS43ODExQzMuNDE5NTcgMjUuODc3NyAzLjUzOTA3IDI1Ljk1MjMgMy42Njk5OSAyNkMzLjc5MzMxIDI2LjA1MjUgMy45MjU5NCAyNi4wNzk3IDQuMDU5OTkgMjYuMDhIMTkuODJDMjAuMDg1MiAyNi4wOCAyMC4zMzk2IDI2LjE4NTQgMjAuNTI3MSAyNi4zNzI5QzIwLjcxNDYgMjYuNTYwNCAyMC44MiAyNi44MTQ4IDIwLjgyIDI3LjA4QzIwLjgxODUgMjcuMjE3MiAyMC43OTE0IDI3LjM1MjggMjAuNzQgMjcuNDhMNy41OTk5OSA1OS4wN0M3LjU0MzMyIDU5LjE5NDggNy41MTI5MiA1OS4zMjk5IDcuNTEwNjUgNTkuNDY2OUM3LjUwODM5IDU5LjYwNCA3LjUzNDMyIDU5Ljc0IDcuNTg2ODMgNTkuODY2NkM3LjYzOTM1IDU5Ljk5MzIgNy43MTczMiA2MC4xMDc3IDcuODE1OTIgNjAuMjAyOUM3LjkxNDUyIDYwLjI5OCA4LjAzMTYzIDYwLjM3MiA4LjE1OTk5IDYwLjQyQzguMjgzMzEgNjAuNDcyNSA4LjQxNTk0IDYwLjQ5OTcgOC41NDk5OSA2MC41SDQzQzQzLjIxMzIgNjAuNTE0MSA0My40MjUzIDYwLjQ1OTYgNDMuNjA1MiA2MC4zNDQ1QzQzLjc4NTIgNjAuMjI5MyA0My45MjM1IDYwLjA1OTUgNDQgNTkuODZMNDkuMjIgNDcuMjhDNDkuMjc2NyA0Ny4xNTUyIDQ5LjMwNzEgNDcuMDIwMSA0OS4zMDkzIDQ2Ljg4MzFDNDkuMzExNiA0Ni4
3NDYgNDkuMjg1NyA0Ni42MSA0OS4yMzMxIDQ2LjQ4MzRDNDkuMTgwNiA0Ni4zNTY4IDQ5LjEwMjcgNDYuMjQyNCA0OS4wMDQxIDQ2LjE0NzJDNDguOTA1NSA0Ni4wNTIgNDguNzg4MyA0NS45NzggNDguNjYgNDUuOTNDNDguNTM2NyA0NS44Nzc1IDQ4LjQwNCA0NS44NTAzIDQ4LjI3IDQ1Ljg1SDMwLjRDMzAuMTM0OCA0NS44NSAyOS44ODA0IDQ1Ljc0NDYgMjkuNjkyOSA0NS41NTcxQzI5LjUwNTMgNDUuMzY5NiAyOS40IDQ1LjExNTIgMjkuNCA0NC44NUMyOS4zODggNDQuNzA0OCAyOS40MDUgNDQuNTU4NiAyOS40NSA0NC40MloiIGZpbGw9IiMxMjEyMTIiLz4KPHBhdGggZD0iTTM3LjE3IDQxLjU4SDUwLjg4QzUxLjA4MzcgNDEuNTg0MyA1MS4yODM4IDQxLjUyNjIgNTEuNDUzNSA0MS40MTM2QzUxLjYyMzMgNDEuMzAxIDUxLjc1NDcgNDEuMTM5MyA1MS44MyA0MC45NUw1Ny44MyAyNi42OUM1Ny45MDUzIDI2LjUwMDcgNTguMDM2NyAyNi4zMzkgNTguMjA2NCAyNi4yMjY0QzU4LjM3NjIgMjYuMTEzOCA1OC41NzYzIDI2LjA1NTcgNTguNzggMjYuMDZINzYuNDFDNzYuNjIyIDI2LjA3NDkgNzYuODMzMiAyNi4wMjE4IDc3LjAxMyAyNS45MDg1Qzc3LjE5MjggMjUuNzk1MyA3Ny4zMzE5IDI1LjYyNzYgNzcuNDEgMjUuNDNMODIuNjMgMTIuOTNDODIuNjg1MiAxMi44MDA3IDgyLjcxMjYgMTIuNjYxMyA4Mi43MTAzIDEyLjUyMDdDODIuNzA3OSAxMi4zODAxIDgyLjY3NiAxMi4yNDE2IDgyLjYxNjYgMTIuMTE0M0M4Mi41NTcxIDExLjk4NjkgODIuNDcxNSAxMS44NzM0IDgyLjM2NTIgMTEuNzgxNEM4Mi4yNTkgMTEuNjg5MyA4Mi4xMzQ1IDExLjYyMDcgODIgMTEuNThDODEuODc2NyAxMS41Mjc1IDgxLjc0NCAxMS41MDAzIDgxLjYxIDExLjVINDguODJDNDguNjA4IDExLjQ4NTEgNDguMzk2OCAxMS41MzgyIDQ4LjIxNyAxMS42NTE1QzQ4LjAzNzEgMTEuNzY0NyA0Ny44OTgxIDExLjkzMjQgNDcuODIgMTIuMTNMMzYuMTcgNDAuMTNDMzYuMTEzMSA0MC4yNTQyIDM2LjA4MjIgNDAuMzg4NyAzNi4wNzkzIDQwLjUyNTJDMzYuMDc2MyA0MC42NjE4IDM2LjEwMTQgNDAuNzk3NSAzNi4xNTI5IDQwLjkyNEMzNi4yMDQ1IDQxLjA1MDUgMzYuMjgxNCA0MS4xNjUxIDM2LjM3ODkgNDEuMjYwN0MzNi40NzY0IDQxLjM1NjMgMzYuNTkyNSA0MS40MzEgMzYuNzIgNDEuNDhDMzYuODYxMiA0MS41NDQ3IDM3LjAxNDYgNDEuNTc4OCAzNy4xNyA0MS41OFoiIGZpbGw9IiMxMjEyMTIiLz4KPC9zdmc+", + "created": 1684733732697 + } + } +} \ No newline at end of file diff --git a/src/content/docs/integrations/mysql.mdx b/src/content/docs/integrations/mysql.mdx new file mode 100644 index 000000000..2199bb694 --- /dev/null +++ b/src/content/docs/integrations/mysql.mdx @@ -0,0 +1,50 @@ +--- +title: MySQL +--- + +import Op from 
'@components/see-also/Op.astro';
+import { Steps } from '@astrojs/starlight/components';
+
+[MySQL](https://www.mysql.com/) is an open-source relational database management
+system widely used for web applications, data warehousing, and enterprise
+applications.
+
+![MySQL Diagram](mysql.excalidraw)
+
+Tenzir connects to MySQL over the network using the MySQL wire protocol, via
+the host and port you specify in the `from_mysql` operator. This means:
+
+- **Network**: Tenzir and MySQL can run on the same machine (using `localhost`)
+  or on different machines in the same network. You just need to make sure that
+  Tenzir can reach the MySQL server.
+- **IPC**: There is no direct inter-process communication (IPC) mechanism; all
+  communication uses MySQL's network protocol.
+- **Co-deployment**: For best performance and security, deploy Tenzir and MySQL
+  in the same trusted network or use TLS for encrypted connections.
+
+## Examples
+
+These examples assume that the MySQL server is running on the same host as
+Tenzir.
+
+### List available tables
+
+```tql
+from_mysql show="tables", host="localhost", database="mydb"
+```
+
+### Execute a custom SQL query
+
+```tql
+from_mysql sql=r"SELECT id, name, email FROM users WHERE active = 1",
+  host="localhost", database="mydb"
+```
+
+### Stream new rows
+
+```tql
+from_mysql table="events", live=true, host="localhost", database="mydb"
+```
+
+## See Also
+
+- from_mysql
diff --git a/src/content/docs/integrations/nic.mdx b/src/content/docs/integrations/nic.mdx
index 7323bcffb..68bdb5cdb 100644
--- a/src/content/docs/integrations/nic.mdx
+++ b/src/content/docs/integrations/nic.mdx
@@ -2,18 +2,13 @@
 title: Network Interface
 ---
 
-Tenzir supports reading packets from a network interface card (NIC).
+Tenzir supports capturing packets from a network interface card (NIC).
-The load_nic produces a stream of bytes -in PCAP file format: +Use from_nic to capture live packets as events: ![Packet pipeline](nic.svg) -We designed `load_nic` such that it produces a byte stream in the form of a PCAP -file. That is, when the pipeline starts, it first produces a file header, -followed by chunks of packets. This creates a byte stream that is -wire-compatible with the PCAP format, allowing you to exchange `load_nic` -with load_file and It Just Works™. +`from_nic` uses read_pcap by default. ## Examples @@ -57,11 +52,10 @@ where up ### Read packets from a network interface -Load packets from `eth0` and parse them as PCAP: +Capture packets from `eth0`: ```tql -load_nic "eth0" -read_pcap +from_nic "eth0" head 3 ``` @@ -96,8 +90,7 @@ After you have structured data in the form of PCAP events, you can use the binary `data`: ```tql -load_nic "eth0" -read_pcap +from_nic "eth0" select packet = decapsulate(this) head 1 ``` diff --git a/src/content/docs/integrations/syslog.mdx b/src/content/docs/integrations/syslog.mdx index 297eef267..31a7a47b2 100644 --- a/src/content/docs/integrations/syslog.mdx +++ b/src/content/docs/integrations/syslog.mdx @@ -30,17 +30,17 @@ this = data.parse_syslog() publish "syslog" ``` -To use TCP instead of UDP, use load_tcp with +To use TCP instead of UDP, use accept_tcp with read_syslog: ```tql -load_tcp "0.0.0.0:514" { +accept_tcp "0.0.0.0:514" { read_syslog } publish "syslog" ``` -The pipeline inside `load_tcp` executes _for each accepted connection_. +The pipeline inside accept_tcp executes _for each accepted connection_. ### Parsing CEF, LEEF, or JSON Payloads @@ -125,7 +125,7 @@ original syslog line alongside the parsed fields. 
Use the `raw_message` parameter to store the unparsed input:
 
 ```tql
-load_tcp "0.0.0.0:514" {
+accept_tcp "0.0.0.0:514" {
   read_syslog raw_message=raw
 }
 ```
diff --git a/src/content/docs/integrations/tcp.mdx b/src/content/docs/integrations/tcp.mdx
index a744be8f5..02b6b327f 100644
--- a/src/content/docs/integrations/tcp.mdx
+++ b/src/content/docs/integrations/tcp.mdx
@@ -4,52 +4,42 @@ title: TCP
 The [Transmission Control Protocol (TCP)](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) provides a
-bidirectional byte stream over IP. Tenzir supports reading from and writing to
-TCP sockets in both server (listening) and client (connect) mode.
+bidirectional byte stream over IP. Tenzir provides operators for every role in
+a TCP conversation: connecting to remote endpoints, accepting incoming
+connections, and serving data to connected clients.
 
 ![TCP](tcp.svg)
 
 Use the IP address `0.0.0.0` to listen on all available network interfaces.
 
-:::tip[URL Support]
-The URL schemes `tcp://` and `tcps://` dispatch to
-load_tcp and
-save_tcp for seamless URL-style use via
-from and to.
+:::tip[URL support]
+The URL schemes `tcp://` and `tcps://` dispatch to from_tcp and to_tcp for seamless URL-style use via from and to.
 :::
 
-## SSL/TLS
-
-To enable TLS, use `tls=true`. You can optionally pass a PEM-encoded certificate
-and private key via the `certfile` and `keyfile` options.
+## Connecting to remote endpoints
 
-For testing purposes, you can quickly generate a self-signed certificate as
-follows:
+Use from_tcp to connect to a remote TCP endpoint as a client and read
+data from it, or to_tcp to send data to a remote endpoint. Both
+operators reconnect automatically with exponential backoff on connection failure.
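+
+For example, connect to a remote endpoint and publish the received events.
+The endpoint mirrors the `from_tcp` example from the operator reference; the
+topic name is a placeholder:
+
+```tql
+from_tcp "example.org:4000" { read_json }
+publish "remote-events"
+```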
-```bash -openssl req -x509 -newkey rsa:2048 -keyout key_and_cert.pem -out key_and_cert.pem -days 365 -nodes -``` +## Accepting incoming connections -An easy way to test a TLS connection is to try connecting via OpenSSL: +Use accept_tcp to listen on a local endpoint and accept incoming TCP +connections. Each connection spawns a nested pipeline that processes the +incoming byte stream independently. Inside that pipeline, `$peer.ip` and +`$peer.port` describe the connected client. Set `resolve_hostnames=true` to +also expose `$peer.hostname` from reverse DNS. -```bash -openssl s_client 127.0.0.1:443 -``` +## Serving data to clients -## Examples +Use serve_tcp to start a TCP server that broadcasts pipeline output to +all connected clients. A nested pipeline serializes events into bytes before +sending. -### Read data by connecting to a remote TCP server +See collecting/get-data-from-the-network for practical examples. -```tql -from "tcp://127.0.0.1:443", connect=true { - read_json -} -``` - -### Read data by listen on localhost with TLS enabled +## SSL/TLS -```tql -from "tcp://127.0.0.1:443", tls=true, certfile="cert.pem", keyfile="key.pem" { - read_json -} -``` +All TCP operators support TLS via the `tls` option. Pass an empty record +(`tls={}`) for defaults, or provide specific options like `certfile` and +`keyfile`. diff --git a/src/content/docs/reference/functions/secret.mdx b/src/content/docs/reference/functions/secret.mdx index 42786297b..4129f28bf 100644 --- a/src/content/docs/reference/functions/secret.mdx +++ b/src/content/docs/reference/functions/secret.mdx @@ -20,8 +20,7 @@ to. If the secret is not found in the node, a request is made to the Tenzir Platform. Should the platform also not be able to find the secret, an error is raised. -See the [explanation page for secrets](/explanations/secrets) for more -details. +See secrets for more details. ### `name: string` @@ -42,7 +41,7 @@ We do not recommend enabling this option. 
### Using secrets in an operator ```tql -load_tcp "127.0.0.1:4000" { +accept_tcp "127.0.0.1:4000" { read_ndjson } to_splunk "https://localhost:8088", hec_token=secret("splunk-hec-token") diff --git a/src/content/docs/reference/operators.mdx b/src/content/docs/reference/operators.mdx index f53f44fb9..d5390fbc9 100644 --- a/src/content/docs/reference/operators.mdx +++ b/src/content/docs/reference/operators.mdx @@ -335,6 +335,18 @@ operators: description: 'Shows a snapshot of open sockets.' example: 'sockets' path: 'reference/operators/sockets' + - name: 'accept_http' + description: 'Accepts incoming HTTP requests and forwards them as events.' + example: 'accept_http "0.0.0.0:8080" { read_json }' + path: 'reference/operators/accept_http' + - name: 'accept_opensearch' + description: 'Accepts incoming OpenSearch Bulk API requests and forwards them as events.' + example: 'accept_opensearch "0.0.0.0:9200"' + path: 'reference/operators/accept_opensearch' + - name: 'accept_tcp' + description: 'Accepts incoming TCP or TLS connections and yields events.' + example: 'accept_tcp "0.0.0.0:8090" { read_json }' + path: 'reference/operators/accept_tcp' - name: 'from' description: 'Obtains events from an URI, inferring the source, compression and format.' example: 'from "data.json"' @@ -363,14 +375,26 @@ operators: description: 'Sends and receives HTTP/1.1 requests.' example: 'from_http "0.0.0.0:8080"' path: 'reference/operators/from_http' + - name: 'from_stdin' + description: 'Reads and parses events from standard input.' + example: 'from_stdin { read_json }' + path: 'reference/operators/from_stdin' + - name: 'from_tcp' + description: 'Connects to a remote TCP or TLS endpoint and receives events.' + example: 'from_tcp "example.org:4000" { read_json }' + path: 'reference/operators/from_tcp' - name: 'from_kafka' description: 'Receives events from an Apache Kafka topic.' 
example: 'from_kafka "logs"' path: 'reference/operators/from_kafka' - - name: 'from_opensearch' - description: 'Receives events via Opensearch Bulk API.' - example: 'from_opensearch' - path: 'reference/operators/from_opensearch' + - name: 'from_mysql' + description: 'Reads events from a MySQL database.' + example: 'from_mysql table="users", host="db.example.com", database="mydb"' + path: 'reference/operators/from_mysql' + - name: 'from_nic' + description: 'Captures packets from a network interface and outputs events.' + example: 'from_nic "eth0"' + path: 'reference/operators/from_nic' - name: 'from_s3' description: 'Reads one or multiple files from Amazon S3.' example: 'from_s3 "s3://my-bucket/data/**.json"' @@ -603,6 +627,10 @@ operators: description: 'Parses an incoming Syslog stream into events.' example: 'read_syslog' path: 'reference/operators/read_syslog' + - name: 'read_tql' + description: 'Parses an incoming byte stream of TQL-formatted records into events.' + example: 'read_tql' + path: 'reference/operators/read_tql' - name: 'read_tsv' description: 'Read TSV (Tab-Separated Values) from a byte stream.' example: 'read_tsv auto_expand=true' @@ -683,13 +711,21 @@ operators: description: 'Sends bytes as ZeroMQ messages.' example: 'save_zmq' path: 'reference/operators/save_zmq' + - name: 'serve_http' + description: 'Starts an HTTP server and streams bytes produced by a nested pipeline to connected clients.' + example: 'serve_http "0.0.0.0:8080" { write_ndjson }' + path: 'reference/operators/serve_http' + - name: 'serve_tcp' + description: 'Listens for incoming TCP connections and sends events to all connected clients.' + example: 'serve_tcp "0.0.0.0:8090" { write_json }' + path: 'reference/operators/serve_tcp' - name: 'sigma' description: 'Filter the input with Sigma rules and output matching events.' example: 'sigma "/tmp/rules/"' path: 'reference/operators/sigma' - name: 'yara' description: 'Executes YARA rules on byte streams.' 
- example: 'yara "/path/to/rules", blockwise=true' + example: 'yara "/path/to/rules"' path: 'reference/operators/yara' - name: 'to' description: 'Saves to an URI, inferring the destination, compression and format.' @@ -727,6 +763,10 @@ operators: description: 'Writes events to a URI using hive partitioning.' example: 'to_hive "s3://…", partition_by=[x]' path: 'reference/operators/to_hive' + - name: 'to_http' + description: 'Sends events as HTTP requests to a webhook or API endpoint.' + example: 'to_http "https://example.com/webhook"' + path: 'reference/operators/to_http' - name: 'to_kafka' description: 'Sends messages to an Apache Kafka topic.' example: 'to_kafka "topic", message=this.print_json()' @@ -747,6 +787,10 @@ operators: description: 'Sends events to a Splunk HTTP Event Collector (HEC).' example: 'to_splunk "localhost:8088", …' path: 'reference/operators/to_splunk' + - name: 'to_tcp' + description: 'Connects to a remote TCP or TLS endpoint and sends events.' + example: 'to_tcp "collector.example.com:5044" { write_json }' + path: 'reference/operators/to_tcp' - name: 'write_bitz' description: 'Writes events in *BITZ* format.' 
example: 'write_bitz' @@ -1049,7 +1093,7 @@ sigma "/tmp/rules/" ```tql -yara "/path/to/rules", blockwise=true +yara "/path/to/rules" ``` @@ -1786,6 +1830,14 @@ read_syslog + + +```tql +read_tql +``` + + + ```tql @@ -2131,6 +2183,30 @@ load_zmq + + +```tql +accept_http "0.0.0.0:8080" { read_json } +``` + + + + + +```tql +accept_opensearch "0.0.0.0:9200" +``` + + + + + +```tql +accept_tcp "0.0.0.0:8090" { read_json } +``` + + + ```tql @@ -2187,6 +2263,24 @@ from_http "0.0.0.0:8080" + + +```tql +from_stdin { read_json } +``` + + + + + +```tql +from_tcp "example.org:4000" { + read_json +} +``` + + + ```tql @@ -2195,10 +2289,18 @@ from_kafka "logs" - + + +```tql +from_mysql table="users", host="db.example.com", database="mydb" +``` + + + + ```tql -from_opensearch +from_nic "eth0" ``` @@ -2461,6 +2563,22 @@ save_zmq + + +```tql +serve_http "0.0.0.0:8080" { write_ndjson } +``` + + + + + +```tql +serve_tcp "0.0.0.0:8090" { write_json } +``` + + + ```tql @@ -2533,6 +2651,14 @@ to_hive "s3://…", partition_by=[x] + + +```tql +to_http "https://example.com/webhook" +``` + + + ```tql @@ -2573,4 +2699,12 @@ to_splunk "localhost:8088", … + + +```tql +to_tcp "collector.example.com:5044" { write_json } +``` + + + diff --git a/src/content/docs/reference/operators/accept_http.mdx b/src/content/docs/reference/operators/accept_http.mdx new file mode 100644 index 000000000..6f134d7df --- /dev/null +++ b/src/content/docs/reference/operators/accept_http.mdx @@ -0,0 +1,159 @@ +--- +title: accept_http +category: Inputs/Events +example: 'accept_http "0.0.0.0:8080" { read_json }' +--- + +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Accepts incoming HTTP requests and forwards them as events. 
+
+```tql
+accept_http url:string, [responses=record, max_request_size=int,
+  max_connections=int, tls=record]
+  { … }
+```
+
+## Description
+
+The `accept_http` operator starts an HTTP/1.1 server on the given address and
+forwards incoming requests as events. Each request spawns a sub-pipeline that
+processes the request body independently.
+
+The sub-pipeline has access to a `$request` variable containing the request
+metadata.
+
+### `url: string`
+
+The endpoint to listen on. Must have the form `host:port`. Use `0.0.0.0` to
+accept connections on all interfaces. IPv6 addresses are not supported.
+
+### `responses = record (optional)`
+
+Specify custom responses for endpoints on the server. For example,
+
+```tql
+responses = {
+  "/resource/create": { code: 200, content_type: "text/html", body: "Created!" },
+  "/resource/delete": { code: 401, content_type: "text/html", body: "Unauthorized!" }
+}
+```
+
+creates two special routes on the server with different responses.
+
+Each route must be a record with `code`, `content_type`, and `body` fields.
+
+Requests to any other endpoint are answered with HTTP status `200 OK`.
+
+### `max_request_size = int (optional)`
+
+The maximum size of an incoming request to accept. Requests that exceed this
+limit are rejected with HTTP `413 Content Too Large`.
+
+Defaults to `10MiB`.
+
+### `max_connections = int (optional)`
+
+The maximum number of simultaneous incoming connections to accept. Connections
+that exceed this limit are rejected with HTTP `503 Service Unavailable`.
+
+Defaults to `10`.
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+
+
+### `{ … }`
+
+The pipeline to run for each incoming HTTP request. Inside the pipeline, the
+`$request` variable is available as a record with the following fields:
+
+| Field      | Type     | Description                          |
+| :--------- | :------- | :----------------------------------- |
+| `headers`  | `record` | The request headers.
| +| `query` | `record` | The query parameters of the request. | +| `path` | `string` | The path requested. | +| `fragment` | `string` | The URI fragment of the request. | +| `method` | `string` | The HTTP method of the request (lowercase, e.g. `"post"`). | +| `version` | `string` | The HTTP version of the request. | +| `body` | `blob` | The raw request body. | + +## Examples + +### Accept JSON requests on port 8080 + +Listen on all interfaces and parse incoming request bodies as JSON: + +```tql +accept_http "0.0.0.0:8080" { + read_json +} +``` + +Send a request to the endpoint via `curl`: + +```bash +echo '{"key": "value"}' | curl localhost:8080 --data-binary @- -H 'Content-Type: application/json' +``` + +### Filter requests by path + +Use the `$request` variable to filter or route requests: + +```tql +accept_http "0.0.0.0:8080" { + read_json + where $request.path == "/events" and $request.method == "post" +} +``` + +### Custom responses per endpoint + +Return different HTTP responses based on the request path: + +```tql +accept_http "0.0.0.0:8080", + responses={ + "/webhook": { + code: 201, + content_type: "text/plain", + body: "accepted", + }, + } { + read_json + where $request.path == "/webhook" +} +``` + +### Accept HTTPS requests with TLS + +```tql +accept_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } { + read_json +} +``` + +### Capture all request metadata + +```tql +accept_http "0.0.0.0:8443" { + read_json + metadata = $request +} +``` + +## See Also + +- from_http +- http +- to_http +- serve_http +- serve +- collecting/fetch-via-http-and-apis +- http diff --git a/src/content/docs/reference/operators/accept_opensearch.mdx b/src/content/docs/reference/operators/accept_opensearch.mdx new file mode 100644 index 000000000..504bdefca --- /dev/null +++ b/src/content/docs/reference/operators/accept_opensearch.mdx @@ -0,0 +1,86 @@ +--- +title: accept_opensearch +category: Inputs/Events +example: 'accept_opensearch 
"0.0.0.0:9200"' +--- + +Accepts incoming OpenSearch Bulk API requests and forwards them as events. + +```tql +accept_opensearch [url:string, keep_actions=bool, max_request_size=int, + tls=record] +``` + +## Description + +The `accept_opensearch` operator starts an OpenSearch-compatible HTTP server and +accepts bulk ingestion requests on `/_bulk`. + +For each bulk request, the operator buffers the request body in memory, up to +`max_request_size`, optionally decompresses it based on the HTTP +`Content-Encoding` header, parses the NDJSON payload, and emits the resulting +records as events. + +By default, the operator drops OpenSearch action objects such as +`{"create": ...}` and emits only the document records. To keep the action +objects, set `keep_actions=true`. + +The operator also responds to `GET /` with a minimal OpenSearch-compatible info +response so that basic health checks and client probes succeed. + +### `url: string (optional)` + +The endpoint to listen on. + +Use the form `host:port`, `[host]:port`, `http://host:port`, or +`https://host:port`. + +Defaults to `"0.0.0.0:9200"`. + +### `keep_actions = bool (optional)` + +Whether to keep action objects such as `{"create": ...}`. + +Defaults to `false`. + +### `max_request_size = int (optional)` + +The maximum size of an incoming request to accept. + +Requests that exceed this limit are rejected with HTTP `413 Content Too Large`. + +Defaults to `10MiB`. 
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+
+
+## Examples
+
+### Listen on port 8080
+
+```tql
+accept_opensearch "0.0.0.0:8080"
+```
+
+### Keep action objects in the output
+
+```tql
+accept_opensearch keep_actions=true
+```
+
+### Accept HTTPS requests with TLS
+
+```tql
+accept_opensearch "0.0.0.0:8443",
+  tls={
+    certfile: "/path/to/cert.pem",
+    keyfile: "/path/to/key.pem",
+  }
+```
+
+## See Also
+
+- to_opensearch
+- opensearch
+- elasticsearch
diff --git a/src/content/docs/reference/operators/accept_tcp.mdx b/src/content/docs/reference/operators/accept_tcp.mdx
new file mode 100644
index 000000000..7bc35d167
--- /dev/null
+++ b/src/content/docs/reference/operators/accept_tcp.mdx
@@ -0,0 +1,106 @@
+---
+title: accept_tcp
+category: Inputs/Events
+example: 'accept_tcp "0.0.0.0:8090" { read_json }'
+---
+
+import Op from '@components/see-also/Op.astro';
+import Integration from '@components/see-also/Integration.astro';
+
+Listens for incoming TCP or TLS connections and receives events.
+
+```tql
+accept_tcp endpoint:string, [max_connections=int, resolve_hostnames=bool,
+  tls=record] { … }
+```
+
+## Description
+
+The `accept_tcp` operator listens on the specified endpoint for incoming TCP
+connections. For each accepted connection, the operator spawns the nested
+pipeline and feeds it the bytes received from that connection.
+
+### `endpoint: string`
+
+The endpoint to listen on. Must be of the form `[tcp://]host:port`. Use
+`0.0.0.0` as the host to accept connections on all interfaces.
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+
+
+### `max_connections = int (optional)`
+
+The maximum number of simultaneous incoming connections to accept. Additional
+connections beyond this limit are rejected.
+
+Defaults to `128`.
+
+### `resolve_hostnames = bool (optional)`
+
+Perform reverse DNS lookups for accepted peers and expose the result as
+`$peer.hostname` inside the nested pipeline.
+
+When enabled, the `hostname` field exists on `$peer` and is `null` if no PTR
+record is available. When disabled, the `hostname` field is omitted.
+
+Defaults to `false`.
+
+### `{ … } (optional)`
+
+The pipeline to run for each individual TCP connection. If none is specified,
+the connection byte streams are forwarded unchanged. Unless you are certain
+that at most one connection is active at a time, specify a pipeline that
+parses each connection stream into events, for instance `{ read_json }`;
+otherwise, output from concurrent connections can interleave.
+
+Inside the pipeline, the `$peer` variable is available as a record with the
+following fields:
+
+| Field      | Type     | Description                                     |
+| :--------- | :------- | :---------------------------------------------- |
+| `ip`       | `ip`     | The IP address of the connected peer.           |
+| `port`     | `int64`  | The port number of the connected peer.          |
+| `hostname` | `string` | The reverse-DNS hostname of the connected peer.
| + +## Examples + +### Accept incoming JSON over TCP + +```tql +accept_tcp "0.0.0.0:8090" { + read_json +} +``` + +### Accept incoming Syslog over TCP + +```tql +accept_tcp "0.0.0.0:514" { + read_syslog +} +``` + +### Enrich events with peer hostnames + +```tql +accept_tcp "0.0.0.0:514", resolve_hostnames=true { + read_syslog + collector = $peer +} +``` + +### Accept connections with TLS + +```tql +accept_tcp "0.0.0.0:4443", tls={certfile: "cert.pem", keyfile: "key.pem"} { + read_json +} +``` + +## See Also + +- from_tcp +- to_tcp +- serve_tcp +- tcp diff --git a/src/content/docs/reference/operators/batch.md b/src/content/docs/reference/operators/batch.md index 186b74e7a..d50c5e8e4 100644 --- a/src/content/docs/reference/operators/batch.md +++ b/src/content/docs/reference/operators/batch.md @@ -15,12 +15,16 @@ batch [limit:int, timeout=duration] The `batch` operator takes its input and rewrites it into batches of up to the desired size. -:::caution[Expert Operator] +:::caution[Advanced feature] The `batch` operator is a lower-level building block that lets users explicitly control batching, which otherwise is controlled automatically by Tenzir's underlying pipeline execution engine. Use with caution! ::: +Note that the operator maintains separate buffers for each distinct schema. Each +buffer has independent timeout tracking and fills until reaching the `limit`, at +which point it flushes immediately. + ### `limit: int (optional)` How many events to put into one batch at most. @@ -29,5 +33,7 @@ Defaults to `65536`. ### `timeout = duration (optional)` -Specifies a maximum latency for events passing through the batch operator. When -unspecified, an infinite duration is used. +Specifies a maximum latency for events passing through the batch operator. If no +new events arrive within the timeout period, any buffered events are flushed. + +Defaults to `1min`. 
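As an illustrative sketch of how the two options interact (the numbers are made
up): buffer up to 1,000 events per schema, and flush any buffer that has
received no new events for two seconds:

```tql
batch 1000, timeout=2s
```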
diff --git a/src/content/docs/reference/operators/dns_lookup.mdx b/src/content/docs/reference/operators/dns_lookup.mdx index cef0ededb..d453b7b1e 100644 --- a/src/content/docs/reference/operators/dns_lookup.mdx +++ b/src/content/docs/reference/operators/dns_lookup.mdx @@ -21,7 +21,10 @@ reverse lookup (IP to hostname) based on the field's content. - **Forward lookup**: When the field contains a string, the operator performs A and AAAA queries to find associated IP addresses. -The result is stored as a record in the specified result field. +The result is stored in the specified result field. + +Tenzir caches DNS results and reuses them across lookups. For forward lookups, +the `ttl` field shows the remaining lifetime of the cached answer. ### `field: ip|string` @@ -59,7 +62,9 @@ Where each record has the structure: } ``` -If the lookup fails or times out, the result field will be `null`. +If an individual lookup fails or times out, the result field will be `null`. +If Tenzir cannot initialize DNS resolution at all, the operator emits an error +and stops instead of writing `null` results. ## Examples diff --git a/src/content/docs/reference/operators/from.mdx b/src/content/docs/reference/operators/from.mdx index f077c073a..e017ae2fa 100644 --- a/src/content/docs/reference/operators/from.mdx +++ b/src/content/docs/reference/operators/from.mdx @@ -97,8 +97,8 @@ just plain `json`. ### The pipeline argument & its relation to the loader -Some loaders, such as the load_tcp operator, accept a sub-pipeline -directly. If the selected loader accepts a sub-pipeline, the `from` operator +Some loaders accept a sub-pipeline directly. +If the selected loader accepts a sub-pipeline, the `from` operator will dispatch decompression and parsing into that sub-pipeline. If a an explicit pipeline argument is provided it is forwarded as-is. 
If the loader does not accept a sub-pipeline, the decompression and parsing steps are simply performed @@ -138,7 +138,7 @@ load_tcp "tcp://0.0.0.0:12345", parallel=10 { | :-------------- | :------------------------------- | :---------------------------------------------- | | `abfs`,`abfss` | load_azure_blob_storage | `from "abfs://path/to/file.json"` | | `amqp` | load_amqp | `from "amqp://…` | -| `elasticsearch` | from_opensearch | `from "elasticsearch://1.2.3.4:9200` | +| `elasticsearch` | accept_opensearch | `from "elasticsearch://1.2.3.4:9200` | | `file` | load_file | `from "file://path/to/file.json"` | | `fluent-bit` | from_fluent_bit | `from "fluent-bit://elasticsearch"` | | `ftp`, `ftps` | load_ftp | `from "ftp://example.com/file.json"` | @@ -146,7 +146,7 @@ load_tcp "tcp://0.0.0.0:12345", parallel=10 { | `http`, `https` | load_http | `from "http://example.com/file.json"` | | `inproc` | load_zmq | `from "inproc://127.0.0.1:56789" { read_json }` | | `kafka` | load_kafka | `from "kafka://topic" { read_json }` | -| `opensearch` | from_opensearch | `from "opensearch://1.2.3.4:9200` | +| `opensearch` | accept_opensearch | `from "opensearch://1.2.3.4:9200` | | `s3` | load_s3 | `from "s3://bucket/file.json"` | | `sqs` | load_sqs | `from "sqs://my-queue" { read_json }` | | `tcp` | load_tcp | `from "tcp://127.0.0.1:13245" { read_json }` | diff --git a/src/content/docs/reference/operators/from_file.mdx b/src/content/docs/reference/operators/from_file.mdx index 95e48fd76..10ded2a93 100644 --- a/src/content/docs/reference/operators/from_file.mdx +++ b/src/content/docs/reference/operators/from_file.mdx @@ -10,7 +10,7 @@ Reads one or multiple files from a filesystem. 
```tql from_file url:string, [watch=bool, remove=bool, rename=string->string, - path_field=field, max_age=duration] { … } + path_field=field, max_age=duration, mmap=bool] { … } ``` ## Description @@ -35,6 +35,13 @@ included in the URI as query parameters are `region`, `scheme`, +### `mmap = bool (optional)` + +Uses memory-mapped I/O for reading files instead of regular reads. This can +improve performance for large files. + +Defaults to `false`. + The pipeline uses the same logic as from. ## Examples diff --git a/src/content/docs/reference/operators/from_http.mdx b/src/content/docs/reference/operators/from_http.mdx index 8ca5085f8..10b29dbec 100644 --- a/src/content/docs/reference/operators/from_http.mdx +++ b/src/content/docs/reference/operators/from_http.mdx @@ -1,25 +1,27 @@ --- title: from_http category: Inputs/Events -example: 'from_http "0.0.0.0:8080"' +example: 'from_http "https://example.com/api"' --- -Sends and receives HTTP/1.1 requests. +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Sends an HTTP/1.1 request and returns the response as events. ```tql from_http url:string, [method=string, body=record|string|blob, encode=string, - headers=record, metadata_field=field, error_field=field, - paginate=record->string|string, paginate_delay=duration, - connection_timeout=duration, max_retry_count=int, - retry_delay=duration, tls=record] -from_http url:string, server=true, [metadata_field=field, responses=record, - max_request_size=int, max_connections=int, tls=record] + headers=record, error_field=field, paginate=string, + paginate_delay=duration, connection_timeout=duration, + max_retry_count=int, retry_delay=duration, tls=record] + { … } ``` ## Description -The `from_http` operator issues HTTP requests or spins up an HTTP/1.1 server on -a given address and forwards received requests as events. 
+The `from_http` operator issues an HTTP request and returns the response as +events. :::tip[Format and Compression Inference] @@ -32,87 +34,47 @@ If neither the URL nor the HTTP headers provide enough information, you can expl ### `url: string` -URL to listen on or to connect to. +URL to connect to. Both `http://` and `https://` schemes are supported. If the +scheme is omitted, `https://` is assumed. -Must have the form `:` when `server=true`. +The URL is resolved as a [secret](/explanations/secrets), so you can pass a +secret name to avoid hardcoding sensitive URLs. import HTTPClientOptions from '@partials/operators/HTTPClientOptions.mdx'; -### `metadata_field = field (optional)` - -Field to insert metadata into when using the parsing pipeline. - -The response metadata (when using the client mode) has the following schema: - -| Field | Type | Description | -| :-------- | :------- | :------------------------------------ | -| `code` | `uint64` | The HTTP status code of the response. | -| `headers` | `record` | The response headers. | - -The request metadata (when using the server mode) has the following schema: - -| Field | Type | Description | -| :--------- | :------- | :----------------------------------- | -| `headers` | `record` | The request headers. | -| `query` | `record` | The query parameters of the request. | -| `path` | `string` | The path requested. | -| `fragment` | `string` | The URI fragment of the request. | -| `method` | `string` | The HTTP method of the request. | -| `version` | `string` | The HTTP version of the request. | - ### `error_field = field (optional)` Field to insert the response body for HTTP error responses (status codes not in the 2xx or 3xx range). When set, any HTTP response with a status code outside the 200–399 range will -have its body stored in this field as a `blob`. Otherwise, error responses, -alongside the original event, are skipped and an error is emitted. 
- -### `server = bool (optional)` - -Whether to spin up an HTTP server or act as an HTTP client. - -Defaults to `false`, i.e., the HTTP client. - -### `responses = record (optional)` - -Specify custom responses for endpoints on the server. For example, - -```tql -responses = { - "/resource/create": { code: 200, content_type: "text/html", body: "Created!" }, - "/resource/delete": { code: 401, content_type: "text/html", body: "Unauthorized!" } -} -``` - -creates two special routes on the server with different responses. - -Requests to an unspecified endpoint are responded with HTTP Status `200 OK`. - -### `max_request_size = int (optional)` - -The maximum size of an incoming request to accept. - -Defaults to `10MiB`. - -### `max_connections = int (optional)` - -The maximum number of simultaneous incoming connections to accept. - -Defaults to `10`. +have its body stored in this field as a `blob`. Otherwise, error responses are +skipped and an error is emitted. import TLSOptions from '@partials/operators/TLSOptions.mdx'; +### Migration from `server=true` + +The `server=true` flag is no longer supported. Use accept_http to +listen for incoming HTTP requests. + ### `{ … } (optional)` A pipeline that receives the response body as bytes, allowing parsing per request. This is especially useful in scenarios where the response body can be parsed into multiple events. +Inside the pipeline, the `$response` variable is available as a record with the +following fields: + +| Field | Type | Description | +| :-------- | :------- | :------------------------------------ | +| `code` | `uint64` | The HTTP status code of the response. | +| `headers` | `record` | The response headers. | + If not provided, the operator will attempt to infer the parsing operator from the `Content-Type` header. Should this inference fail (e.g., unsupported or missing `Content-Type`), the operator raises an error. 
@@ -149,17 +111,26 @@ head 1 } ``` -### Paginate with a Lambda - -Use the `paginate` parameter with a lambda to extract the next page URL from the -response body: +### Send a POST request with JSON body ```tql -from_http "https://api.example.com/data", paginate=(x => x.next_url?) +from_http "https://httpbin.org/post", body={key: "value"}, encode="json" { + read_json +} ``` -This sends a GET request to the initial URL and evaluates the `x.next_url` field -in the response to determine the next URL for subsequent requests. +### Access response metadata + +Use the `$response` variable inside a parsing pipeline to access the HTTP +response code and headers: + +```tql +from_http "https://example.com/api", method="put" { + read_json + where $response.code == 200 + response = $response +} +``` ### Paginate via Link Headers @@ -185,44 +156,12 @@ from_http "https://api.example.com/data", max_retry_count=3, retry_delay=2s This tries up to 3 times, waiting 2 seconds between each retry. -### Listen on port 8080 - -Spin up a server with: - -```tql -from_http "0.0.0.0:8080", server=true, metadata_field=metadata -``` - -Send a request to the HTTP endpoint via `curl`: - -```sh -echo '{"key": "value"}' | gzip | curl localhost:8080 --data-binary @- -H 'Content-Encoding: gzip' -H 'Content-Type: application/json' -``` - -Observe the request in the Tenzir pipeline, parsed and decompressed: - -```tql -{ - key: "value", - metadata: { - headers: { - Host: "localhost:8080", - "User-Agent": "curl/8.13.0", - Accept: "*/*", - "Content-Encoding": "gzip", - "Content-Length": "37", - "Content-Type": "application/json", - }, - path: "/", - method: "post", - version: "HTTP/1.1", - }, -} -``` - ## See Also +- accept_http - http +- to_http +- serve_http - serve - collecting/fetch-via-http-and-apis - enrichment/enrich-with-threat-intel diff --git a/src/content/docs/reference/operators/from_mysql.mdx b/src/content/docs/reference/operators/from_mysql.mdx new file mode 100644 index 000000000..cdfece6f3 
--- /dev/null +++ b/src/content/docs/reference/operators/from_mysql.mdx @@ -0,0 +1,242 @@ +--- +title: from_mysql +category: Inputs/Events +example: 'from_mysql table="users", host="db.example.com", database="mydb"' +--- + +import Op from '@components/see-also/Op.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Reads events from a MySQL database. + +```tql +from_mysql [uri=string], [table=string], [sql=string], [show=string], + [live=bool], [tracking_column=string], + [host=string], [port=int], [user=string], [password=string], + [database=string], [tls=bool|record] +``` + +## Description + +The `from_mysql` operator reads data from a MySQL database. You can query data +using a table name, raw SQL, or retrieve database metadata. + +The operator supports three primary query modes: + +1. **Table mode**: Read all rows from a table using the `table` parameter. +2. **SQL mode**: Execute a custom SQL query using the `sql` parameter. +3. **Show mode**: List database metadata using the `show` parameter. + When `show="columns"`, also set `table` to the table name. + +Internal metadata queries, such as resolving tracking columns for live mode, +validate user-provided table and column names against a safe list and use MySQL +prepared statements to safely bind values. + +### `uri = string (optional)` + +A MySQL connection URI in the format: + +``` +mysql://[user[:password]@]host[:port][/database] +``` + +When provided, the URI takes precedence over individual connection parameters. +Credentials in the URI can be overridden by explicit `user` and `password` +parameters. + +### `table = string (optional)` + +The name of the table to read from. This is mutually exclusive with `sql`. +When `show="columns"`, set `table` to the table name. + +### `sql = string (optional)` + +A raw SQL query to execute. This is mutually exclusive with `table` and `show`. 
+ +Use raw strings for complex queries: + +```tql +from_mysql sql=r"SELECT id, name FROM users WHERE active = 1" +``` + +### `show = string (optional)` + +Retrieve database metadata. This is mutually exclusive with `sql`. +When `show="columns"`, set `table` to the table name. + +Supported values: + +- `"tables"`: List all tables in the database +- `"columns"`: List all columns for the table specified in `table` + +### `live = bool (optional)` + +Enables continuous polling for new rows from a table. The operator tracks +progress using a watermark on an integer column and polls every second for rows +above the last-seen value. Mutually exclusive with `sql` and `show`. Requires +`table`. + +Defaults to `false`. + +### `tracking_column = string (optional)` + +The integer column to use for watermark tracking in live mode. The operator +queries for rows where this column exceeds the last-seen watermark. + +When omitted, the tracking column is auto-detected from the table's +auto-increment primary key. Requires `live=true`. + +### `host = string (optional)` + +The hostname or IP address of the MySQL server. + +Defaults to `"localhost"`. + +### `port = int (optional)` + +The port number of the MySQL server. + +Defaults to `3306`. + +### `user = string (optional)` + +The username for authentication. Supports the `secret` function for secure +credential management. + +Defaults to `"root"`. + +### `password = string (optional)` + +The password for authentication. Supports the `secret` function for secure +credential management. + +Defaults to `""`. + +### `database = string (optional)` + +The database to connect to. + +### `tls = bool|record (optional)` + +TLS configuration for the MySQL connection. Defaults to `false` (no TLS). + +Use `tls=true` to enable TLS with default settings and certificate verification, +or provide a record to customize specific options: + +```tql +{ + skip_peer_verification: bool, // skip certificate verification. 
+  cacert: string, // CA bundle to verify peers.
+  certfile: string, // client certificate to present.
+  keyfile: string, // private key for the client certificate.
+}
+```
+
+## Types
+
+The operator maps MySQL types to Tenzir types as follows:
+
+| MySQL Type                    | Tenzir Type   | Notes                  |
+| :---------------------------- | :------------ | :--------------------- |
+| `TINYINT(1)`                  | `bool`        | Boolean representation |
+| `TINYINT`, `SMALLINT`, `INT`  | `int64`       |                        |
+| `BIGINT`                      | `int64`       |                        |
+| `BIGINT UNSIGNED`             | `uint64`      |                        |
+| `FLOAT`, `DOUBLE`             | `double`      |                        |
+| `DECIMAL`, `NUMERIC`          | `double`      | May lose precision     |
+| `DATE`, `DATETIME`            | `time`        |                        |
+| `TIMESTAMP`                   | `time`        |                        |
+| `TIME`                        | `duration`    |                        |
+| `CHAR`, `VARCHAR`, `TEXT`     | `string`      |                        |
+| `BINARY`, `VARBINARY`, `BLOB` | `blob`        |                        |
+| `JSON`                        | `string`      |                        |
+| `ENUM`                        | `enumeration` |                        |
+
+## Examples
+
+### Read all rows from a table
+
+```tql
+from_mysql table="users", host="db.example.com", database="mydb"
+```
+
+### Use a connection URI
+
+```tql
+from_mysql uri="mysql://admin:secret@db.example.com:3306/production", table="events"
+```
+
+### Execute a custom SQL query
+
+```tql
+from_mysql sql=r"SELECT id, name, created_at FROM users WHERE active = 1 LIMIT 100",
+  host="localhost", database="app"
+```
+
+### Use secure credentials
+
+```tql
+from_mysql table="orders",
+  host="db.example.com",
+  user=secret("mysql-user"),
+  password=secret("mysql-password"),
+  database="shop"
+```
+
+### List all tables in a database
+
+```tql
+from_mysql show="tables", host="localhost", database="mydb"
+```
+
+### List columns for a specific table
+
+```tql
+from_mysql show="columns", table="users", host="localhost", database="mydb"
+```
+
+### Enable TLS with defaults
+
+```tql
+from_mysql table="events",
+  host="db.example.com",
+  database="production",
+  tls=true
+```
+
+### Connect with TLS but skip peer verification
+
+```tql
+from_mysql table="events",
+  host="db.example.com",
+  database="production",
+  
tls={skip_peer_verification: true} +``` + +### Connect with TLS using a CA certificate + +```tql +from_mysql table="events", + host="db.example.com", + database="production", + tls={cacert: "/path/to/ca.pem"} +``` + +### Stream new rows from a table + +```tql +from_mysql table="events", live=true, + host="db.example.com", database="mydb" +``` + +### Stream with an explicit tracking column + +```tql +from_mysql table="events", live=true, tracking_column="event_id", + host="db.example.com", database="mydb" +``` + +## See Also + +- to_clickhouse +- mysql diff --git a/src/content/docs/reference/operators/from_nic.mdx b/src/content/docs/reference/operators/from_nic.mdx new file mode 100644 index 000000000..b2969ef3e --- /dev/null +++ b/src/content/docs/reference/operators/from_nic.mdx @@ -0,0 +1,86 @@ +--- +title: from_nic +category: Inputs/Events +example: 'from_nic "eth0"' +--- + +Captures packets from a network interface and outputs events. + +```tql +from_nic iface:string, [snaplen=int, filter=string] { … } +``` + +## Description + +The `from_nic` operator captures packets with libpcap and forwards them as events. + +If you omit the optional pipeline, `from_nic` uses read_pcap by default. Provide a pipeline when you want to change how the captured PCAP byte stream is parsed. The pipeline must accept bytes and return events. + +Use `filter` to apply a Berkeley Packet Filter (BPF) expression before Tenzir parses packets. This lets libpcap drop unwanted traffic early. + +### `iface: string` + +The interface to capture packets from. + +### `snaplen = int (optional)` + +Sets the snapshot length of captured packets. + +This value is an upper bound on the packet size. Packets larger than this size get truncated to `snaplen` bytes. + +Defaults to `262144`. + +### `filter = string (optional)` + +Applies a Berkeley Packet Filter (BPF) expression to the capture. + +The filter runs in libpcap before Tenzir parses packets. 
Use the same filter syntax as `tcpdump`, for example `tcp port 443` or `host 10.0.0.1`. + +### `{ … } (optional)` + +An optional parsing pipeline for the captured PCAP byte stream. + +When omitted, `from_nic` defaults to: + +```tql +{ read_pcap } +``` + +Provide a custom pipeline when you want to adjust parsing behavior, for example to re-emit PCAP file headers. + +## Examples + +### Capture packets from `en1` + +```tql +from_nic "en1" +``` + +### Capture packets and re-emit file headers + +```tql +from_nic "en1" { + read_pcap emit_file_headers=true +} +``` + +### Capture only HTTPS traffic + +```tql +from_nic "en1", filter="tcp port 443" +``` + +### Write a live capture to a PCAP file + +```tql +from_nic "en1" +write_pcap +save_file "trace.pcap" +``` + +## See Also + +- nics +- read_pcap +- write_pcap +- nic diff --git a/src/content/docs/reference/operators/from_opensearch.mdx b/src/content/docs/reference/operators/from_opensearch.mdx index 0501b8107..b8e30674d 100644 --- a/src/content/docs/reference/operators/from_opensearch.mdx +++ b/src/content/docs/reference/operators/from_opensearch.mdx @@ -4,58 +4,6 @@ category: Inputs/Events example: 'from_opensearch' --- -Receives events via Opensearch Bulk API. +The `from_opensearch` operator is no longer available. -```tql -from_opensearch [url:string, keep_actions=bool, max_request_size=int, tls=record] -``` - -## Description - -The `from_opensearch` operator emulates simple situations for the [Opensearch -Bulk -API](https://opensearch.org/docs/latest/api-reference/document-apis/bulk/). - -### `url: string (optional)` - -URL to listen on. - -Must have the form `host[:port]`. - -Defaults to `"0.0.0.0:9200"`. - -### `keep_actions = bool (optional)` - -Whether to keep the command objects such as `{"create": ...}`. - -Defaults to `false`. - -### `max_request_size = int (optional)` - -The maximum size of an incoming request to accept. - -Defaults to `10Mib`. 
- -import TLSOptions from '@partials/operators/TLSOptions.mdx'; - - - -## Examples - -### Listen on port 8080 on an interface with IP 1.2.3.4 - -```tql -from_opensearch "1.2.3.4:8080" -``` - -### Listen with TLS - -```tql -from_opensearch tls=true, certfile="server.crt", keyfile="private.key" -``` - -## See Also - -- to_opensearch -- opensearch -- elasticsearch +Use accept_opensearch instead. diff --git a/src/content/docs/reference/operators/from_stdin.mdx b/src/content/docs/reference/operators/from_stdin.mdx new file mode 100644 index 000000000..b2c63ba46 --- /dev/null +++ b/src/content/docs/reference/operators/from_stdin.mdx @@ -0,0 +1,74 @@ +--- +title: from_stdin +category: Inputs/Events +example: 'from_stdin { read_json }' +--- + +import Op from '@components/see-also/Op.astro'; + +Reads and parses events from standard input. + +```tql +from_stdin { … } +``` + +## Description + +The `from_stdin` operator reads bytes from standard input and passes them +through the provided parsing pipeline to produce events. This is useful when +piping data into the `tenzir` executable as part of a shell script or command +chain. + +### `{ … }` + +The pipeline to parse the incoming bytes into events. The pipeline receives raw +bytes and must produce events. For example, `{ read_json }` parses the input as +JSON. 
+
+## Examples
+
+### Parse JSON from standard input
+
+```sh
+echo '{"foo": 42}' | tenzir -f pipeline.tql
+```
+
+```tql title="Pipeline"
+from_stdin {
+  read_json
+}
+```
+
+```tql title="Output"
+{
+  foo: 42,
+}
+```
+
+### Parse CSV data piped from another command
+
+```sh
+cat data.csv | tenzir -f pipeline.tql
+```
+
+```tql
+from_stdin {
+  read_csv
+}
+```
+
+### Parse Syslog messages
+
+```sh
+tail -f /var/log/syslog | tenzir -f pipeline.tql
+```
+
+```tql
+from_stdin {
+  read_syslog
+}
+```
+
+## See Also
+
+- from_file
diff --git a/src/content/docs/reference/operators/from_tcp.mdx b/src/content/docs/reference/operators/from_tcp.mdx
new file mode 100644
index 000000000..27ee04641
--- /dev/null
+++ b/src/content/docs/reference/operators/from_tcp.mdx
@@ -0,0 +1,76 @@
+---
+title: from_tcp
+category: Inputs/Events
+example: 'from_tcp "example.org:4000" { read_json }'
+---
+
+import Op from '@components/see-also/Op.astro';
+import Integration from '@components/see-also/Integration.astro';
+
+Connects to a remote TCP or TLS endpoint and receives events.
+
+```tql
+from_tcp endpoint:string, [tls=record, { … }]
+```
+
+## Description
+
+Connects to the specified TCP endpoint as a client and reads bytes from the
+connection, running them through an optional nested pipeline.
+
+If the connection fails, the operator retries with exponential backoff.
+
+### `endpoint: string`
+
+The remote endpoint to connect to. Must be of the form
+`[tcp://]<host>:<port>`.
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+<TLSOptions />
+
+### `{ … } (optional)`
+
+The pipeline to run for the TCP connection. Use this to parse the incoming byte
+stream into events, for instance `{ read_json }`.
+ +Inside the pipeline, the `$peer` variable is available as a record with the +following fields: + +| Field | Type | Description | +| :----- | :------ | :--------------------------------- | +| `ip` | `ip` | The IP address of the remote peer | +| `port` | `int64` | The port number of the remote peer | + +## Examples + +### Connect to a remote server and read JSON + +```tql +from_tcp "example.org:4000" { + read_json +} +``` + +### Connect with TLS + +```tql +from_tcp "example.org:4443", tls={} { + read_json +} +``` + +### Connect with TLS and a custom CA certificate + +```tql +from_tcp "example.org:4443", tls={cacert: "ca.pem"} { + read_json +} +``` + +## See Also + +- to_tcp +- accept_tcp +- serve_tcp +- tcp diff --git a/src/content/docs/reference/operators/http.mdx b/src/content/docs/reference/operators/http.mdx index f2db358c6..fce16f782 100644 --- a/src/content/docs/reference/operators/http.mdx +++ b/src/content/docs/reference/operators/http.mdx @@ -193,5 +193,9 @@ header with `rel=next`, such as GitHub, GitLab, and Jira. ## See Also +- accept_http - from_http +- to_http +- serve_http - collecting/fetch-via-http-and-apis +- routing/expose-data-as-server diff --git a/src/content/docs/reference/operators/load_balance.mdx b/src/content/docs/reference/operators/load_balance.mdx index 4e00e20db..b13403a4f 100644 --- a/src/content/docs/reference/operators/load_balance.mdx +++ b/src/content/docs/reference/operators/load_balance.mdx @@ -49,8 +49,7 @@ let $cfg = ["192.168.0.30:8080", "192.168.0.30:8081"] subscribe "input" load_balance $cfg { - write_json - save_tcp $cfg + to_tcp $cfg { write_json } } ``` diff --git a/src/content/docs/reference/operators/load_http.mdx b/src/content/docs/reference/operators/load_http.mdx index 8508db051..bafc0f47a 100644 --- a/src/content/docs/reference/operators/load_http.mdx +++ b/src/content/docs/reference/operators/load_http.mdx @@ -68,8 +68,6 @@ Defaults to `false`. 
import TLSOptions from '@partials/operators/TLSOptions.mdx'; - - ## Examples diff --git a/src/content/docs/reference/operators/load_nic.mdx b/src/content/docs/reference/operators/load_nic.mdx index a4b83433f..76e1105c4 100644 --- a/src/content/docs/reference/operators/load_nic.mdx +++ b/src/content/docs/reference/operators/load_nic.mdx @@ -4,7 +4,7 @@ category: Inputs/Bytes example: 'load_nic "eth0"' --- -Loads bytes from a network interface card (NIC). +Captures raw PCAP bytes from a network interface. [pcap-rfc]: https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html @@ -14,11 +14,11 @@ load_nic iface:str, [snaplen=int, emit_file_headers=bool] ## Description -The `load_nic` operator uses libpcap to acquire packets from a network interface and -packs them into blocks of bytes that represent PCAP packet records. +The `load_nic` operator uses libpcap to acquire packets from a network interface and packs them into blocks of bytes that represent PCAP packet records. -The received first packet triggers also emission of PCAP file header such that -downstream operators can treat the packet stream as valid PCAP capture file. +The first captured packet also triggers emission of a PCAP file header so downstream operators can treat the packet stream as a valid PCAP capture file. + +Use read_pcap to parse the emitted PCAP byte stream into packet events. ### `iface: str` @@ -28,8 +28,7 @@ The interface to load bytes from. Sets the snapshot length of the captured packets. -This value is an upper bound on the packet size. Packets larger than this size -get truncated to `snaplen` bytes. +This value is an upper bound on the packet size. Packets larger than this size get truncated to `snaplen` bytes. Defaults to `262144`. @@ -37,17 +36,19 @@ Defaults to `262144`. Creates PCAP file headers for every flushed batch. -The operator emits chunk of bytes that represent a stream of packets. 
-When setting `emit_file_headers` every chunk gets its own PCAP file header, as -opposed to just the very first. This yields a continuous stream of concatenated -PCAP files. +The operator emits chunks of bytes that represent a stream of packets. When setting `emit_file_headers`, every chunk gets its own PCAP file header instead of only the very first one. This yields a continuous stream of concatenated PCAP files. -Our read_pcap operator can handle such concatenated traces, and -optionally re-emit thes file headers as separate events. +Our read_pcap operator can handle such concatenated traces and optionally re-emit those file headers as separate events. ## Examples -### Read PCAP packets from `eth0` +### Load raw PCAP bytes from `eth0` + +```tql +load_nic "eth0" +``` + +### Parse packets from `eth0` ```tql load_nic "eth0" @@ -58,13 +59,12 @@ read_pcap ```tql load_nic "en0" -read_pcap -write_pcap save_file "trace.pcap" ``` ## See Also +- from_nic - nics - read_pcap - write_pcap diff --git a/src/content/docs/reference/operators/load_tcp.mdx b/src/content/docs/reference/operators/load_tcp.mdx index 3d2549b6d..f9cc4f446 100644 --- a/src/content/docs/reference/operators/load_tcp.mdx +++ b/src/content/docs/reference/operators/load_tcp.mdx @@ -4,6 +4,14 @@ category: Inputs/Bytes example: 'load_tcp "0.0.0.0:8090" { read_json }' --- +import Op from '@components/see-also/Op.astro'; +import Integration from '@components/see-also/Integration.astro'; + +:::caution[Deprecated] +Use from_tcp for client connections or accept_tcp for server +connections instead. +::: + Loads bytes from a TCP or TLS connection. ```tql @@ -44,8 +52,6 @@ the `hostname` field in the peer record (see `peer_field`). When disabled, the Defaults to `false`. 
- - ### `max_buffered_chunks = int (optional)` @@ -138,5 +144,9 @@ openssl s_client -connect 127.0.0.1:4000 -cert client.pem -key client-key.pem -C ## See Also +- from_tcp +- to_tcp +- accept_tcp +- serve_tcp - save_tcp - tcp diff --git a/src/content/docs/reference/operators/measure.md b/src/content/docs/reference/operators/measure.md index 749ad383e..4f6ae2065 100644 --- a/src/content/docs/reference/operators/measure.md +++ b/src/content/docs/reference/operators/measure.md @@ -7,7 +7,7 @@ example: 'measure' Replaces the input with metrics describing the input. ```tql -measure [real_time=bool, cumulative=bool] +measure [cumulative=bool] ``` ## Description @@ -31,13 +31,6 @@ type tenzir.measure.bytes = record{ } ``` -### `real_time = bool (optional)` - -Whether to emit metrics immediately with every batch, rather than buffering -until the upstream operator stalls, i.e., is idle or waiting for further input. - -The is especially useful when `measure` should emit data without latency. - ### `cumulative = bool (optional)` Whether to emit running totals for the `events` and `bytes` fields rather than diff --git a/src/content/docs/reference/operators/nics.mdx b/src/content/docs/reference/operators/nics.mdx index 3be5b9377..ad889fa8d 100644 --- a/src/content/docs/reference/operators/nics.mdx +++ b/src/content/docs/reference/operators/nics.mdx @@ -53,4 +53,4 @@ where status.connected ## See Also -- load_nic +- from_nic diff --git a/src/content/docs/reference/operators/parallel.mdx b/src/content/docs/reference/operators/parallel.mdx index 4d79ba19b..cb1588734 100644 --- a/src/content/docs/reference/operators/parallel.mdx +++ b/src/content/docs/reference/operators/parallel.mdx @@ -7,38 +7,60 @@ example: 'parallel 4 { parsed = data.parse_json() }' Runs a subpipeline across multiple parallel workers. 
```tql -parallel jobs:int { … } +parallel [jobs:int] [, route_by=any] { … } ``` ## Description -The `parallel` operator distributes incoming events across multiple parallel -instances of a subpipeline. Each event is processed by exactly one worker. +The parallel operator distributes incoming events across multiple +parallel instances of a subpipeline. Each event is processed by exactly one +worker. Use this operator to parallelize CPU-intensive transformations or I/O-bound operations that would otherwise bottleneck on a single thread. +By default, events are distributed across workers using an adaptive round-robin +strategy that keeps worker loads balanced. Use `route_by` to instead route +events deterministically by a key, ensuring that all events with the same key +value go to the same worker. + This operator may reorder the event stream since workers process events concurrently. +When used as a source operator (without upstream input), `parallel` spawns +multiple independent instances of the subpipeline. This is useful for running +the same source pipeline with concurrent connections. + :::caution[Expert Operator] The `parallel` operator is a building block for performance optimization. Use it when you have identified a specific bottleneck that benefits from parallelization. Not all operations scale linearly with parallelism. ::: -### `jobs: int` +### `jobs: int (optional)` + +The number of parallel workers to spawn. Must be greater than zero. Defaults to +the number of available CPU cores. + +### `route_by = any (optional)` -The number of parallel workers to spawn. Must be greater than zero. +An expression evaluated per event to determine which worker processes it. Events +with the same `route_by` value are always sent to the same worker. This +guarantees that related events are grouped together, which is required for +stateful subpipelines like deduplicate or summarize. + +Cannot be used when `parallel` is used as a source operator. 
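+
+To illustrate the source-operator mode described above, here is a minimal
+sketch (the endpoint is hypothetical): each worker opens its own connection and
+runs an independent copy of the nested source pipeline.
+
+```tql
+parallel 4 {
+  from_tcp "example.org:4000" { read_json }
+}
+```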
### `{ … }` -The subpipeline to run in parallel. The subpipeline receives events as input and -may either: +The subpipeline to run in parallel. The subpipeline may either: - Produce events as output (transformation) - End with a sink (void output) +When `parallel` is used as a source operator, the subpipeline runs as an +independent source producing events or as a full pipeline ending with a sink. + The subpipeline must not produce bytes as output. ## Examples @@ -54,6 +76,18 @@ parallel 4 { } ``` +### Route events by source IP + +Ensure events from the same source IP are always handled by the same worker, +enabling per-source deduplication: + +```tql +subscribe "events" +parallel route_by=src_ip { + deduplicate src_ip, dst_ip, dst_port +} +``` + ### Make parallel HTTP requests Send events to Google SecOps with 4 concurrent connections: diff --git a/src/content/docs/reference/operators/read_bitz.mdx b/src/content/docs/reference/operators/read_bitz.mdx index bc03bdfaf..337cf69de 100644 --- a/src/content/docs/reference/operators/read_bitz.mdx +++ b/src/content/docs/reference/operators/read_bitz.mdx @@ -24,9 +24,12 @@ write-once-read-many use cases. Internally, BITZ uses Arrow's IPC format for serialization and deserialization, but prefixes each message with a 64 bit size prefix to support changing schemas between batches—something that Arrow's IPC format does not support on its own. +Use write_bitz to create `.bitz` files that you can later load again +with read_bitz. ## See Also - read_feather - read_parquet - write_bitz +- parsing/parse-binary-data diff --git a/src/content/docs/reference/operators/read_delimited_regex.mdx b/src/content/docs/reference/operators/read_delimited_regex.mdx index b29d0e68d..02d14e7d9 100644 --- a/src/content/docs/reference/operators/read_delimited_regex.mdx +++ b/src/content/docs/reference/operators/read_delimited_regex.mdx @@ -49,8 +49,9 @@ default, the separator is excluded from the results. 
### Split Syslog-like events without newline terminators from a TCP input ```tql -load_tcp "0.0.0.0:514" -read_delimited_regex "(?=<[0-9]+>)" +accept_tcp "0.0.0.0:514" { + read_delimited_regex "(?=<[0-9]+>)" +} this = data.parse_syslog() ``` diff --git a/src/content/docs/reference/operators/read_feather.mdx b/src/content/docs/reference/operators/read_feather.mdx index 4200b3c0c..b091b5a2b 100644 --- a/src/content/docs/reference/operators/read_feather.mdx +++ b/src/content/docs/reference/operators/read_feather.mdx @@ -25,7 +25,7 @@ Transforms the input [Feather] (a thin wrapper around ```tql load_file "logs.feather" read_feather -pulish "log" +publish "log" ``` ## See Also diff --git a/src/content/docs/reference/operators/read_gelf.mdx b/src/content/docs/reference/operators/read_gelf.mdx index 71a8159a5..3663c9181 100644 --- a/src/content/docs/reference/operators/read_gelf.mdx +++ b/src/content/docs/reference/operators/read_gelf.mdx @@ -25,8 +25,9 @@ import ParsingOptions from '@partials/operators/ParsingOptions.mdx'; ### Read a GELF stream from a TCP socket ```tql -load_tcp "0.0.0.0:54321" -read_gelf +accept_tcp "0.0.0.0:54321" { + read_gelf +} ``` ## See Also diff --git a/src/content/docs/reference/operators/read_lines.mdx b/src/content/docs/reference/operators/read_lines.mdx index 6c5553ee4..d3e175e92 100644 --- a/src/content/docs/reference/operators/read_lines.mdx +++ b/src/content/docs/reference/operators/read_lines.mdx @@ -60,16 +60,18 @@ is_error = line.starts_with("error:") Consider using read_delimited_regex for regex-based splitting: ```tql -load_tcp "0.0.0.0:514" -read_delimited_regex "(?=<[0-9]+>)" +accept_tcp "0.0.0.0:514" { + read_delimited_regex "(?=<[0-9]+>)" +} this = line.parse_syslog() ``` ::: ```tql -load_tcp "0.0.0.0:514" -read_lines split_at_regex="(?=<[0-9]+>)" +accept_tcp "0.0.0.0:514" { + read_lines split_at_regex="(?=<[0-9]+>)" +} this = line.parse_syslog() ``` diff --git a/src/content/docs/reference/operators/read_pcap.mdx 
b/src/content/docs/reference/operators/read_pcap.mdx index 5b072a4a3..0660f53c7 100644 --- a/src/content/docs/reference/operators/read_pcap.mdx +++ b/src/content/docs/reference/operators/read_pcap.mdx @@ -4,7 +4,7 @@ category: Parsing example: 'read_pcap' --- -Reads raw network packets in PCAP file format. +Parses PCAP byte streams into packet events. [pcap-rfc]: https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html @@ -14,8 +14,7 @@ read_pcap [emit_file_headers=bool] ## Description -The `read_pcap` operator converts raw bytes representing a [PCAP][pcap-rfc] file into -events. +The `read_pcap` operator converts raw bytes representing a [PCAP][pcap-rfc] file into events. [pcapng-rfc]: https://www.ietf.org/archive/id/draft-tuexen-opsawg-pcapng-05.html @@ -25,55 +24,64 @@ The current implementation does _not_ support [PCAPNG][pcapng-rfc]. ### `emit_file_headers = bool (optional)` -Emit a `pcap.file_header` event that represents the PCAP file header. If -present, the parser injects this additional event before the subsequent stream -of packets. +Emit a `pcap.file_header` event that represents the PCAP file header. If present, the parser injects this additional event before the subsequent stream of packets. -Emitting this extra event makes it possible to seed the -write_pcap operator with a file header from the input. This -allows for controlling the timestamp formatting (microseconds vs. nanosecond -granularity) and byte order in the packet headers. +Emitting this extra event makes it possible to seed write_pcap with a file header from the input. This allows you to preserve timestamp formatting (microseconds vs. nanoseconds) and byte order in packet headers. -When the PCAP parser processes a concatenated stream of PCAP files, specifying -`emit_file_headers` will also re-emit every intermediate file header as -separate event. 
+When the parser processes a concatenated stream of PCAP files, `emit_file_headers=true` also re-emits every intermediate file header as a separate event. -Use this option when you would like to reproduce the identical trace file layout -of the PCAP input. +Use this option when you want to reproduce the original trace layout. ## Schemas -The operator emits events with the following schema. +The operator emits events with the following schemas. + +### `pcap.file_header` + +Contains the global header for one PCAP trace. + +| Field | Type | Description | +| :-------------- | :------- | :------------------------------------------ | +| `magic_number` | `uint64` | The PCAP magic number. | +| `major_version` | `uint64` | The major PCAP format version. | +| `minor_version` | `uint64` | The minor PCAP format version. | +| `reserved1` | `uint64` | Reserved header field. | +| `reserved2` | `uint64` | Reserved header field. | +| `snaplen` | `uint64` | The maximum captured packet size. | +| `linktype` | `uint64` | The link-layer type for subsequent packets. | ### `pcap.packet` -Contains information about all accessed API endpoints, emitted once per second. +Contains one captured packet from the trace. | Field | Type | Description | | :----------------------- | :------- | :------------------------------------ | -| `timestamp` | `time` | The time of capturing the packet. | -| `linktype` | `uint64` | The linktype of the captured packet. | +| `timestamp` | `time` | The time when the packet was captured. | +| `linktype` | `uint64` | The link-layer type of the packet. | | `original_packet_length` | `uint64` | The length of the original packet. | | `captured_packet_length` | `uint64` | The length of the captured packet. | -| `data` | `blob` | The captured packet's data as a blob. | +| `data` | `blob` | The captured packet payload. 
| ## Examples ### Read packets from a PCAP file ```tql -load_file "/tmp/trace.pcap" -read_pcap +from_file "/tmp/trace.pcap" { + read_pcap +} ``` -### Read packets from the [network interface](/reference/operators/load_nic) `eth0` +### Capture packets from `en1` and preserve file headers ```tql -load_nic "eth0" -read_pcap +from_nic "en1" { + read_pcap emit_file_headers=true +} ``` ## See Also -- load_nic +- from_nic - write_pcap +- decapsulate diff --git a/src/content/docs/reference/operators/read_syslog.mdx b/src/content/docs/reference/operators/read_syslog.mdx index 5b6e7e906..234c96efe 100644 --- a/src/content/docs/reference/operators/read_syslog.mdx +++ b/src/content/docs/reference/operators/read_syslog.mdx @@ -176,7 +176,7 @@ When receiving syslog over TCP from systems that use RFC 6587 octet counting, the parser auto-detects and strips the length prefix: ```tql title="Pipeline" -load_tcp "0.0.0.0:514" { +accept_tcp "0.0.0.0:514" { read_syslog } ``` diff --git a/src/content/docs/reference/operators/read_tql.mdx b/src/content/docs/reference/operators/read_tql.mdx new file mode 100644 index 000000000..54dd91378 --- /dev/null +++ b/src/content/docs/reference/operators/read_tql.mdx @@ -0,0 +1,88 @@ +--- +title: read_tql +category: Parsing +example: 'read_tql' +--- + +import Op from '@components/see-also/Op.astro'; + +Parses an incoming byte stream of TQL-formatted records into events. + +```tql +read_tql [schema=string, selector=string, schema_only=bool, + merge=bool, raw=bool, unflatten_separator=string] +``` + +## Description + +Parses an incoming byte stream of TQL-formatted records into events. Each +top-level record expression in the input becomes one event. + +The input format matches the output of write_tql. This makes +`read_tql` useful for round-tripping data through TQL notation, +reading TQL-formatted files, or processing data piped from other Tenzir +pipelines. 
+ +The parser supports all TQL literal types, including `null`, `bool`, `int64`, +`double`, `string`, `duration`, `time`, `ip`, and `subnet`, as well as nested +records and lists. + +import ParsingOptions from '@partials/operators/ParsingOptions.mdx'; + + + +## Examples + +### Read TQL records from a file + +```tql title="events.tql" +{name: "Tenzir", version: 4} +{name: "Suricata", version: 7} +``` + +```tql title="Pipeline" +from_file "events.tql" { + read_tql +} +``` + +```tql title="Output" +{ + name: "Tenzir", + version: 4, +} +{ + name: "Suricata", + version: 7, +} +``` + +### Read records with native types + +TQL notation supports types that JSON cannot represent natively, such as +durations, timestamps, IP addresses, and subnets. + +```tql title="input.tql" +{dur: 5s, ts: 2024-01-01T00:00:00.000000, addr: 192.168.1.1, net: 10.0.0.0/8} +``` + +```tql title="Pipeline" +from_file "input.tql" { + read_tql +} +``` + +```tql title="Output" +{ + dur: 5s, + ts: 2024-01-01T00:00:00Z, + addr: 192.168.1.1, + net: 10.0.0.0/8, +} +``` + +## See Also + +- write_tql +- read_json +- read_ndjson diff --git a/src/content/docs/reference/operators/save_tcp.mdx b/src/content/docs/reference/operators/save_tcp.mdx index d9b4b7a54..1f85c1f7a 100644 --- a/src/content/docs/reference/operators/save_tcp.mdx +++ b/src/content/docs/reference/operators/save_tcp.mdx @@ -4,6 +4,14 @@ category: Outputs/Bytes example: 'save_tcp "0.0.0.0:8090", tls=true' --- +import Op from '@components/see-also/Op.astro'; +import Integration from '@components/see-also/Integration.astro'; + +:::caution[Deprecated] +Use to_tcp for client connections or serve_tcp for server +connections instead. +::: + Saves bytes to a TCP or TLS connection. ```tql @@ -33,7 +41,7 @@ of a numeric port. The amount of time to wait before attempting to reconnect in case a connection attempt fails and the error is deemed recoverable. Defaults to `30s`. 
-### `max_retry_count = int (optional) +### `max_retry_count = int (optional)` The number of retries to attempt in case of connection errors before transitioning into the error state. Defaults to `10`. @@ -62,5 +70,6 @@ save_tcp "127.0.0.1:4000", tls=true, skip_peer_verification=true ## See Also +- from_tcp - load_tcp - tcp diff --git a/src/content/docs/reference/operators/serve_http.mdx b/src/content/docs/reference/operators/serve_http.mdx new file mode 100644 index 000000000..b90010f9a --- /dev/null +++ b/src/content/docs/reference/operators/serve_http.mdx @@ -0,0 +1,103 @@ +--- +title: serve_http +category: Outputs/Events +example: 'serve_http "0.0.0.0:8080" { write_ndjson }' +--- + +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Starts an HTTP server and streams bytes produced by a nested pipeline to connected clients. + +```tql +serve_http endpoint:string, [max_connections=int, tls=record] { … } +``` + +## Description + +The `serve_http` operator starts an HTTP server and broadcasts the bytes from a +nested pipeline to all connected clients. Input events flow through the nested +pipeline, which must produce bytes such as NDJSON, JSON, or plain text. + +Clients connect with a `GET` request and receive a continuous HTTP response +body. Each client only receives bytes produced after it connects. The operator +does not buffer output for future clients. + +Use the nested pipeline to choose the wire format. For example, +write_ndjson emits `application/x-ndjson`, and write_lines +emits `text/plain`. If the nested pipeline does not set a content type, +`serve_http` falls back to `application/octet-stream`. + +Slow clients may be disconnected when they cannot keep up with the producer. +When the input pipeline finishes, the server closes all active responses and +stops accepting new connections. 
+ +### `endpoint: string` + +The endpoint to listen on. Use `host:port`, `[host]:port`, `http://host:port`, +or `https://host:port`. Use `0.0.0.0` to accept connections on all interfaces. + +### `max_connections = int (optional)` + +The maximum number of simultaneous client connections to accept. Additional +connections wait until a slot becomes available. + +Defaults to `128`. + +import TLSOptions from '@partials/operators/TLSOptions.mdx'; + + + +### `{ … }` + +The nested pipeline that serializes input events into bytes. It must produce +bytes as output, for example `{ write_ndjson }`, `{ write_json }`, or +`{ write_lines }`. + +## Examples + +### Stream NDJSON over HTTP + +```tql +export +serve_http "0.0.0.0:8080" { + write_ndjson +} +``` + +Connect with `curl`: + +```bash +curl http://localhost:8080/ +``` + +### Stream plain text lines + +```tql +export +serve_http "0.0.0.0:8080" { + write_lines +} +``` + +### Serve over HTTPS + +```tql +export +serve_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } { + write_ndjson +} +``` + +## See Also + +- accept_http +- serve_tcp +- to_http +- routing/expose-data-as-server +- http diff --git a/src/content/docs/reference/operators/serve_tcp.mdx b/src/content/docs/reference/operators/serve_tcp.mdx new file mode 100644 index 000000000..f87c751b9 --- /dev/null +++ b/src/content/docs/reference/operators/serve_tcp.mdx @@ -0,0 +1,75 @@ +--- +title: serve_tcp +category: Outputs/Events +example: 'serve_tcp "0.0.0.0:8090" { write_json }' +--- + +import Op from '@components/see-also/Op.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Listens for incoming TCP connections and sends events to all connected clients. + +```tql +serve_tcp endpoint:string, [max_connections=int, tls=record] { … } +``` + +## Description + +The `serve_tcp` operator starts a TCP server on the given endpoint and +broadcasts pipeline output to all connected clients. 
Input events are run
+through a nested pipeline that must produce bytes (e.g., `{ write_json }`).
+
+Clients that connect receive the serialized output as a continuous byte stream.
+Clients that disconnect or fail to keep up are dropped with a warning.
+
+### `endpoint: string`
+
+The endpoint to listen on. Must be of the form `[tcp://]host:port`. Use
+`0.0.0.0` as the host to accept connections on all interfaces.
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+
+
+### `max_connections = int (optional)`
+
+The maximum number of simultaneous client connections to accept. Additional
+connections beyond this limit are rejected.
+
+Defaults to `128`.
+
+### `{ … }`
+
+The pipeline to serialize input events into bytes. Must produce bytes as output,
+for instance `{ write_json }` or `{ write_csv }`.
+
+## Examples
+
+### Serve JSON to all connected TCP clients
+
+```tql
+export
+serve_tcp "0.0.0.0:8090" { write_json }
+```
+
+Connect with:
+
+```bash
+nc localhost 8090
+```
+
+### Serve with TLS
+
+```tql
+export
+serve_tcp "0.0.0.0:8443", tls={certfile: "cert.pem", keyfile: "key.pem"} {
+  write_json
+}
+```
+
+## See Also
+
+- accept_tcp
+- from_tcp
+- to_tcp
+- tcp
diff --git a/src/content/docs/reference/operators/to_http.mdx b/src/content/docs/reference/operators/to_http.mdx
new file mode 100644
index 000000000..5dfa58c41
--- /dev/null
+++ b/src/content/docs/reference/operators/to_http.mdx
@@ -0,0 +1,181 @@
+---
+title: to_http
+category: Outputs/Events
+example: 'to_http "https://example.com/webhook"'
+---
+
+import Op from '@components/see-also/Op.astro';
+import Guide from '@components/see-also/Guide.astro';
+import Integration from '@components/see-also/Integration.astro';
+
+Sends events as HTTP requests to a webhook or API endpoint.
+ +```tql +to_http url:string, [method=string, body=record|string|blob, encode=string, + headers=record, paginate=string, paginate_delay=duration, + parallel=int, tls=record, connection_timeout=duration, + max_retry_count=int, retry_delay=duration] +``` + +## Description + +The `to_http` operator sends each input event as an HTTP request to a webhook or +API endpoint. By default, it JSON-encodes the entire event as the request body +and sends it as a POST request. + +The operator is fire-and-forget: non-success HTTP status codes do not cause +pipeline errors. + +### `url: string` + +URL to send the request to. This is an expression evaluated per event, so you +can use field values to construct the URL dynamically. + +### `method = string (optional)` + +One of the following HTTP methods to use: + +- `get` +- `head` +- `post` +- `put` +- `del` +- `connect` +- `options` +- `trace` + +Defaults to `post`. + +### `body = blob|record|string (optional)` + +Body to send with the HTTP request. + +If the value is a `record`, then the body is encoded according to the `encode` +option and an appropriate `Content-Type` is set for the request. + +If not specified, the entire input event is JSON-encoded as the request body. + +### `encode = string (optional)` + +Specifies how to encode `record` bodies. Supported values: + +- `json` +- `form` + +Defaults to `json`. + +### `headers = record (optional)` + +Record of headers to send with the request. This is an expression evaluated per +event, so you can use field values. + +### `paginate = string (optional)` + +The string `"link"` to automatically follow pagination links in the HTTP `Link` +response header per +[RFC 8288](https://datatracker.ietf.org/doc/html/rfc8288). The operator parses +`Link` headers and follows the `rel=next` relation to fetch the next page. +Pagination stops when the response no longer contains a `rel=next` link or when +a non-success status code is received. 
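To illustrate the mechanics, here is a simplified sketch (Python, not Tenzir code; it assumes the common `rel="next"` form and ignores quoting edge cases) of extracting the next-page URL from a `Link` header:

```python
import re

def next_link(link_header):
    """Return the rel=next target from an HTTP Link header, or None.

    Splits the header on commas and looks for a <url>; rel="next"
    entry, per the common RFC 8288 usage.
    """
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="?next"?', part)
        if match:
            return match.group(1)
    return None

header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=9>; rel="last"'
)
# next_link(header) == "https://api.example.com/items?page=2"
```

When no `rel=next` entry remains, the function returns `None`, which corresponds to the point where pagination stops.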
+ +### `paginate_delay = duration (optional)` + +The duration to wait between consecutive pagination requests. + +Defaults to `0s`. + +### `parallel = int (optional)` + +Maximum number of requests that can be in progress at any time. + +Defaults to `1`. + +import TLSOptions from '@partials/operators/TLSOptions.mdx'; + + + +### `connection_timeout = duration (optional)` + +Timeout for the connection. + +Defaults to `5s`. + +### `max_retry_count = int (optional)` + +The maximum times to retry a failed request. Every request has its own retry +count. + +Defaults to `0`. + +### `retry_delay = duration (optional)` + +The duration to wait between each retry. + +Defaults to `1s`. + +## Examples + +### Send events to a webhook + +Send each event as a JSON POST request: + +```tql +from {message: "hello", severity: "info"} +to_http "https://example.com/webhook" +``` + +The entire event is JSON-encoded as the request body. + +### Use a custom body + +Override the default body with a string: + +```tql +from {foo: "bar"} +to_http "https://example.com/api", body="custom-payload" +``` + +### Send form-encoded data + +```tql +from {user: "alice"} +to_http "https://example.com/api", + body={name: "alice", role: "admin"}, + encode="form" +``` + +### Set a custom method and headers + +```tql +from {foo: "bar"} +to_http "https://example.com/api", + method="put", + headers={"X-Custom": "value"} +``` + +### Send events with TLS + +```tql +from {data: "sensitive"} +to_http "https://secure.example.com/api", + tls={skip_peer_verification: true} +``` + +### Send requests in parallel + +Increase throughput by sending multiple requests concurrently: + +```tql +load_file "events.json" +read_json +to_http "https://example.com/ingest", parallel=4 +``` + +## See Also + +- from_http +- http +- accept_http +- serve_http +- collecting/fetch-via-http-and-apis +- http diff --git a/src/content/docs/reference/operators/to_opensearch.mdx b/src/content/docs/reference/operators/to_opensearch.mdx index 
433e8588d..f4a4545e3 100644
--- a/src/content/docs/reference/operators/to_opensearch.mdx
+++ b/src/content/docs/reference/operators/to_opensearch.mdx
@@ -108,6 +108,7 @@ to_opensearch "localhost:9200", action="create", index="main"

## See Also

+- accept_opensearch
- from_opensearch
- opensearch
- elasticsearch
diff --git a/src/content/docs/reference/operators/to_tcp.mdx b/src/content/docs/reference/operators/to_tcp.mdx
new file mode 100644
index 000000000..8a54dccc5
--- /dev/null
+++ b/src/content/docs/reference/operators/to_tcp.mdx
@@ -0,0 +1,60 @@
+---
+title: to_tcp
+category: Outputs/Events
+example: 'to_tcp "collector.example.com:5044" { write_json }'
+---
+
+import Op from '@components/see-also/Op.astro';
+import Integration from '@components/see-also/Integration.astro';
+
+Connects to a remote TCP or TLS endpoint and sends events.
+
+```tql
+to_tcp endpoint:string, [tls=record] { … }
+```
+
+## Description
+
+Connects to the specified TCP endpoint as a client and writes serialized events
+to the connection. Input events are run through a nested pipeline that must
+produce bytes (e.g., `{ write_json }`).
+
+If the connection fails, the operator reconnects automatically with exponential
+backoff.
+
+### `endpoint: string`
+
+The remote endpoint to connect to. Must be of the form
+`[tcp://]host:port`.
+
+import TLSOptions from '@partials/operators/TLSOptions.mdx';
+
+
+
+### `{ … }`
+
+The pipeline to serialize input events into bytes. Must produce bytes as output,
+for instance `{ write_json }` or `{ write_csv }`.
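Since `to_tcp` acts as a TCP client, trying it out requires a listening peer. A minimal throwaway listener (a Python sketch for local testing; the port and payload are illustrative, not part of the operator) could look like:

```python
import socket
import threading

def run_listener(host="127.0.0.1", port=0):
    """Accept one connection and accumulate everything it sends.

    A local stand-in for the remote endpoint when experimenting with a
    TCP-sending pipeline. Port 0 lets the OS pick a free port.
    """
    server = socket.create_server((host, port))
    port = server.getsockname()[1]
    received = bytearray()

    def accept():
        conn, _ = server.accept()
        with conn:
            while chunk := conn.recv(4096):
                received.extend(chunk)

    thread = threading.Thread(target=accept)
    thread.start()
    return server, port, received, thread

# Demo: send one newline-delimited JSON event, as a pipeline would.
server, port, received, thread = run_listener()
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b'{"message": "hello"}\n')
thread.join()
server.close()
# bytes(received) == b'{"message": "hello"}\n'
```

Point the pipeline's endpoint at the listener's address to inspect exactly what bytes the nested pipeline emits.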
+ +## Examples + +### Send JSON to a remote server + +```tql +export +to_tcp "collector.example.com:5044" { write_json } +``` + +### Send with TLS + +```tql +export +to_tcp "collector.example.com:5044", tls={} { write_json } +``` + +## See Also + +- from_tcp +- serve_tcp +- accept_tcp +- tcp diff --git a/src/content/docs/reference/operators/write_bitz.mdx b/src/content/docs/reference/operators/write_bitz.mdx index 0e8825903..82b05d3d5 100644 --- a/src/content/docs/reference/operators/write_bitz.mdx +++ b/src/content/docs/reference/operators/write_bitz.mdx @@ -24,6 +24,8 @@ write-once-read-many use cases. Internally, BITZ uses Arrow's IPC format for serialization and deserialization, but prefixes each message with a 64 bit size prefix to support changing schemas between batches—something that Arrow's IPC format does not support on its own. +Use write_bitz when you want to persist events in a compact columnar +format and later load them again with read_bitz. ## See Also @@ -31,3 +33,4 @@ between batches—something that Arrow's IPC format does not support on its own. - to_hive - write_feather - write_parquet +- parsing/parse-binary-data diff --git a/src/content/docs/reference/operators/write_feather.mdx b/src/content/docs/reference/operators/write_feather.mdx index 27413e6bd..9a7a3683f 100644 --- a/src/content/docs/reference/operators/write_feather.mdx +++ b/src/content/docs/reference/operators/write_feather.mdx @@ -27,8 +27,9 @@ Defaults to the compression type's default compression level. ### `compression_type = str (optional)` -Supported options are `zstd` for [Zstandard][zstd-docs] compression -and `lz4` for [LZ4 Frame][lz4-docs] compression. +Supported options are `uncompressed` to disable compression, `zstd` for +[Zstandard][zstd-docs] compression, and `lz4` for [LZ4 Frame][lz4-docs] +compression. 
[zstd-docs]: http://facebook.github.io/zstd/ [lz4-docs]: https://android.googlesource.com/platform/external/lz4/+/HEAD/doc/lz4_Frame_format.md diff --git a/src/content/docs/reference/operators/write_pcap.mdx b/src/content/docs/reference/operators/write_pcap.mdx index 1988cbb45..a2bfddf93 100644 --- a/src/content/docs/reference/operators/write_pcap.mdx +++ b/src/content/docs/reference/operators/write_pcap.mdx @@ -4,7 +4,7 @@ category: Printing example: 'write_pcap' --- -Transforms event stream to PCAP byte stream. +Serializes packet events as a PCAP byte stream. ```tql write_pcap @@ -12,9 +12,14 @@ write_pcap ## Description -Transforms event stream to [PCAP][pcap-rfc] byte stream. +The `write_pcap` operator transforms packet events into a [PCAP][pcap-rfc] byte stream. [pcap-rfc]: https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html +[pcapng-rfc]: https://www.ietf.org/archive/id/draft-tuexen-opsawg-pcapng-05.html + +The operator accepts `pcap.packet` events. When present, it also uses `pcap.file_header` events emitted by read_pcap to preserve the original timestamp precision and byte order. + +If no `pcap.file_header` event is present, `write_pcap` generates a file header from the first packet's `linktype` and writes timestamps with nanosecond precision. The structured representation of packets has the `pcap.packet` schema: @@ -22,30 +27,37 @@ The structured representation of packets has the `pcap.packet` schema: pcap.packet: record: - linktype: uint64 - - time: - timestamp: time + - timestamp: time - captured_packet_length: uint64 - original_packet_length: uint64 - - data: string + - data: blob ``` :::note[PCAPNG] The current implementation does _not_ support [PCAPNG][pcapng-rfc]. 
::: -[pcapng-rfc]: https://www.ietf.org/archive/id/draft-tuexen-opsawg-pcapng-05.html - ## Examples -### Write packet events as a PCAP file +### Write a live capture to a PCAP file ```tql -subscribe "packets" +from_nic "en1" write_pcap save_file "/logs/packets.pcap" ``` +### Round-trip a PCAP file while preserving its file header + +```tql +from_file "/tmp/trace.pcap" { + read_pcap emit_file_headers=true +} +write_pcap +save_file "/tmp/trace-copy.pcap" +``` + ## See Also -- load_nic +- from_nic - read_pcap diff --git a/src/content/docs/reference/operators/write_tql.mdx b/src/content/docs/reference/operators/write_tql.mdx index b3e3aa9fc..5c256f2e9 100644 --- a/src/content/docs/reference/operators/write_tql.mdx +++ b/src/content/docs/reference/operators/write_tql.mdx @@ -100,5 +100,6 @@ write_tql strip_null_fields=true ## See Also +- read_tql - write_json - map-data-to-ocsf diff --git a/src/content/docs/reference/operators/yara.md b/src/content/docs/reference/operators/yara.md index 0eebe37d4..fbb5d011f 100644 --- a/src/content/docs/reference/operators/yara.md +++ b/src/content/docs/reference/operators/yara.md @@ -1,60 +1,50 @@ --- title: yara category: Detection -example: 'yara "/path/to/rules", blockwise=true' +example: 'yara "/path/to/rules"' --- Executes YARA rules on byte streams. ```tql -yara rule:list, [blockwise=bool, compiled_rules=bool, fast_scan=bool] +yara rule:string|list, [compiled_rules=bool, fast_scan=bool] ``` ## Description The `yara` operator applies [YARA](https://virustotal.github.io/yara/) rules to -an input of bytes, emitting rule context upon a match. +an input of bytes and emits rule context for each match. ![YARA Operator](yara-operator.svg) We modeled the operator after the official [`yara` command-line utility](https://yara.readthedocs.io/en/stable/commandline.html) to enable a -familiar experience for the command users. 
Similar to the official `yara` -command, the operator compiles the rules by default, unless you provide the -option `compiled_rules=true`. To quote from the above link: +familiar experience for command-line users. Similar to the official `yara` +command, the operator compiles the rules by default unless you provide the +`compiled_rules=true` option. To quote from the above link: > This is a security measure to prevent users from inadvertently using compiled > rules coming from a third-party. Using compiled rules from untrusted sources > can lead to the execution of malicious code in your computer. -The operator uses a YARA _scanner_ under the hood that buffers blocks of bytes -incrementally. Even though the input arrives in non-contiguous blocks of -memories, the YARA scanner engine support matching across block boundaries. For -continuously running pipelines, use the `blockwise=true` option that considers each -block as a separate unit. Otherwise the scanner engine would simply accumulate -blocks but never trigger a scan. +The operator scans the entire logical input as one contiguous byte sequence. +It buffers the full input in memory and runs the YARA scan when the input ends. +This lets matches span chunk boundaries, but it also means the operator is only +suitable for finite byte streams. -### `rule: list` +### `rule: string | list` -The path to the YARA rule(s). +The path to one YARA rule or a list of rule paths. -If the path is a directory, the operator attempts to recursively add all +If a path is a directory, the operator attempts to recursively add all contained files as YARA rules. -### `blockwise = bool (optional)` - -Whether to match on every byte chunk instead of triggering a scan when the input -exhausted. - -This option makes sense for never-ending dataflows where each chunk of bytes -constitutes a self-contained unit, such as a single file. - ### `compiled_rules = bool (optional)` Whether to interpret the rules as compiled. 
-When providing this flag, you must exactly provide one rule path as positional -argument. +When you provide this flag, you must provide exactly one rule path as the +positional argument. ### `fast_scan = bool (optional)` @@ -62,24 +52,28 @@ Enable fast matching mode. ## Examples -The examples below show how you can scan a single file and how you can create a -simple rule scanning service. +The example below shows how you can scan a file with YARA rules. -### Perform one-shot scanning of files +### Scan a file Scan a file with a set of YARA rules: ```tql -load_file "evil.exe", mmap=true -yara "rule.yara" +from_file "evil.exe", mmap=true { + yara "rule.yara" +} ``` -:::note[Memory Mapping Optimization] -The `mmap` flag is merely an optimization that constructs a single chunk of -bytes instead of a contiguous stream. Without `mmap=true`, -[`load_file`](/reference/operators/load_file) generates a stream of byte chunks and feeds them -incrementally to the `yara` operator. This also works, but performance is better -due to memory locality when using `mmap`. +:::note[Memory mapping optimization] +When reading from a local file, `from_file ..., mmap=true` uses `mmap(2)` so +`yara` can scan one contiguous chunk without an extra copy. Without +`mmap=true`, `from_file` may deliver multiple chunks; `yara` still works +because it buffers and joins the full input before scanning. +::: + +:::caution[Finite inputs only] +`yara` waits for the end of input before it emits any matches. Don't use it on +never-ending byte streams. ::: Let's unpack a concrete example: @@ -142,5 +136,5 @@ You will get one `yara.match` per matching rule: } ``` -Each match has a `rule` field describing the rule and a `matches` record +Each match has a `rule` field that describes the rule and a `matches` record indexed by string identifier to report a list of matches per rule string. 
diff --git a/src/partials/integrations/OpenSearch.mdx b/src/partials/integrations/OpenSearch.mdx index 7a9606b8b..6cf52286b 100644 --- a/src/partials/integrations/OpenSearch.mdx +++ b/src/partials/integrations/OpenSearch.mdx @@ -15,8 +15,8 @@ and `send_timeout` options. For more details, see the documentation for the [`to_opensearch`](/reference/operators/to_opensearch) operator. -Tenzir can also present an {props.Name}-compatible REST API via the -[`from_opensearch`](/reference/operators/from_opensearch) operator. +Tenzir can also present a {props.Name}-compatible REST API via the +[`accept_opensearch`](/reference/operators/accept_opensearch) operator. ## Examples @@ -59,16 +59,17 @@ Bulk API endpoint. This allows you to point your Logstash or Beats instances to Tenzir instead. export const emulate = ` -from "NAME://localhost:9200", keep_actions=true +accept_opensearch "0.0.0.0:9200", keep_actions=true publish "NAME" `; -This pipeline accepts data on port 9200 and publishes all received events on the +This pipeline accepts data on port 9200 and publishes all received events on +the {props.Name.toLowerCase()} topic for further processing by +other pipelines. -{props.Name.toLowerCase()} topic for further processing by other -pipelines. +Use `accept_opensearch` for new pipelines that receive bulk ingestion data. Setting `keep_actions=true` causes command events to remain in the stream, e.g., like this: diff --git a/src/partials/operators/FromFileCommonParams.mdx b/src/partials/operators/FromFileCommonParams.mdx index bb1024e03..17c868a00 100644 --- a/src/partials/operators/FromFileCommonParams.mdx +++ b/src/partials/operators/FromFileCommonParams.mdx @@ -42,3 +42,17 @@ current time. Files older than this duration will be skipped. Pipeline to use for parsing the file. By default, this pipeline is derived from the path of the file, and will not only handle parsing but also decompression if applicable. 
+ +Inside the subpipeline, the `$file` variable is available as a record with the +following fields: + +| Field | Type | Description | +| :------ | :------- | :--------------------------------------- | +| `path` | `string` | The absolute path of the file being read | +| `mtime` | `time` | The last modification time of the file | + +For example, to add the source path to each event: + +```tql +source = $file.path +``` diff --git a/src/partials/operators/HTTPClientOptions.mdx b/src/partials/operators/HTTPClientOptions.mdx index f098dda06..0fac98cef 100644 --- a/src/partials/operators/HTTPClientOptions.mdx +++ b/src/partials/operators/HTTPClientOptions.mdx @@ -31,7 +31,9 @@ Defaults to `json`. ### `headers = record (optional)` -Record of headers to send with the request. +Record of headers to send with the request. Each value is resolved as a +[secret](/explanations/secrets), so you can pass secret names to avoid +hardcoding tokens or API keys directly in the pipeline. ### `paginate = record -> string | string (optional)` diff --git a/src/sidebar.ts b/src/sidebar.ts index b431bab01..5bbaefa7e 100644 --- a/src/sidebar.ts +++ b/src/sidebar.ts @@ -148,6 +148,7 @@ export const guides = [ collapsed: true, items: [ "guides/routing/send-to-destinations", + "guides/routing/expose-data-as-server", "guides/routing/split-and-merge-streams", "guides/routing/load-balance-pipelines", ], @@ -388,6 +389,7 @@ export const integrations = [ "integrations/clickhouse", "integrations/elasticsearch", "integrations/graylog", + "integrations/mysql", "integrations/opensearch", "integrations/snowflake", "integrations/splunk",