163 changes: 82 additions & 81 deletions README.md
@@ -1,17 +1,25 @@
Redpanda Connect
================
# Redpanda Connect

[![Build Status][actions-badge]][actions-url]
[![Apache V2 API][godoc-badge]][godoc-url-apache]
[![Enterprise API][godoc-badge]][godoc-url-enterprise]

API for Apache V2 builds: [![godoc for redpanda-data/connect ASL][godoc-badge]][godoc-url-apache]
Redpanda Connect is a stream processor that moves data between a wide range of [sources][inputs] and [sinks][outputs], with support for [hydration, enrichment, transformation, and filtering][processors] along the way.

API for Enterprise builds: [![godoc for redpanda-data/connect RCL][godoc-badge]][godoc-url-enterprise]
That includes a rich set of change-data-capture (CDC) connectors — for Postgres, MySQL, MongoDB, Oracle, MSSQL, and more — so database changes can flow through your pipelines as first-class events.

Redpanda Connect is a high performance and resilient stream processor, able to connect various [sources][inputs] and [sinks][outputs] in a range of brokering patterns and perform [hydration, enrichments, transformations and filters][processors] on payloads.
It uses [Bloblang][bloblang-about] for mapping, runs as a single static binary or container image, and is easy to operate and monitor.

It comes with a [powerful mapping language][bloblang-about], is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary or docker image, making it cloud native as heck.
## Highlights

Redpanda Connect is declarative, with stream pipelines defined in as few as a single config file, allowing you to specify connectors and a list of processing stages:
- **Declarative pipelines** — a stream topology fits in a single YAML file.
- **At-least-once delivery by default** — in-process transactions, no disk state required.
- **A large connector catalog** — cloud services, message brokers, databases, HTTP, and more.
- **First-class CDC** — change-data-capture connectors for Postgres, MySQL, MongoDB, Oracle, and MSSQL.
- **Bloblang** — a mapping language designed for stream data.
- **Cloud-friendly** — stateless and horizontally scalable, with metrics and tracing built in.

## Example

```yaml
input:
@@ -33,162 +41,155 @@ output:
max_in_flight: 20
```

### !NEW! Check Out the Latest AI Goodies

[Claude Plugin for Redpanda Connect Configs](./.claude-plugin/README.md)

MCP Demo:

[![MCP Demo](https://img.youtube.com/vi/JhF8HMpVmus/0.jpg)](https://www.youtube.com/watch?v=JhF8HMpVmus)

Agentic AI Demo:

[![Agentic AI Demo](https://img.youtube.com/vi/oi8qgtTqQRU/0.jpg)](https://www.youtube.com/watch?v=oi8qgtTqQRU)
## Quickstart

### Delivery Guarantees
### Install

Delivery guarantees [can be a dodgy subject](https://youtu.be/QmpBOCvY8mY). Redpanda Connect processes and acknowledges messages using an in-process transaction model with no need for any disk persisted state, so when connecting to at-least-once sources and sinks it's able to guarantee at-least-once delivery even in the event of crashes, disk corruption, or other unexpected server faults.

This behaviour is the default and free of caveats, which also makes deploying and scaling Redpanda Connect much simpler.

## Supported Sources & Sinks

AWS (DynamoDB, Kinesis, S3, SQS, SNS), Azure (Blob storage, Queue storage, Table storage), GCP (Pub/Sub, Cloud storage, Big query), Kafka, NATS (JetStream, Streaming), NSQ, MQTT, AMQP 0.91 (RabbitMQ), AMQP 1, Redis (streams, list, pubsub, hashes), Cassandra, Elasticsearch, HDFS, HTTP (server and client, including websockets), MongoDB, SQL (MySQL, PostgreSQL, Clickhouse, MSSQL), and [you know what just click here to see them all, they don't fit in a README][about-categories].

## Documentation

If you want to dive fully into Redpanda Connect then don't waste your time in this dump, check out the [documentation site][general-docs].

For guidance on building your own custom plugins in Go check out [the public APIs](https://pkg.go.dev/github.com/redpanda-data/benthos/v4/public/service).

## Install

Install on Linux:
Linux:

```shell
curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
unzip rpk-linux-amd64.zip -d ~/.local/bin/
```

Or use Homebrew:
macOS (Homebrew):

```shell
brew install redpanda-data/tap/redpanda
```

Or pull the docker image:
Docker:

```shell
docker pull docker.redpanda.com/redpandadata/connect
```

For more information check out the [getting started guide][getting-started].
See the [getting started guide][getting-started] for more options.

## Run
### Run

```shell
rpk connect run ./config.yaml
```

Or, with docker:
With Docker:

```shell
# Using a config file
# From a config file
docker run --rm -v /path/to/your/config.yaml:/connect.yaml docker.redpanda.com/redpandadata/connect run

# Using a series of -s flags
# With inline overrides
docker run --rm -p 4195:4195 docker.redpanda.com/redpandadata/connect run \
-s "input.type=http_server" \
-s "output.type=kafka" \
-s "output.kafka.addresses=kafka-server:9092" \
-s "output.kafka.topic=redpanda_topic"
```
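
For comparison, the inline overrides above should correspond roughly to the config file sketched below. This assumes the usual layout in which the component name is the key under `input` and `output`, and that `addresses` is a list; the host and topic are the placeholder values from the command, not real endpoints.

```yaml
# Sketch only: a config file intended to mirror the -s overrides shown above.
input:
  http_server: {}            # defaults for every field

output:
  kafka:
    addresses:
      - kafka-server:9092    # placeholder broker address from the example
    topic: redpanda_topic    # placeholder topic from the example
```

Saved as `config.yaml`, it could then be started with `rpk connect run ./config.yaml`.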

## Monitoring
## Connectors

The catalog includes AWS (DynamoDB, Kinesis, S3, SQS, SNS), Azure (Blob, Queue, Table), GCP (Pub/Sub, Cloud Storage, BigQuery), Kafka, NATS (JetStream, Streaming), NSQ, MQTT, AMQP 0.9.1 (RabbitMQ), AMQP 1, Redis, Cassandra, Elasticsearch, HDFS, HTTP (server, client, websockets), MongoDB, and SQL (MySQL, PostgreSQL, ClickHouse, MSSQL), with many more listed in the [components documentation][about-categories].

## Delivery guarantees

Delivery guarantees [can be a tricky subject](https://youtu.be/QmpBOCvY8mY). Redpanda Connect processes and acknowledges messages using an in-process transaction model with no disk-persisted state, so when it's connecting at-least-once sources and sinks it can guarantee at-least-once delivery — even through crashes, disk corruption, or other server faults.

That's the default, with no caveats, which keeps deployment and scaling straightforward.

## Observability

### Health checks

### Health Checks
Two HTTP endpoints are exposed for orchestration probes:

Redpanda Connect serves two HTTP endpoints for health checks:
- `/ping` can be used as a liveness probe as it always returns a 200.
- `/ready` can be used as a readiness probe as it serves a 200 only when both the input and output are connected, otherwise a 503 is returned.
- `/ping` — liveness probe; always returns 200.
- `/ready` — readiness probe; returns 200 once both input and output are connected, otherwise 503.
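
As a sketch of how these endpoints might be wired into an orchestrator, the manifest below uses `/ping` for liveness and `/ready` for readiness. Kubernetes itself, the Pod name, the probe intervals, and the assumption that the HTTP server listens on port 4195 (the port published in the Docker examples) are illustrative choices rather than anything the project prescribes. Mounting the config file is omitted for brevity.

```yaml
# Illustrative Kubernetes Pod manifest; names and timings are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: connect-example            # hypothetical name
spec:
  containers:
    - name: connect
      image: docker.redpanda.com/redpandadata/connect
      args: ["run", "/config.yaml"]  # config volume mount not shown
      ports:
        - containerPort: 4195        # assumed HTTP port, as in the Docker examples
      livenessProbe:
        httpGet:
          path: /ping                # always returns 200 while the process is up
          port: 4195
        periodSeconds: 10            # illustrative interval
      readinessProbe:
        httpGet:
          path: /ready               # 200 only once input and output are connected
          port: 4195
        periodSeconds: 5             # illustrative interval
```

With this arrangement a pod whose input or output has not yet connected stays out of service rotation (503 from `/ready`) without being restarted, since `/ping` keeps succeeding.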

### Metrics

Redpanda Connect [exposes lots of metrics][metrics] either to Statsd, Prometheus, a JSON HTTP endpoint, [and more][metrics].
Redpanda Connect [exposes metrics][metrics] to StatsD, Prometheus, a JSON HTTP endpoint, and [other backends][metrics].

### Tracing

Redpanda Connect also [emits open telemetry tracing events][tracers], which can be used to visualise the processors within a pipeline.
OpenTelemetry traces are [emitted natively][tracers], so you can visualize what's happening inside a pipeline end-to-end.

## Configuration

Redpanda Connect provides lots of tools for making configuration discovery, debugging and organisation easy. You can [read about them here][config-doc].
Redpanda Connect ships with tooling for configuration discovery, debugging, and organization — see the [configuration guide][config-doc].

## Build
## Documentation

- [General documentation][general-docs]
- [Bloblang language guide][bloblang-about]
- [Public Go APIs](https://pkg.go.dev/github.com/redpanda-data/benthos/v4/public/service) for building custom plugins

## Build from source

Build with Go (any [currently supported version](https://go.dev/dl/)):
Requires a [currently supported Go version](https://go.dev/dl/):

```shell
git clone git@github.com:redpanda-data/connect
cd connect
task build:all
```

## Formatting and Linting

Redpanda Connect uses [golangci-lint][golangci-lint] for formatting and linting.

- `task fmt` to format the codebase,
- `task lint` to lint the codebase.
### Plugins with external dependencies

Configure your editor to use `gofumpt` as a formatter, see the instructions for different editors [here](https://github.com/mvdan/gofumpt#installation).

## Plugins

It's pretty easy to write your own custom plugins for Redpanda Connect in Go, for information check out [the API docs][godoc-url], and for inspiration there's an [example repo][plugin-repo] demonstrating a variety of plugin implementations.

## Extra Plugins

By default Redpanda Connect does not build with components that require linking to external libraries, such as the `zmq4` input and outputs. If you wish to build Redpanda Connect locally with these dependencies then set the build tag `x_benthos_extra`:
Components that link against external C libraries (for example `zmq4`) aren't included by default. To pull them in, set the `x_benthos_extra` build tag:

```shell
# With go
go install -tags "x_benthos_extra" github.com/redpanda-data/connect/v4/cmd/redpanda-connect@latest

# Using task
# With task
TAGS=x_benthos_extra task build:all
```

Note that this tag may change or be broken out into granular tags for individual components outside of major version releases. If you attempt a build and these dependencies are not present you'll see error messages such as `ld: library not found for -lzmq`.
This tag may change or be split into more granular tags in future releases. If the required system libraries aren't installed, the build will fail with an error like `ld: library not found for -lzmq`.

## Docker Builds
### Docker image

There's a multi-stage `Dockerfile` for creating a Redpanda Connect docker image which results in a minimal image from scratch. You can build it with:
A multi-stage `Dockerfile` builds a minimal scratch-based image:

```shell
task docker:all
```

Then use the image:

```shell
docker run --rm \
-v /path/to/your/benthos.yaml:/config.yaml \
-v /tmp/data:/data \
-p 4195:4195 \
docker.redpanda.com/redpandadata/connect run /config.yaml
-v /path/to/your/config.yaml:/config.yaml \
-v /tmp/data:/data \
-p 4195:4195 \
docker.redpanda.com/redpandadata/connect run /config.yaml
```

## Custom plugins

Writing your own plugins in Go is straightforward — check out the [API docs][godoc-url] and the [example plugin repository][plugin-repo] for reference implementations.

## Development

Redpanda Connect uses [golangci-lint][golangci-lint] for linting and `gofumpt` for formatting. You can configure your editor to use `gofumpt` automatically — instructions are [here](https://github.com/mvdan/gofumpt#installation).

```shell
task fmt # format the codebase
task lint # lint the codebase
task test # unit and template tests
```

## Contributing

Contributions are welcome! To prevent CI errors, please always make sure a pull request has been:
Contributions are welcome. Before opening a pull request, please make sure it has been:

- Unit tested with `task test`
- Linted with `task lint`
- Formatted with `task fmt`

Note: most integration tests need to spin up Docker containers, so they are skipped by `task test`. You can trigger
them individually via `go test -run "^Test.*Integration.*$" ./internal/impl/<connector directory>/...`.
Most integration tests spin up Docker containers, so they're skipped by `task test`. You can run them individually with:

```shell
go test -run "^Test.*Integration.*$" ./internal/impl/<connector directory>/...
```

[inputs]: https://docs.redpanda.com/redpanda-connect/components/inputs/about
[about-categories]: https://docs.redpanda.com/redpanda-connect/about#components