The Object Storage destination writes structured pipeline output as files to S3, Google Cloud Storage, or Cloudflare R2. It supports JSON (NDJSON) and Parquet formats with optional compression and time-based partitioning.
Configuration
```yaml
destination:
  type: "object_storage"
  provider: "s3"
  bucket: "my-bucket"
  base_path: "pipeline-data"
  format: "json"
  compression: "gzip"
  partition_by:
    - "hour"
  batch_size: 10000
  batch_interval_ms: 60000
  region: "us-east-1"
  access_key_id: "${AWS_ACCESS_KEY_ID}"
  secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
```
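The two batch settings interact. Judging from the field descriptions, a file is written when `batch_size` records have accumulated or when `batch_interval_ms` has elapsed, whichever comes first. The Python sketch below illustrates that size-or-time rule under that reading. The `Batcher` class and its injectable `clock` are hypothetical names for illustration, not part of the destination's API.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Hypothetical helper: buffers records and releases a batch when either
    batch_size records are buffered or batch_interval_ms has passed since the
    last flush. Illustrates the rule only, not the real implementation."""

    def __init__(self, batch_size: int = 10_000, batch_interval_ms: int = 60_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.batch_interval_s = batch_interval_ms / 1000.0
        self.clock = clock
        self.buffer: List[dict] = []
        self.last_flush = clock()

    def add(self, record: dict) -> Optional[List[dict]]:
        """Buffer a record; return a completed batch if a threshold was hit."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            return self.flush()
        return self.poll()

    def poll(self) -> Optional[List[dict]]:
        """Flush a non-empty buffer once the interval has elapsed."""
        if self.buffer and self.clock() - self.last_flush >= self.batch_interval_s:
            return self.flush()
        return None

    def flush(self) -> List[dict]:
        """Hand off the buffered records and restart the interval timer."""
        batch, self.buffer = self.buffer, []
        self.last_flush = self.clock()
        return batch
```

Injecting the clock keeps the time-based path testable without sleeping. A real writer would also need to call `poll()` periodically so that a quiet stream still produces a file on schedule.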
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `provider` | string | yes | - | `s3`, `gcs`, or `r2` |
| `bucket` | string | yes | - | Bucket name |
| `base_path` | string | no | - | Prefix path within the bucket |
| `format` | string | no | `json` | `json` (NDJSON) or `parquet` |
| `compression` | string | no | `none` | `none`, `gzip`, `snappy`, or `zstd` |
| `partition_by` | list | no | `["hour"]` | Time partitioning granularity: `hour` or `day` |
| `batch_size` | int | no | 10,000 | Maximum number of records per file |
| `batch_interval_ms` | long | no | 60,000 | Maximum time, in milliseconds, between file writes |
| `region` | string | yes (S3/R2) | - | AWS region |
| `endpoint` | string | yes (R2) | - | S3-compatible endpoint URL |
| `access_key_id` | string | yes | - | Access key |
| `secret_access_key` | string | yes | - | Secret key |
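As a rough illustration of how the table's defaults and provider-specific requirements combine, here is a minimal Python sketch. The `resolve_config` function and the `DEFAULTS` and `ALLOWED` constants are hypothetical; the destination's actual validation logic and error messages may differ.

```python
from typing import Any, Dict

# Defaults copied from the Fields table.
DEFAULTS: Dict[str, Any] = {
    "format": "json",
    "compression": "none",
    "partition_by": ["hour"],
    "batch_size": 10_000,
    "batch_interval_ms": 60_000,
}

# Allowed values copied from the Fields table.
ALLOWED = {
    "provider": {"s3", "gcs", "r2"},
    "format": {"json", "parquet"},
    "compression": {"none", "gzip", "snappy", "zstd"},
}


def resolve_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: merge user settings over defaults and check requirements."""
    cfg = {**DEFAULTS, **raw}
    required = ["provider", "bucket", "access_key_id", "secret_access_key"]
    if cfg.get("provider") in ("s3", "r2"):
        required.append("region")      # required for S3 and R2
    if cfg.get("provider") == "r2":
        required.append("endpoint")    # required for R2 only
    missing = [name for name in required if not cfg.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    for name, allowed in ALLOWED.items():
        if cfg[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {cfg[name]!r}")
    for granularity in cfg["partition_by"]:
        if granularity not in ("hour", "day"):
            raise ValueError(f"unsupported partition granularity: {granularity!r}")
    return cfg
```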
File Layout
Files are written using the following path structure:
`s3://{bucket}/{base_path}/{project}/{stream_name}/year=YYYY/month=MM/day=DD/hour=HH/{file_id}.{ext}`
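A minimal Python sketch of building the object key (the part after `s3://{bucket}/`) from that path structure, assuming the partition values come from a batch timestamp in UTC. `object_key` is a hypothetical helper, and dropping the `hour=HH` segment for day-level partitioning is an assumption that the layout above does not spell out.

```python
from datetime import datetime, timezone


def object_key(base_path: str, project: str, stream_name: str,
               ts: datetime, file_id: str, ext: str,
               granularity: str = "hour") -> str:
    """Hypothetical: build the key for one file under the documented layout."""
    ts = ts.astimezone(timezone.utc)  # assumption: partitions are in UTC
    prefix = base_path.strip("/")
    parts = [prefix] if prefix else []  # base_path is optional
    parts += [project, stream_name,
              f"year={ts:%Y}", f"month={ts:%m}", f"day={ts:%d}"]
    if granularity == "hour":
        # Assumption: with day granularity the hour=HH segment is omitted.
        parts.append(f"hour={ts:%H}")
    parts.append(f"{file_id}{ext}")
    return "/".join(parts)
```

For S3 the full URI is then the key prefixed with `s3://{bucket}/`; URI schemes for the other providers are not covered by this sketch.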
The file ID has the format `{partition}-{minOffset}_{maxOffset}-{uuid}`, which makes retried writes idempotent: retrying the same batch produces the same file ID, so the upload overwrites the previous attempt instead of creating a duplicate file.
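A retried write can only reproduce the same file ID if every component, including `{uuid}`, is deterministic for a given batch. The Python sketch below assumes the UUID is a name-based UUIDv5 derived from the partition and offset range; that derivation, the `file_id` helper, and the namespace value are assumptions for illustration, not the documented scheme.

```python
import uuid

# Hypothetical fixed namespace; if it ever changed, retries would stop
# producing matching IDs.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:object-storage-file-id")


def file_id(partition: int, min_offset: int, max_offset: int) -> str:
    """Hypothetical: return '{partition}-{minOffset}_{maxOffset}-{uuid}'.

    UUIDv5 is a hash of (namespace, name), so the same batch always maps to
    the same UUID, unlike a random uuid4().
    """
    name = f"{partition}/{min_offset}/{max_offset}"
    return f"{partition}-{min_offset}_{max_offset}-{uuid.uuid5(_NAMESPACE, name)}"
```

With a random `uuid4()` instead, every retry would produce a new key and leave the earlier attempt behind as a duplicate.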
File Extensions
| Format | Compression | Extension |
|---|---|---|
| `json` | `none` | `.json` |
| `json` | `gzip` | `.json.gz` |
| `json` | `snappy` | `.json.snappy` |
| `json` | `zstd` | `.json.zst` |
| `parquet` | any | `.parquet` |
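The extension table maps directly to a small lookup. A Python sketch, with `file_extension` as a hypothetical helper name:

```python
def file_extension(fmt: str, compression: str = "none") -> str:
    """Hypothetical: map (format, compression) to a file extension."""
    if fmt == "parquet":
        # Parquet compresses internally, so the extension never changes.
        return ".parquet"
    if fmt != "json":
        raise ValueError(f"unknown format: {fmt!r}")
    suffixes = {"none": "", "gzip": ".gz", "snappy": ".snappy", "zstd": ".zst"}
    try:
        return ".json" + suffixes[compression]
    except KeyError:
        raise ValueError(f"unknown compression: {compression!r}") from None
```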
For Parquet files, compression (snappy or zstd) is applied natively by the Parquet writer and does not change the file extension. For JSON files, compression is applied as a stream wrapper around the output.
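To illustrate the stream-wrapper approach for JSON, the sketch below serializes records as NDJSON and, for `gzip`, writes them through Python's standard `gzip.GzipFile` wrapped around an in-memory buffer. Only `none` and `gzip` are shown because `snappy` and `zstd` need third-party codecs; `encode_ndjson` is a hypothetical helper, not the destination's writer.

```python
import gzip
import io
import json
from typing import Iterable


def encode_ndjson(records: Iterable[dict], compression: str = "none") -> bytes:
    """Hypothetical: one JSON object per line, optionally gzip-wrapped."""
    buf = io.BytesIO()
    if compression == "gzip":
        stream = gzip.GzipFile(fileobj=buf, mode="wb")  # compress on the fly
    elif compression == "none":
        stream = buf
    else:
        raise NotImplementedError(f"{compression!r} is not covered by this sketch")
    for record in records:
        line = json.dumps(record, separators=(",", ":")) + "\n"
        stream.write(line.encode("utf-8"))
    if stream is not buf:
        stream.close()  # writes the gzip trailer; leaves buf open
    return buf.getvalue()
```

Because the wrapper sits between the serializer and the output, the NDJSON encoding is identical with or without compression; only the bytes that reach storage differ.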