The Object Storage destination writes structured pipeline output as files to S3, Google Cloud Storage, or Cloudflare R2. It supports JSON (NDJSON) and Parquet formats with optional compression and time-based partitioning.

Configuration

```yaml
destination:
  type: "object_storage"
  provider: "s3"
  bucket: "my-bucket"
  base_path: "pipeline-data"
  format: "json"
  compression: "gzip"
  partition_by:
    - "hour"
  batch_size: 10000
  batch_interval_ms: 60000
  region: "us-east-1"
  access_key_id: "${AWS_ACCESS_KEY_ID}"
  secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
```

Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | string | yes | | s3, gcs, or r2 |
| bucket | string | yes | | Bucket name |
| base_path | string | no | | Prefix path within bucket |
| format | string | no | json | json (NDJSON) or parquet |
| compression | string | no | none | none, gzip, snappy, or zstd |
| partition_by | list | no | ["hour"] | Time partitioning granularity: hour or day |
| batch_size | int | no | 10,000 | Records per file |
| batch_interval_ms | long | no | 60,000 | Max time (ms) between file writes |
| region | string | yes (S3/R2) | | AWS region |
| endpoint | string | yes (R2) | | S3-compatible endpoint URL |
| access_key_id | string | yes | | Access key |
| secret_access_key | string | yes | | Secret key |
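For Cloudflare R2, both region and endpoint are required. A hedged example, assuming the common R2 convention of region "auto" and an account-scoped endpoint (the account ID and credential variable names below are placeholders, not values from this product's documentation):

```yaml
destination:
  type: "object_storage"
  provider: "r2"
  bucket: "my-bucket"
  # Placeholder account ID; substitute your own R2 endpoint
  endpoint: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
  region: "auto"
  format: "parquet"
  compression: "zstd"
  access_key_id: "${R2_ACCESS_KEY_ID}"
  secret_access_key: "${R2_SECRET_ACCESS_KEY}"
```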

File Layout

Files are written using the following path structure:

```
s3://{bucket}/{base_path}/{project}/{stream_name}/year=YYYY/month=MM/day=DD/hour=HH/{file_id}.{ext}
```
The file ID uses the format {partition}-{minOffset}_{maxOffset}-{uuid}, which enables deduplication on retry. If a write is retried, the same file ID is produced, overwriting the previous attempt rather than creating a duplicate.
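The path and file-ID scheme can be sketched as follows. This is an illustrative reconstruction, not the actual implementation: the function name is invented, and it assumes the UUID component is generated once per batch and carried through retries, which is what makes a retried write overwrite the same key instead of creating a duplicate.

```python
from datetime import datetime, timezone

def object_key(bucket: str, base_path: str, project: str, stream_name: str,
               partition: int, min_offset: int, max_offset: int,
               batch_uuid: str, ts: datetime, ext: str) -> str:
    """Build the time-partitioned object key described above.

    `batch_uuid` must be generated once per batch and reused on retry,
    so the same key is produced and the previous attempt is overwritten
    (hypothetical sketch of the scheme, not the real code).
    """
    file_id = f"{partition}-{min_offset}_{max_offset}-{batch_uuid}"
    return (
        f"s3://{bucket}/{base_path}/{project}/{stream_name}/"
        f"year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}/{file_id}.{ext}"
    )

key = object_key("my-bucket", "pipeline-data", "proj", "events",
                 3, 1000, 10999, "9f1c2e",
                 datetime(2024, 5, 1, 14, tzinfo=timezone.utc), "json.gz")
# -> s3://my-bucket/pipeline-data/proj/events/year=2024/month=05/day=01/hour=14/3-1000_10999-9f1c2e.json.gz
```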

File Extensions

| Format | Compression | Extension |
|---|---|---|
| json | none | .json |
| json | gzip | .json.gz |
| json | snappy | .json.snappy |
| json | zstd | .json.zst |
| parquet | any | .parquet |
For Parquet files, compression (snappy or zstd) is applied natively by the Parquet writer and does not change the file extension. For JSON files, compression is applied as a stream wrapper around the output.
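The extension rules and the stream-wrapper behavior for JSON can be sketched in a few lines. The helper names are illustrative assumptions; the gzip example shows the general idea of wrapping NDJSON output in a compression stream, not this destination's internal writer.

```python
import gzip
import io
import json

# Extension table from above: Parquet compression is internal to the
# file format, so only JSON output gets a compression suffix.
_JSON_SUFFIX = {"none": "", "gzip": ".gz", "snappy": ".snappy", "zstd": ".zst"}

def file_extension(fmt: str, compression: str) -> str:
    """Map format + compression to the file extension (names are illustrative)."""
    if fmt == "parquet":
        return ".parquet"
    return ".json" + _JSON_SUFFIX[compression]

def encode_ndjson_gzip(records) -> bytes:
    """Serialize records as NDJSON, wrapping the output stream in gzip."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec) + "\n").encode("utf-8"))
    return buf.getvalue()
```

Decompressing the result yields one JSON object per line, which is what the NDJSON format requires.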