Part 1 provided us with the nerve ending: a clean, once-per-second signal derived from live on-chain prices. Part 2 turns that loop into something you can measure, tune, and defend. We’ll record exactly what the bot saw, replay it through the same code paths, add a few small features to steady the read, and test settings in a way that doesn’t introduce look-ahead bias. Concretely, we’ll build:
A capture layer that writes each tick (timestamp, prices, per-pool latencies, derived spreads) to disk in a robust line-by-line format.
A replay engine that feeds those lines back through the same state + signal functions used in live mode, at 1× speed for parity checks or faster for iteration.
A tiny, explainable feature stack (pre/post-fee spread, a short EMA to reduce flicker, a realized-volatility read for noise, and a conservative thin-depth penalty).
A walk-forward backtest that tunes on one slice of time and evaluates on the next, then rolls forward.
In Part 3, we’ll take the tested signal and policy from this part and wire them to live execution on Sonic: routing orders to a target DEX, building and submitting transactions with conservative slippage and timeouts, and handling reverts cleanly.

If you can’t replay exactly what the bot saw, you can’t tune it honestly. Capture gives you the ground truth—the inputs as they arrived—so any improvement you claim later is defensible.
What it is
A lossless log of each tick the bot observed, with UTC timestamp, price from venue A, price from venue B, per-pool latency at receipt, and a couple of derived fields (e.g., pre-fee spreads). We store it as NDJSON, where each line is a standalone JSON object followed by a newline, so appends are atomic, crashes don’t corrupt the file, and tools can stream large files without loading them into memory. The schema stays stable (identical keys in the same order on every line), and we add a _v field when it changes.
How it’s stored
Teams typically log ticks in one of three ways: NDJSON, CSV, or SQLite. NDJSON per day is the default for many trading systems because it’s simple, append-only, grep-able, and plays well with pipes. CSV is a standard and easy-to-open format, but it struggles with nested fields and schema changes. SQLite or Parquet add weight upfront, but pay off later for analytics and fast scans.
For this build, we start with NDJSON for a trivial write path and, if needed, batch-convert to Parquet/SQLite after the fact.
Characteristics
UTC only: log ISO-8601 timestamps; never rely on local time.
Atomic writes: one line per tick avoids half-written JSON.
Gap awareness: WebSocket hiccups occur; flag missing minutes so backtests can be labeled or skipped.
Backpressure: flush on an interval; don’t block the stream on disk I/O (see the batching sketch after this list).
Strict schema: identical keys on every line; introduce a _v version field when adding fields.
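The capture code in this guide issues one non-blocking append per tick, which is usually enough at 1 Hz. If disk I/O ever becomes the bottleneck, interval-based batching is a common pattern. A minimal sketch, with assumed names (queueTick and the file path are illustrative, not part of the repo):

// sketch: batched NDJSON writer that flushes once per second
import fs from 'fs';
const buffer: string[] = [];
export function queueTick(obj: unknown): void {
  buffer.push(JSON.stringify(obj) + '\n'); // one standalone JSON object per line
}
setInterval(() => {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length).join(''); // drain synchronously
  fs.appendFile('data/ticks.ndjson', batch, (err) => {
    if (err) console.error('flush error:', err); // log, never throw into the stream
  });
}, 1000);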
How we’ll integrate “Capture” in this guide
Append one NDJSON line per second: {ts, priceA, priceB, latA_ms, latB_ms, spreadAB_bps, spreadBA_bps}.
Rotate by day (ticks-YYYY-MM-DD.ndjson) and create the data/ dir on boot.
Add lightweight gap markers so the backtest can exclude or down-weight choppy ranges (example lines below).
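Two illustrative lines: first a normal tick, then a gap marker written after a stream stall. The tick values are made up, and the _gap/missed_ms field names are an assumption for this guide, not a fixed schema:

{"ts":"2025-11-06T14:03:07.000Z","priceA":0.734512,"priceB":0.736001,"latA_ms":142,"latB_ms":155,"spreadAB_bps":20.3,"spreadBA_bps":-20.2}
{"ts":"2025-11-06T14:04:00.000Z","_gap":true,"missed_ms":53000}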

You tune what you can reproduce. Replay feeds captured ticks back through the live logic, so parity with reality is provable, not assumed.
What it is
A deterministic player that drives the exact state and signal functions you use in production, with no test branches or alternate math. At 1× speed, the console heartbeat should match what you saw live (minor print jitter is normal). Once parity holds, you can run faster to iterate.
Modes you’ll use
1× parity: prove capture + decoding are faithful before touching thresholds.
N× accelerated: 5×–10× for quick sweeps during backtests.
Step mode: pause/advance one tick at a time to debug edge cases (see the pacing sketch after this list).
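All three modes reduce to one pacing knob. A minimal sketch, assuming a SPEED environment variable (1 = parity, N = accelerated, 0 = step on Enter); this helper is illustrative, not part of the repo:

// sketch: replay pacing for 1x / Nx / step modes
const SPEED = Number(process.env.SPEED ?? '1');
async function pace(prevTs: string | null, ts: string): Promise<void> {
  if (!prevTs) return; // first tick: nothing to wait for
  if (SPEED === 0) {
    // step mode: advance one tick per Enter keypress
    await new Promise<void>(r => process.stdin.once('data', () => r()));
    return;
  }
  const gapMs = new Date(ts).getTime() - new Date(prevTs).getTime();
  await new Promise(r => setTimeout(r, Math.max(0, gapMs / SPEED)));
}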
Characteristics
Determinism: same code path, same config; avoid if (replay) logic.
Time math: keep everything in UTC; don’t re-stamp with local time.
No silent changes: schema tweaks in capture cause mismatches; bump _v and handle explicitly.
Synthesize consistently: if downstream expects reserves, derive them from the price the same way every time (e.g., one base unit and price × decimals on the quote side).
Printing traps: rounding and format differences can look like logic drift; normalize formats before comparison (see the helper after this list).
I/O limits: Terminals can bottleneck and throttle prints during accelerated runs.
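For the printing traps above, a small normalizer makes live and replay logs diffable line by line. A sketch (illustrative, not in the repo):

// sketch: normalize a status line before comparing live vs replay output
export function normalizeStatusLine(line: string): string {
  return line
    .replace(/^\[[^\]]+\]\s*/, '') // drop the [ISO timestamp] prefix
    .replace(/-?\d+\.\d+/g, n => Number(n).toFixed(4)); // pin float precision
}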
How we’ll integrate “Replay” in this guide
Stream NDJSON lines in order, feed them into the state and signal, and print the same one-line status.
Start with 1× to verify parity, then accelerate for parameter sweeps.
Maintain a single configuration file (config.yaml) for both live and replay to prevent hidden drift.
Replay shouldn’t invent a “test” version of your logic. We’ll drive the existing state and signal modules with captured ticks. If a 1× replay prints the same heartbeat you saw live (ignoring timestamp jitter), you’ve earned apples-to-apples comparisons. From there, you can accelerate playback to iterate quickly.
We’re not switching to heavy ML. We’ll add just enough structure to make the signal calmer without hiding regime shifts, including:
Pre-fee & post-fee spreads (bps) in both directions remain the core.
A short EMA on the spread gives recent ticks a little more weight and cuts flicker.
Realized volatility (RV) of mid-price returns gives a quick read on noise; when RV is elevated, demand more edge. (Realized = backward-looking, based on actual moves—unlike implied.)
A thin-depth penalty adds a small haircut when reserves look shallow in v2-style pools—a conservative slippage proxy.
Each of these is visible and auditable. You can point at a tick and say why you would trade—or pass.
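For reference, the EMA and realized-volatility reads map to standard formulas, matching the implementation later in src/features.ts. Here s_t is the pre-fee spread in bps, n is ema_window, m_i is the mid-price, and N is capped at rv_window:

$$\mathrm{EMA}_t = (1-k)\,\mathrm{EMA}_{t-1} + k\,\lvert s_t \rvert, \qquad k = \frac{2}{n+1}$$

$$\mathrm{RV}_{\mathrm{bps}} = 10^4 \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}, \qquad r_i = \ln\frac{m_i}{m_{i-1}}$$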
Walk-forward validation is a time-series test where you optimize on one window (the training window) and evaluate the next (the testing window), then slide forward and repeat. You keep the time order intact, so nothing in the test window “leaks” into the training step. The result is a chain of out-of-sample results that reflect how a strategy would have performed under changing conditions.
Why it matters
Markets drift. Parameters that worked last week can fail this week. Walk-forward forces you to prove usefulness on unseen data and shows stability over time, not just one lucky backtest. It’s stricter than a single backtest and more realistic than k-fold CV (which breaks time order).
We’ll use walk-forward analysis to select parameters within an in-sample window, evaluate them on the next window, then roll forward and repeat the process. It’s a standard way to keep claims honest on time-ordered data and is widely described in trading literature. The output includes a trades CSV and a summary (hit rate, turnover, PnL).
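The backtest script below streams all captured days end to end; the rolling split itself is a small piece of logic. A sketch, assuming one NDJSON file per UTC day and the train_days/test_days values from config.yaml (walkForwardWindows is an illustrative name, not part of the repo):

// sketch: rolling walk-forward windows over sorted day files
export function walkForwardWindows(
  files: string[],   // sorted ticks-YYYY-MM-DD.ndjson paths
  trainDays: number, // cfg.backtest.train_days
  testDays: number   // cfg.backtest.test_days
): { train: string[]; test: string[] }[] {
  const windows: { train: string[]; test: string[] }[] = [];
  for (let i = 0; i + trainDays + testDays <= files.length; i += testDays) {
    windows.push({
      train: files.slice(i, i + trainDays),                       // tune parameters here
      test: files.slice(i + trainDays, i + trainDays + testDays)  // evaluate here
    });
  }
  return windows;
}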
We’ll continue to use Goldrush as the data plane for this guide. Streaming data stays on GraphQL over WebSockets for sub-second updates (pairs, OHLCV, wallet activity), while Foundational REST provides historical pulls and backfills. Same schema, same provider, one API key passed via GraphQL connection params—cleaner than juggling multiple services.
Reuse your Part-1 repo (TypeScript + ts-node).
Add one dependency:
npm install yaml

Extend scripts (append to package.json). These are the new scripts to add on top of the ones created in Part 1:
// package.json (merge/append)
{
"scripts": {
"dev": "ts-node src/index.ts",
"replay": "ts-node src/replay.ts",
"backtest": "ts-node src/backtest.ts"
}
}

Step 1a: config.yaml (repo root)
With config.yaml, you can rerun the same data with different settings without touching code, producing clean diffs, honest comparisons, and fewer “it worked on my machine” moments. Every module reads it through loadConfig(); you don’t run this file directly.

# config.yaml
data:
out_dir: "data"
tick_file_prefix: "ticks" # ticks-YYYY-MM-DD.ndjson
replay_glob: "data/ticks-*.ndjson"
fees:
poolA_bps: 5
poolB_bps: 5
gas_bps: 0
features:
ema_window: 5 # ticks
rv_window: 60 # ticks
depth_floor: 100000 # quote units; below this apply penalty_bps
penalty_bps: 10
policy:
spread_threshold_bps: 30
slippage_bps: 5
min_notional_quote: 100
cooldown_ms: 2000
backtest:
train_days: 3
test_days: 1

Step 1b: src/configFile.ts
This loader parses config.yaml, provides typed access to configuration values, and keeps application code clean by avoiding ad-hoc environment parsing. NOTE: this uses the yaml npm package installed earlier (npm install yaml).

// src/configFile.ts
import fs from 'fs';
import path from 'path';
import YAML from 'yaml';
export type AppConfig = {
data: { out_dir: string; tick_file_prefix: string; replay_glob: string };
fees: { poolA_bps: number; poolB_bps: number; gas_bps?: number };
features: { ema_window: number; rv_window: number; depth_floor: number; penalty_bps: number };
policy: { spread_threshold_bps: number; slippage_bps: number; min_notional_quote: number; cooldown_ms: number };
backtest: { train_days: number; test_days: number };
};
export function loadConfig(): AppConfig {
const p = path.resolve(process.cwd(), 'config.yaml');
const raw = fs.readFileSync(p, 'utf8');
const cfg = YAML.parse(raw) as AppConfig;
return cfg;
}

Step 2a: src/capture.ts
Capture appends one NDJSON line per tick and derives pre-fee spreads (the same price math as state.ts).

// src/capture.ts
import fs from 'fs';
import path from 'path';
import { loadConfig } from './configFile';
const cfg = loadConfig();
function dayFile(ts: Date) {
const y = ts.getUTCFullYear();
const m = String(ts.getUTCMonth() + 1).padStart(2, '0');
const d = String(ts.getUTCDate()).padStart(2, '0');
return path.join(cfg.data.out_dir, `${cfg.data.tick_file_prefix}-${y}-${m}-${d}.ndjson`);
}
export function ensureDataDir() {
if (!fs.existsSync(cfg.data.out_dir)) fs.mkdirSync(cfg.data.out_dir, { recursive: true });
}
export type Tick = {
ts: string;
priceA: number | null;
priceB: number | null;
latA_ms: number | null;
latB_ms: number | null;
spreadAB_bps?: number | null;
spreadBA_bps?: number | null;
};
export function appendTick(t: Tick) {
const f = dayFile(new Date(t.ts));
fs.appendFile(f, JSON.stringify(t) + '\n', (err) => {
if (err) console.error('capture append error:', err);
});
}
export function preFeeSpreads(priceA: number | null, priceB: number | null) {
if (priceA == null || priceB == null) return { ab: null, ba: null };
const ab = (priceB / priceA - 1) * 10_000;
const ba = (priceA / priceB - 1) * 10_000;
return { ab, ba };
}
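A quick sanity check of the spread math with illustrative prices (B trading 0.3% above A):

// sketch: expected preFeeSpreads output
import { preFeeSpreads } from './capture';
const { ab, ba } = preFeeSpreads(1.000, 1.003);
console.log(ab?.toFixed(1)); // "30.0" bps buying on A, selling on B
console.log(ba?.toFixed(1)); // "-29.9" bps in the opposite direction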
Step 2b: src/index.ts
Replace the src/index.ts file from Part 1 with the code below. It keeps the same behavior as Part 1 while adding the capture wiring.

// src/index.ts
import 'dotenv/config';
import { startStream } from './stream';
import { snapshot } from './state';
import { evalSignal } from './signal';
import { ensureDataDir, appendTick, preFeeSpreads } from './capture';
const TICK_MS = 1000;
ensureDataDir();
function printStatus() {
const snap = snapshot();
const sig = evalSignal();
const ts = new Date().toISOString();
const { ab, ba } = preFeeSpreads(snap.A.price, snap.B.price);
appendTick({
ts,
priceA: snap.A.price ?? null,
priceB: snap.B.price ?? null,
latA_ms: snap.A.latencyMs ?? null,
latB_ms: snap.B.latencyMs ?? null,
spreadAB_bps: ab,
spreadBA_bps: ba
});
const line =
`[${ts}] ` +
`A=${snap.A.price?.toFixed(6) ?? '…'} (lat ${snap.A.latencyMs ?? '…'}ms) | ` +
`B=${snap.B.price?.toFixed(6) ?? '…'} (lat ${snap.B.latencyMs ?? '…'}ms) | ` +
(sig.ok
? `SIGNAL: ${sig.side} | edge ${sig.spreadBps?.toFixed(1)} bps`
: `no edge${sig.note ? ` (${sig.note})` : ''}`);
console.log(line);
}
startStream(() => console.log('Stream connected. Waiting for live prices…'));
const timer = setInterval(printStatus, TICK_MS);
process.on('SIGINT', () => {
clearInterval(timer);
console.log('\nShutting down. Bye.');
process.exit(0);
});

Check: a data/ticks-YYYY-MM-DD.ndjson file appears and grows by one line per second once prices start flowing.

Step 3: src/features.ts
The feature stack described above (pre/post-fee spreads, short EMA, realized volatility, thin-depth penalty) lives here.

// src/features.ts
import { loadConfig } from './configFile';
export type FeatureState = { ema_spread_bps?: number; rv_vals?: number[]; last_mid?: number | null; };
export type Features = {
midA?: number | null; midB?: number | null;
spreadAB_bps?: number | null; spreadBA_bps?: number | null;
ema_bps?: number | null; rv_bps?: number | null; penalty_bps?: number;
};
const cfg = loadConfig();
function ema(prev: number | undefined, x: number, n: number) {
if (!Number.isFinite(x)) return prev;
if (prev === undefined) return x; // seed on first value; !prev would wrongly reset on 0
const k = 2 / (n + 1);
return prev * (1 - k) + x * k;
}
export function computeFeatures(
st: FeatureState,
priceA: number | null,
priceB: number | null,
reserveQuoteApprox?: number
): Features {
const spreadAB = (priceA && priceB) ? (priceB / priceA - 1) * 10_000 : null;
const spreadBA = (priceA && priceB) ? (priceA / priceB - 1) * 10_000 : null;
if (spreadAB != null) st.ema_spread_bps = ema(st.ema_spread_bps, Math.abs(spreadAB), cfg.features.ema_window) ?? undefined;
const mid = (priceA && priceB) ? (priceA + priceB) / 2 : st.last_mid ?? null;
if (!st.rv_vals) st.rv_vals = [];
if (mid && st.last_mid) {
const ret = Math.log(mid / st.last_mid);
st.rv_vals.push(ret);
if (st.rv_vals.length > cfg.features.rv_window) st.rv_vals.shift();
}
st.last_mid = mid ?? st.last_mid ?? null;
const rv = st.rv_vals.length >= 2
? Math.sqrt(st.rv_vals.reduce((s, r) => s + r * r, 0) / st.rv_vals.length) * 10_000
: null;
const penalty = (reserveQuoteApprox && reserveQuoteApprox < cfg.features.depth_floor)
? cfg.features.penalty_bps : 0;
return { midA: priceA, midB: priceB, spreadAB_bps: spreadAB, spreadBA_bps: spreadBA, ema_bps: st.ema_spread_bps ?? null, rv_bps: rv, penalty_bps: penalty };
}
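To see the state machine in action, feed two illustrative ticks through computeFeatures (the FeatureState object is mutated in place):

// sketch: two-tick walkthrough of computeFeatures
import { computeFeatures, FeatureState } from './features';
const st: FeatureState = {};
computeFeatures(st, 1.000, 1.003); // seeds the EMA (≈30 bps) and the mid
const f = computeFeatures(st, 1.001, 1.003);
console.log(f.spreadAB_bps?.toFixed(1)); // "20.0"
console.log(f.ema_bps?.toFixed(1));      // "26.7": EMA moves a third of the way to the new tick
console.log(f.rv_bps);                   // null until at least two mid returns have accumulated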
Step 4: src/policy.ts
decide() returns a boolean plus a side; paperFill() turns that decision into a consistent PnL calculation used by the backtest.

// src/policy.ts
import { loadConfig } from './configFile';
export type Decision = { take: boolean; side?: 'Buy A / Sell B' | 'Buy B / Sell A'; edge_bps?: number; reason?: string; };
export type PaperTrade = { ts: string; side: 'Buy A / Sell B' | 'Buy B / Sell A'; notional_quote: number; edge_bps: number; slippage_bps: number; fees_bps: number; pnl_quote: number; };
const cfg = loadConfig();
export function decide({
spreadAB_bps, spreadBA_bps, ema_bps, penalty_bps
}: { spreadAB_bps: number | null; spreadBA_bps: number | null; ema_bps: number | null; penalty_bps?: number | null }): Decision {
const fees = cfg.fees.poolA_bps + cfg.fees.poolB_bps + (cfg.fees.gas_bps ?? 0);
const threshold = cfg.policy.spread_threshold_bps + (penalty_bps || 0);
if (spreadAB_bps != null) {
const post = spreadAB_bps - fees;
if (post > threshold && (ema_bps == null || ema_bps > threshold * 0.5)) {
return { take: true, side: 'Buy A / Sell B', edge_bps: post };
}
}
if (spreadBA_bps != null) {
const post = spreadBA_bps - fees;
if (post > threshold && (ema_bps == null || ema_bps > threshold * 0.5)) {
return { take: true, side: 'Buy B / Sell A', edge_bps: post };
}
}
return { take: false, reason: 'No post-fee edge above threshold' };
}
export function paperFill(ts: string, side: NonNullable<Decision['side']>, edge_bps: number): PaperTrade {
const slip = cfg.policy.slippage_bps;
const fees = cfg.fees.poolA_bps + cfg.fees.poolB_bps + (cfg.fees.gas_bps ?? 0);
const notional = cfg.policy.min_notional_quote;
const pnl_bps = edge_bps - slip - fees;
const pnl_quote = (pnl_bps / 10_000) * notional;
return { ts, side, notional_quote: notional, edge_bps, slippage_bps: slip, fees_bps: fees, pnl_quote };
}
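A worked example, assuming the config.yaml defaults above (slippage 5 bps, pool fees 5 + 5 bps, gas 0, min notional 100): a 45 bps edge nets 45 - 5 - 10 = 30 bps, or 0.30 quote units.

// sketch: expected paperFill output with the default config
import { paperFill } from './policy';
const t = paperFill('2025-11-06T14:03:07.000Z', 'Buy A / Sell B', 45);
console.log(t.pnl_quote.toFixed(2)); // "0.30"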
Step 5: src/backtest.ts
The backtest streams every capture file matching data/ticks-*.ndjson through the feature and policy modules.

// src/backtest.ts
import fs from 'fs';
import path from 'path';
import readline from 'readline';
import { loadConfig } from './configFile';
import { computeFeatures, FeatureState } from './features';
import { decide, paperFill, PaperTrade } from './policy';
type Tick = { ts: string; priceA: number|null; priceB: number|null; latA_ms: number|null; latB_ms: number|null; spreadAB_bps?: number|null; spreadBA_bps?: number|null; };
const cfg = loadConfig();
function listFiles(globPattern: string): string[] {
if (globPattern.includes('*')) {
const dir = path.dirname(globPattern);
const prefix = path.basename(globPattern).split('*')[0];
return fs.readdirSync(dir).filter(f => f.startsWith(prefix)).map(f => path.join(dir, f));
}
return [globPattern];
}
async function* readNdjson(file: string): AsyncGenerator<Tick> {
const rl = readline.createInterface({ input: fs.createReadStream(file), crlfDelay: Infinity });
for await (const line of rl) {
if (!line.trim()) continue;
try { yield JSON.parse(line) as Tick; } catch { /* skip */ }
}
}
async function run() {
const files = listFiles(cfg.data.replay_glob).sort();
const outCsv = path.join(cfg.data.out_dir, 'trades.csv');
fs.writeFileSync(outCsv, 'ts,side,notional_quote,edge_bps,slippage_bps,fees_bps,pnl_quote\n');
const trades: PaperTrade[] = [];
let st: FeatureState = {};
for (const f of files) {
for await (const t of readNdjson(f)) {
const feat = computeFeatures(st, t.priceA, t.priceB);
const d = decide({
spreadAB_bps: feat.spreadAB_bps ?? null,
spreadBA_bps: feat.spreadBA_bps ?? null,
ema_bps: feat.ema_bps ?? null,
penalty_bps: feat.penalty_bps ?? 0
});
if (d.take && d.side && d.edge_bps != null) {
const pt = paperFill(t.ts, d.side, d.edge_bps);
trades.push(pt);
fs.appendFileSync(outCsv, `${pt.ts},${pt.side},${pt.notional_quote},${pt.edge_bps.toFixed(1)},${pt.slippage_bps},${pt.fees_bps},${pt.pnl_quote.toFixed(2)}\n`);
}
}
}
const total = trades.reduce((s, x) => s + x.pnl_quote, 0);
const wins = trades.filter(t => t.pnl_quote > 0).length;
const hit = trades.length ? (wins / trades.length) * 100 : 0;
console.log(`Backtest complete.
Trades: ${trades.length}
Hit rate: ${hit.toFixed(1)}%
Total PnL (quote): ${total.toFixed(2)}
Output: ${outCsv}`);
}
run().catch(e => { console.error(e); process.exit(1); });

Run:
npm run backtest

The result is a data/trades.csv you can inspect or chart later. (We’ll add images of the output in a follow-up.)
Step 6: src/replay.ts
Replay takes a single data/ticks-YYYY-MM-DD.ndjson file as input.

// src/replay.ts
import fs from 'fs';
import readline from 'readline';
import { updatePool, snapshot } from './state';
import { evalSignal } from './signal';
function maybeLog() {
const s = snapshot();
const sig = evalSignal();
const ts = new Date().toISOString();
const line = `[${ts}] A=${s.A.price?.toFixed(6) ?? '…'} (lat ${s.A.latencyMs ?? '…'}ms) | B=${s.B.price?.toFixed(6) ?? '…'} (lat ${s.B.latencyMs ?? '…'}ms) | ` +
(sig.ok ? `SIGNAL: ${sig.side} | edge ${sig.spreadBps?.toFixed(1)} bps` : `no edge${sig.note ? ` (${sig.note})` : ''}`);
console.log(line);
}
async function run(file = 'data/ticks-YYYY-MM-DD.ndjson') {
if (!fs.existsSync(file)) {
console.error('Provide a capture file: npm run replay -- data/ticks-2025-11-06.ndjson');
process.exit(1);
}
const rl = readline.createInterface({ input: fs.createReadStream(file), crlfDelay: Infinity });
for await (const line of rl) {
if (!line.trim()) continue;
try {
const t = JSON.parse(line);
// synthesize reserves from price (1 base token vs price quote)
if (t.priceA) updatePool('A', BigInt(1e18), BigInt(Math.round(t.priceA * 1e6)) * BigInt(1e12), 0);
if (t.priceB) updatePool('B', BigInt(1e18), BigInt(Math.round(t.priceB * 1e6)) * BigInt(1e12), 0);
maybeLog();
await new Promise(r => setTimeout(r, 1000));
} catch { /* skip bad lines */ }
}
}
run(process.argv[2]).catch(e => { console.error(e); process.exit(1); });

Run:
npm run replay -- data/ticks-YYYY-MM-DD.ndjson
i) Nothing is being captured — Symptom: No data/ticks-*.ndjson file.
Fix: Ensure ensureDataDir() is called on boot and appendTick() is wired into printStatus in src/index.ts. Verify printStatus runs every second.
ii) Tick file exists but stays empty — Symptom: 0 B file; console shows prices as ….
Fix: Your stream isn’t delivering prices. Check Goldrush connection + filters in stream.ts. Confirm pools are correct and the API key is passed.
iii) Goldrush auth error (401/403) — Symptom: WebSocket closes on connect.
Fix: Set your Goldrush API key in .env and pass it as connection params in stream.ts. Restart the process.
iv) Timezone drift — Symptom: Replay timestamps don’t match live; day boundaries are off.
Fix: Always log UTC ISO-8601 (Z). Don’t re-stamp with local time during replay.
v) Schema drift (JSON parse warnings) — Symptom: Replay/backtest prints “skip bad line” or stops early.
Fix: Keep identical keys on each line. If you add fields, increment a version number (_v) and handle it explicitly in readers.
vi) Decimals/price weirdness — Symptom: Edges appear 10 times too large or small.
Fix: Verify token decimals and the price calc in state.ts. The post-fee spread is expressed in basis points (bps), not percent (30 bps = 0.30%).
vii) Replay doesn’t match live output — Symptom: Different edges for the same window.
Fix: Use the same code path for live and replay; remove any if (replay) logic. Store UTC only. Confirm the capture period matches.
viii) trades.csv is empty — Symptom: Backtest runs but writes no trades.
Fix: Lower policy.spread_threshold_bps (e.g., 45 → 30), reduce slippage_bps, or pick a busier capture window.
ix) PnL looks unrealistically high — Symptom: Big positive totals on quiet days.
Fix: Add fees for both pools and a gas estimate (fees.poolA_bps/poolB_bps/gas_bps). Apply a conservative slippage_bps.
x) Latency spikes create false edges — Symptom: Edges appear in bursts during disconnects.
Fix: Log latA_ms/latB_ms. Exclude minutes with gaps or elevated latency, and be stricter on the threshold in those windows (see the filter sketch below).
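A minimal filter for that last case, assuming a latency cap you choose yourself (MAX_LAT_MS is illustrative, not part of config.yaml):

// sketch: drop ticks with missing or elevated latency before backtesting
type CapturedTick = { latA_ms: number | null; latB_ms: number | null };
const MAX_LAT_MS = 1500;
function tickIsClean(t: CapturedTick): boolean {
  return t.latA_ms != null && t.latB_ms != null
    && t.latA_ms < MAX_LAT_MS && t.latB_ms < MAX_LAT_MS;
}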
You now have a reproducible loop you can measure: it captures what the bot saw, replays those ticks through the same code paths, adds small features for stability, and validates choices with a walk-forward backtest. Settings live in config.yaml; outputs land in data/ (ticks-*.ndjson, trades.csv). It’s honest by design—no test-only math.
Here’s the checklist you can point to before tuning further. If any line isn’t green, revisit the matching step in the tutorial.
Logged NDJSON ticks (UTC, prices, latency, pre-fee spreads) at 1 Hz.
Built a 1×/N× replay that matches live output for parity checks.
Added explainable features (post-fee spread, short EMA, realized vol, thin-depth penalty).
Wrapped a cautious policy that emits consistent paper trades with after-cost PnL.
Ran a walk-forward backtest (train → test → roll) and produced trades.csv + a summary.
Centralized tunables in config.yaml for clean, comparable runs.
With evidence in hand, Part 3 moves from paper to fills. The goal is controlled execution on Sonic with guardrails from day one.
Wire live execution on Sonic: pick pools, build/submit trades with conservative slippage and timeouts.
Add safeguards: dry-run toggle, circuit breakers (for latency/price drift), and graceful revert handling.
Ship light telemetry: structured logs, per-trade PnL, basic metrics.
Keep the loop production-ready—only fire when data and venue line up.