.plan

Compression Training and Request Replay

My now page is too narrow for medium-form content, so here's a longer recap of today's entry:


Automated dictionary training may not be novel, but it's still a fun problem to solve well.

The play is a service that accepts samples associated with a multi-tenanted usecase. Samples arrive via gRPC or pubsub and are uploaded to blob storage; dictionaries are retrained periodically based on usecase-specific thresholds: sample count, cumulative bytes, time since last training. Compressors check for a usecase:tenant's active dictionary before encoding, and the dict ID lands in the zstd frame header for decompressors to load (when not pre-fetched).
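Rough shape of the encode path, as a sketch only: the codec is github.com/klauspost/compress/zstd, and `DictStore` is a made-up stand-in for whatever serves active dictionaries.

```go
// Sketch of the encode path. DictStore and Compress are illustrative, not the
// real service; the zstd calls are real klauspost/compress APIs.
package compressor

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// DictStore is a placeholder for whatever serves active trained dictionaries
// (raw bytes, as produced by `zstd --train`).
type DictStore interface {
	// ActiveDict returns the current dictionary for usecase:tenant, or nil if
	// none has been trained yet.
	ActiveDict(usecase, tenant string) []byte
}

func Compress(store DictStore, usecase, tenant string, payload []byte) ([]byte, error) {
	opts := []zstd.EOption{zstd.WithEncoderLevel(zstd.SpeedDefault)}

	// Check for an active dictionary before encoding. The dictionary's ID is
	// written into the zstd frame header, so a decoder holding the same
	// dictionary can pick it by ID.
	if dict := store.ActiveDict(usecase, tenant); dict != nil {
		opts = append(opts, zstd.WithEncoderDict(dict))
	}

	enc, err := zstd.NewWriter(nil, opts...)
	if err != nil {
		return nil, fmt.Errorf("creating encoder: %w", err)
	}
	defer enc.Close()

	return enc.EncodeAll(payload, nil), nil
}
```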

A usecase might send samples via RPC, using dynamic sampling to represent each tenant equally. Or it might run a cronjob that does query magic and uploads in batch. Some have a recency bias and benefit from training every 5 minutes; most will do fine with 1-4x an hour and can tolerate delays.
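On the RPC path, the per-tenant sampling decision could look roughly like this. dynsampler-go is real (see the footnote below); the goal rate and the wrapper around it are illustrative.

```go
// Rough sketch of per-tenant dynamic sampling on the sample-submission path,
// using github.com/honeycombio/dynsampler-go. Values are placeholders.
package sampling

import (
	"math/rand"

	dynsampler "github.com/honeycombio/dynsampler-go"
)

type Sampler struct {
	ds dynsampler.Sampler
}

func New() (*Sampler, error) {
	// AvgSampleRate adjusts per-key rates so high-traffic tenants are sampled
	// more aggressively while low-traffic tenants still get represented.
	ds := &dynsampler.AvgSampleRate{GoalSampleRate: 100}
	if err := ds.Start(); err != nil {
		return nil, err
	}
	return &Sampler{ds: ds}, nil
}

// ShouldSample reports whether this tenant's payload should be submitted as a
// training sample.
func (s *Sampler) ShouldSample(tenant string) bool {
	rate := s.ds.GetSampleRate(tenant) // keep roughly 1 in `rate`
	return rate <= 1 || rand.Intn(rate) == 0
}
```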


The latter two are my actual team's domain.

We own a graph modeling service: fetch global state based on an incoming request, construct a graph as we go, make business decisions on that graph, then flatten it back out and update global state.
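A hypothetical skeleton of that lifecycle; none of these names are the real service, they just mirror the shape described above.

```go
package graphsvc

import "context"

type Request struct{ /* incoming request parameters */ }
type State struct{ /* global state relevant to the request */ }
type Write struct{ /* a database write operation */ }
type Graph struct{ /* nodes, edges, decisions */ }

// Store is a stand-in for wherever global state lives.
type Store interface {
	FetchGlobalState(ctx context.Context, req Request) (State, error)
	ApplyWrites(ctx context.Context, writes []Write) error
}

func Handle(ctx context.Context, store Store, req Request) error {
	// Fetch global state for the incoming request.
	state, err := store.FetchGlobalState(ctx, req)
	if err != nil {
		return err
	}

	g := buildGraph(req, state) // construct the graph as we go
	decide(g)                   // business decisions happen on the graph
	writes := flatten(g)        // flatten it back out
	return store.ApplyWrites(ctx, writes)
}

// Placeholders for the interesting parts.
func buildGraph(Request, State) *Graph { return &Graph{} }
func decide(*Graph)                    {}
func flatten(*Graph) []Write           { return nil }
```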

Fast forward a month and try to answer why that decision was made — and how it impacted today's outcome. It's tough. And while we're far from #maxscale, keeping hi-res debug data isn't feasible at 1-2B events/day.

Our fix decides whether or not to record a request as it comes in, at a tiny sample rate¹. When enabled, the recorder takes note of the input parameters, any global state at the time it was fetched, the output objects, and the database write operations². These are flushed to object storage keyed by the request_id.
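The recorded blob might look something like this. Field names and the JSON encoding are illustrative only; the real recorder uses the alloc-conscious encoding mentioned in footnote 2.

```go
// Hypothetical shape of a recorded request, flushed to object storage under
// the request_id. BlobStore stands in for whatever object storage client is
// actually in use.
package recorder

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
)

type Recording struct {
	RequestID   string          `json:"request_id"`
	Input       json.RawMessage `json:"input"`        // input parameters
	GlobalState json.RawMessage `json:"global_state"` // state as fetched
	Output      json.RawMessage `json:"output"`       // output objects
	Writes      json.RawMessage `json:"writes"`       // database write ops
}

type BlobStore interface {
	Put(ctx context.Context, key string, body *bytes.Reader) error
}

func Flush(ctx context.Context, store BlobStore, rec Recording) error {
	body, err := json.Marshal(rec)
	if err != nil {
		return fmt.Errorf("marshal recording: %w", err)
	}
	key := fmt.Sprintf("recordings/%s.json", rec.RequestID)
	return store.Put(ctx, key, bytes.NewReader(body))
}
```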

This week I'm getting our tooling to fetch this blob, construct a graph based on the fetched global state, and compare it to one generated from replaying the request with that state. Theoretically we can improve our RC strategy by pulling down N requests and verifying we made the same decisions in shadow before releasing.
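In sketch form, the shadow check amounts to the loop below. Every name here (Engine, BuildFromState, Replay, Diff) is a placeholder for the tooling, not an existing API.

```go
// Pull down recorded requests, rebuild the graph from the recorded global
// state, replay the request through the current code, and diff the decisions.
package replay

import (
	"context"
	"fmt"
)

type Recording struct {
	RequestID   string
	Input       []byte
	GlobalState []byte
	Output      []byte
}

type Graph struct{ /* nodes, edges, decisions */ }

type Engine interface {
	// BuildFromState reconstructs the graph as it was recorded.
	BuildFromState(rec Recording) (*Graph, error)
	// Replay runs the request through current code against the recorded state.
	Replay(ctx context.Context, rec Recording) (*Graph, error)
	// Diff reports decision-level differences between two graphs.
	Diff(recorded, replayed *Graph) []string
}

// Verify replays N recordings in shadow and fails if any decision diverged.
func Verify(ctx context.Context, eng Engine, recs []Recording) error {
	for _, rec := range recs {
		recorded, err := eng.BuildFromState(rec)
		if err != nil {
			return fmt.Errorf("rebuild %s: %w", rec.RequestID, err)
		}
		replayed, err := eng.Replay(ctx, rec)
		if err != nil {
			return fmt.Errorf("replay %s: %w", rec.RequestID, err)
		}
		if diffs := eng.Diff(recorded, replayed); len(diffs) > 0 {
			return fmt.Errorf("request %s diverged: %v", rec.RequestID, diffs)
		}
	}
	return nil
}
```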


  1. As everything we do is multi-tenanted, dynamic sampling is best. [dynsampler-go](https://github.com/honeycombio/dynsampler-go) from Honeycomb is lovely.↩︎
  2. Because this system already abuses memory and is in Go, recording is optimized for minimal alloc + GC blindness.↩︎