Version: v5

Storage Tuning

Harper's storage configuration section controls how database files are written, cached, and reclaimed on disk. Defaults are tuned for safety and balanced throughput; this page covers the knobs that matter for production deployments with specific workload profiles.

For a quick reference of every option, see Configuration Options — storage. For the underlying mechanics, see Storage Algorithm.

Durability vs. Throughput

`storage.writeAsync`

Type: boolean • Default: false

When set to true, Harper skips fsync on commit. Writes return as soon as data is queued to the page cache, which can substantially increase throughput on write-heavy workloads.

Enabling this option gives up durability guarantees. A power loss or OS crash between the application commit and the OS flushing pages to disk can lose recently committed transactions. The database itself remains structurally consistent; only the most recent writes are at risk.

Enable only when:

The data is reproducible from an upstream source (e.g., caches with sourcedFrom).
The workload is bulk ingest where some loss on crash is acceptable and the operation can be re-run.
Replication provides durability — peers acknowledge writes before they could be lost.

storage:
  writeAsync: true

`storage.maxTransactionQueueTime`

Type: duration string • Default: 45s

The maximum estimated time a write may wait in the commit queue before Harper rejects new writes with HTTP 503. Acts as backpressure when downstream disk I/O cannot keep up with incoming writes.

Lower this in latency-sensitive systems where it is better to shed load early than to let request queues grow. Raise it when occasional disk-write bursts are expected and the application can tolerate longer commit latency.

storage:
  maxTransactionQueueTime: 30s

Compression

`storage.compression`

Type: boolean | object • Default: true

LZ4 record compression is enabled by default. It typically reduces on-disk size by 2–4× for JSON-like records with modest CPU cost.

For object form:

Property	Type	Description
`dictionary`	`string`	Path to a Zstd-style compression dictionary. Training a dictionary on representative records improves the compression ratio for small records.
`threshold`	`number`	Records smaller than this many bytes are stored uncompressed. Useful when small records dominate and the overhead of compression headers outweighs gains.

storage:
  compression:
    threshold: 256
    dictionary: ~/hdb/keys/records.dict

Disable entirely for workloads dominated by small numeric records or pre-compressed payloads (e.g., images, video):

storage:
  compression: false

Blob Storage Paths

`storage.blobPaths`

Type: string | string[] • Default: <rootPath>/blobs

Blob attributes (declared with Blob in schema.graphql or written via createBlob) are stored outside the main database files. blobPaths accepts a single path or an array — Harper distributes blob writes across the listed paths.

Common configurations:

Separate fast disk for blobs: Place blobPaths on a higher-bandwidth volume (e.g., NVMe) while keeping the index on a smaller, lower-latency drive.
Multiple volumes: Provide an array to spread storage across drives. Harper picks the path with the most free space for each new blob, providing crude load-balancing without RAID.
Storage tiers: Mount large but slower object storage at one of the paths to absorb older blobs while keeping hot blobs on fast storage.

storage:
  blobPaths:
    - /mnt/nvme0/harper-blobs
    - /mnt/nvme1/harper-blobs
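
Because the option's type also allows a plain string, a single volume can presumably be configured without a list. The sketch below reuses one of the illustrative mount points from the array example; substitute a path that exists on your system.

```yaml
storage:
  # Single-path form: all new blobs are written under this one directory.
  blobPaths: /mnt/nvme0/harper-blobs
```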

Blobs are not relocated when blobPaths changes — only new blobs honor the updated configuration. Existing blob references continue to resolve at their original path.

Read & Write Behavior

`storage.prefetchWrites`

Type: boolean • Default: true

Before a write transaction commits, Harper loads the affected pages into memory if not already present. This avoids stalling the commit on a page fault.

Disable only when memory pressure makes the prefetch counterproductive — for example, when transactions touch records on cold pages that are not expected to be re-read soon.
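
As a minimal sketch, turning the prefetch off uses the same storage block as the other options on this page. Whether it helps depends on the workload, so treat the setting as something to measure rather than a recommendation.

```yaml
storage:
  # Skip loading affected pages into memory before commit (default: true).
  prefetchWrites: false
```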

`storage.noReadAhead`

Type: boolean • Default: false

Advises the OS via posix_fadvise not to read ahead beyond the requested pages. Useful for random-access workloads on rotational disks where speculative reads pollute the page cache. Leave at the default for sequential or scan-heavy workloads.
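
For a random-access workload of the kind described above, the option would be switched on as follows. This is a sketch only; the benefit depends on the disk type and access pattern.

```yaml
storage:
  # Ask the OS not to read ahead past the requested pages (default: false).
  noReadAhead: true
```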

`storage.pageSize`

Type: number • Default: OS page size (typically 4096 bytes)

Changes the database page size. Larger pages can reduce write amplification for large records but increase the minimum I/O unit. Only set this on a fresh database — existing files cannot be migrated to a different page size.
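
A hedged example for a fresh database follows. The value is assumed to be in bytes, matching the unit given for the default, and 8192 is an illustrative choice rather than a tested recommendation.

```yaml
storage:
  # Assumed unit: bytes. Set before any data is written; existing files
  # cannot be migrated to a different page size.
  pageSize: 8192
```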

`storage.caching`

Type: boolean • Default: true

Enables in-memory caching of decoded records. Disable to reduce heap usage when records are large and unlikely to be re-read in the same process.
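
A minimal sketch of disabling the record cache, for a process where heap usage matters more than repeat-read speed:

```yaml
storage:
  # Do not keep decoded records in memory (default: true).
  caching: false
```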

Storage Reclamation

storage.reclamation controls how Harper evicts data from caching tables (tables with sourcedFrom) when disk usage runs high. Reclamation does not affect non-caching tables — those rely on explicit deletion, TTL expiration, or compaction.

`storage.reclamation.threshold`

Type: number (ratio) • Default: 0.4

Minimum fraction of the volume that should remain free. When free space falls below this ratio, reclamation begins evicting expired and lightly-used entries from caching tables. A larger value reclaims earlier and more aggressively; a smaller value defers reclamation closer to the volume filling.

`storage.reclamation.interval`

Type: duration string • Default: 1h

How often Harper checks free space against the threshold. Lower intervals catch fast-filling volumes sooner at the cost of more periodic I/O.

`storage.reclamation.evictionFactor`

Type: number • Default: 100000

Tunes the heuristic used to evict entries early when reclamation priority is high. The heuristic considers each entry's remaining time-to-expiration, record size, and how long ago it was last refreshed. Lowering this evicts more aggressively when free space is critical; raising it preserves entries longer.

storage:
  reclamation:
    threshold: 0.3 # start reclaiming when free space drops below 30%
    interval: 30m
    evictionFactor: 50000

For a deeper discussion of how sourcedFrom interacts with reclamation, see Resource API — sourcedFrom.

Compaction on Start

`storage.compactOnStart`

Type: boolean • Default: false

Runs compaction on all non-system databases at startup. Useful when deployments include scheduled restarts and you want to reclaim fragmented space as part of the maintenance window.

`storage.compactOnStartKeepBackup`

Type: boolean • Default: false

Retains the pre-compaction backup files after compactOnStart runs. Recommended for the first few cycles in production while validating compaction behavior; the backups can be removed manually once confidence is established.
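
The two options can be combined for an initial validation period. The sketch below compacts at every start and keeps the backups; it assumes you will remove the backup files by hand once you no longer need them.

```yaml
storage:
  # Compact all non-system databases at startup (default: false).
  compactOnStart: true
  # Keep the pre-compaction backup files for inspection (default: false).
  compactOnStartKeepBackup: true
```

Once compaction has proven reliable, setting compactOnStartKeepBackup back to false avoids accumulating backup files across restarts.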

Workload Recipes

Write-heavy ingest, durability via replication:

storage:
  writeAsync: true
  prefetchWrites: true
  maxTransactionQueueTime: 60s

Read-heavy cache layer with large blobs:

storage:
  caching: true
  blobPaths:
    - /mnt/fast-ssd/harper-blobs
  reclamation:
    threshold: 0.2
    interval: 15m

Memory-constrained edge deployment:

storage:
  caching: false
  noReadAhead: true
  compression: true

Related

Configuration Options — full list of storage options
Storage Algorithm — how Harper stores records and indexes on disk
Compaction — reclaiming space inside existing database files
Resource API — sourcedFrom — caching tables that interact with reclamation
Database API — createBlob — creating blobs that live under blobPaths
