Schema
Harper uses GraphQL Schema Definition Language (SDL) to declaratively define table structure. Schema definitions are loaded from .graphql files in a component directory and control table creation, attribute types, indexing, and relationships.
Overview
Added in: v4.2.0
Schemas are defined using standard GraphQL type definitions with Harper-specific directives. A schema definition:
- Ensures required tables exist when a component is deployed
- Enforces attribute types and required constraints
- Controls which attributes are indexed
- Defines relationships between tables
- Configures computed properties, expiration, and audit behavior
Schemas are flexible by default — records may include additional properties beyond those declared in the schema. Use the @sealed directive to prevent this.
A minimal example:
type Dog @table {
id: Long @primaryKey
name: String
breed: String
age: Int
}
type Breed @table {
id: Long @primaryKey
name: String @indexed
}
Loading Schemas
In a component's config.yaml, specify the schema file with the graphqlSchema plugin:
graphqlSchema:
files: 'schema.graphql'
Keep in mind that both plugins and applications can specify schemas.
Type Directives
Type directives apply to the entire table type definition.
@table
Marks a GraphQL type as a Harper database table. The type name becomes the table name by default.
type MyTable @table {
id: Long @primaryKey
}
Optional arguments:
| Argument | Type | Default | Description |
|---|---|---|---|
| table | String | type name | Override the table name |
| database | String | "data" | Database to place the table in |
| expiration | Int | — | Seconds until a record goes stale (useful for caching tables) |
| eviction | Int | 0 | Additional seconds after expiration before a record is physically removed |
| scanInterval | Int | (expiration + eviction) / 4 | Seconds between eviction scans |
| replicate | Boolean | true | Enable replication of this table |
expiration, eviction, and scanInterval
These three arguments work together to control the full lifecycle of a cached record:
- expiration: When elapsed, a record is considered stale. The next request for a stale record triggers a fetch from the source. The record may still be served while revalidation is in progress.
- eviction: Additional time after expiration before the record is physically removed from the table. Setting eviction > 0 lets you serve the stale record while revalidation happens and controls how long after expiration the data is kept on disk.
- scanInterval: How often Harper scans the table for records to evict. Defaults to one quarter of expiration + eviction.
If you provide only expiration, all three behaviors derive from that one value: eviction defaults to 0 and scanInterval defaults to one quarter of expiration. To tune them independently:
# Expire after 5 minutes, evict 55 minutes after that (1 hour in total), scan every 10 minutes
type WeatherCache @table(expiration: 300, eviction: 3300, scanInterval: 600) {
id: ID @primaryKey
temperature: Float
}
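The lifecycle these arguments describe can be sketched as a small classifier. This is an illustrative model only, not Harper's implementation: the function name recordState and the state labels are hypothetical, and it assumes a record's age is measured in seconds.

```javascript
// Illustrative model only: classify a cached record by its age in seconds.
// `recordState` and the returned labels are hypothetical names, not Harper APIs.
function recordState(ageSeconds, { expiration, eviction = 0 }) {
  if (ageSeconds < expiration) return 'fresh'; // served as-is
  if (ageSeconds < expiration + eviction) return 'stale'; // still stored; a request triggers revalidation
  return 'evictable'; // eligible for removal by the next eviction scan
}

// Values from the WeatherCache example above.
const weatherCache = { expiration: 300, eviction: 3300 };
console.log(recordState(120, weatherCache)); // fresh
console.log(recordState(900, weatherCache)); // stale
console.log(recordState(3600, weatherCache)); // evictable
```

With eviction left at its default of 0, a record moves straight from fresh to evictable once expiration elapses.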
How scanInterval Determines the Eviction Cycle
scanInterval determines fixed clock-aligned times when eviction runs. Harper divides the clock into evenly spaced anchors based on the interval, calculated in the server's local timezone. As a result:
- The server's startup time does not affect when eviction runs.
- Eviction timings are deterministic and timezone-aware.
- For any given configuration, the eviction schedule is the same across restarts and across servers in the same local timezone.
Example: 1-hour expiration — default scanInterval = 15 minutes (one quarter of expiration). Eviction schedule:
00:00, 00:15, 00:30, 00:45, 01:00, ...
If the server starts at 12:05, the first eviction runs at 12:15 — not 12:20. The schedule is clock-aligned, not startup-aligned.
Example: 1-day expiration — default scanInterval = 6 hours. Eviction schedule:
00:00, 06:00, 12:00, 18:00, ...
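The two schedules above can be reproduced with a short calculation. This sketch assumes that anchors are counted from local midnight, which matches the examples but is not confirmed here as Harper's exact method. The helper names defaultScanInterval and nextScan are hypothetical.

```javascript
// Default scan interval in seconds: one quarter of (expiration + eviction).
function defaultScanInterval(expiration, eviction = 0) {
  return (expiration + eviction) / 4;
}

// Next clock-aligned scan strictly after `now`.
// Assumption: anchors are counted from local midnight; Harper's exact anchor
// calculation is not specified in this document.
function nextScan(now, intervalSeconds) {
  const midnight = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  const intervalMs = intervalSeconds * 1000;
  const elapsedMs = now.getTime() - midnight.getTime();
  const slots = Math.floor(elapsedMs / intervalMs) + 1;
  return new Date(midnight.getTime() + slots * intervalMs);
}

const startup = new Date(2024, 0, 15, 12, 5); // server starts at 12:05 local time
const hourly = nextScan(startup, defaultScanInterval(3600)); // 900-second interval
const daily = nextScan(startup, defaultScanInterval(86400)); // 21600-second interval
console.log(hourly.getHours(), hourly.getMinutes()); // 12 15
console.log(daily.getHours(), daily.getMinutes()); // 18 0
```

Because the result depends only on the wall clock and the interval, restarting the server at a different minute does not shift the schedule.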
Eviction with Indexing
Eviction removes non-indexed record data, but it does not remove a record from its secondary indexes. If an evicted record matches a search query, Harper fetches the full record from the source on demand to satisfy the query. This means indexes remain fully functional even when most of the data has been evicted.
Examples:
# Override table name
type Product @table(table: "products") {
id: Long @primaryKey
}
# Place in a specific database
type Order @table(database: "commerce") {
id: Long @primaryKey
}
# Auto-expire records after 1 hour (e.g., a session cache)
type Session @table(expiration: 3600) {
id: Long @primaryKey
userId: String
}
# Disable replication for this table explicitly
type LocalRecord @table(replicate: false) {
id: Long @primaryKey
value: String
}
# Combine multiple arguments
type Event @table(database: "analytics", expiration: 86400) {
id: Long @primaryKey
name: String @indexed
}
Database naming: Since all tables default to the data database, when designing plugins or applications, consider using unique database names to avoid table naming collisions.
Replication: Replication is enabled by default for all tables. Note that if you disable replication on a table and re-enable it later, the table will not catch up on writes made while replication was disabled.
@export
Exposes the table as an externally accessible resource endpoint, available via REST, MQTT, and other interfaces.
type MyTable @table @export(name: "my-table") {
id: Long @primaryKey
}
The optional name parameter specifies the URL path segment (e.g., /my-table/). Without name, the type name is used.
@sealed
Prevents records from including any properties beyond those explicitly declared in the type. By default, Harper allows records to have additional properties.
type StrictRecord @table @sealed {
id: Long @primaryKey
name: String
}
@hidden (Type Directive)
Suppresses the type from introspectable surfaces — MCP tool descriptors and the OpenAPI document. The table still exists; data is still queryable through Harper's other interfaces subject to RBAC. @hidden is a metadata-visibility directive, not an access-control mechanism: use attribute_permissions on roles to control data access.
type InternalConfig @table @hidden {
id: Long @primaryKey
value: String
}
@hidden is also available as a field directive to suppress individual attributes.
Documenting Types and Fields
Harper picks up GraphQL's standard triple-quoted docstrings on type and field definitions. Docstrings flow through to:
- MCP: Table.description (consumed as a prefix on every verb-tool description) and inputSchema.properties[*].description on derived tool schemas
- OpenAPI: components.schemas[*].description, per-property description, and the path-level description for every verb on the resource
"""
Product catalog row — what shows up in the storefront listing,
search, and inventory feeds. One row per SKU.
"""
type Product @table @export {
"""
Stock keeping unit — globally unique across catalogs.
"""
sku: String! @primaryKey
"""
Display name shown in the storefront.
"""
name: String!
"""
Retail price in cents (USD).
"""
priceCents: Int!
}
Docstrings on @hidden fields are dropped from the descriptive surfaces alongside the field itself.
Trust model. Docstrings reach LLMs and public OpenAPI consumers verbatim. Treat them as code: don't put secrets, internal-only commentary, or speculative prose in them. Use @hidden to suppress fields that shouldn't surface publicly.
Field Directives
Field directives apply to individual attributes in a type definition.
@primaryKey
Designates the attribute as the table's primary key. Primary keys must be unique; inserts with a duplicate primary key are rejected.
type Product @table {
id: Long @primaryKey
name: String
}
If no primary key is provided on insert, Harper auto-generates one:
- UUID string: when type is String or ID
- Auto-incrementing integer: when type is Int, Long, or Any
Auto-incrementing integer primary keys were added. Previously only UUID generation was supported for ID and String types.
Using Long or Any is recommended for auto-generated numeric keys. Int is limited to 32-bit and may be insufficient for large tables.
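To make the size difference concrete, the following sketch compares the upper bounds listed in the Field Types table. The helper keyFits is hypothetical and only illustrates the arithmetic; it does not model how Harper behaves when a key overflows.

```javascript
// Upper bounds taken from the Field Types table in this document.
const INT_MAX = 2 ** 31 - 1; // 2,147,483,647
const LONG_MAX = 2 ** 53; // 9,007,199,254,740,992

// Hypothetical helper: would a non-negative auto-incremented key fit the declared type?
function keyFits(key, type) {
  const max = type === 'Int' ? INT_MAX : LONG_MAX;
  return Number.isInteger(key) && key >= 0 && key <= max;
}

console.log(keyFits(3_000_000_000, 'Int')); // false
console.log(keyFits(3_000_000_000, 'Long')); // true
```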
@indexed
Creates a secondary index on the attribute for fast querying. Required for filtering by this attribute in REST queries, SQL, or NoSQL operations.
type Product @table {
id: Long @primaryKey
category: String @indexed
price: Float @indexed
}
If the field value is an array, each element in the array is individually indexed, enabling queries by any individual value.
Null values are indexed by default (added in v4.3.0), enabling queries like GET /Product/?category=null.
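A conceptual model may help show what per-element indexing means. The sketch below builds a simple value-to-keys map in plain JavaScript; it illustrates the behavior described above and is not Harper's storage format. The helper buildIndex and the sample data are hypothetical.

```javascript
// Conceptual model of a secondary index: value -> set of primary keys.
// This illustrates the described behavior, not Harper's storage format.
function buildIndex(records, attribute) {
  const index = new Map();
  for (const record of records) {
    const value = record[attribute];
    // An array contributes one index entry per element.
    const entries = Array.isArray(value) ? value : [value];
    for (const entry of entries) {
      if (!index.has(entry)) index.set(entry, new Set());
      index.get(entry).add(record.id);
    }
  }
  return index;
}

const products = [
  { id: 1, tags: ['outdoor', 'sale'] },
  { id: 2, tags: ['sale'] },
  { id: 3, tags: null },
];
const tagIndex = buildIndex(products, 'tags');
console.log([...tagIndex.get('sale')]); // [ 1, 2 ]
console.log([...tagIndex.get(null)]); // [ 3 ]
```

A lookup for any single element finds every record whose array contains it, and the null entry is what makes a null query answerable from the index.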
@embed
Added in: v5.1.0
Automatically computes an embedding vector for the attribute whenever the source field is written, using a configured embedding model:
type Document @table {
id: Long @primaryKey
text: String
embedding: [Float] @embed(source: "text", model: "default")
}
- source: the name of the field to embed. Must be a declared field on the same type, passed as a string literal.
- model: the logical name of a configured embedding model, passed as a string literal.
The attribute type must be [Float]. The attribute is automatically indexed with an HNSW vector index, so it is immediately searchable by similarity; an explicit @indexed on the same attribute is allowed only if it is also HNSW.
Write semantics:
- Creating a record with the source field, or updating the source field, computes the vector before the write commits (with inputType: 'document'). A failure to compute the embedding fails the write.
- An update that does not touch the source field leaves the vector unchanged.
- Setting the source field to null sets the vector to null.
- Replicated writes and audit-log replays do not re-embed; the vector travels with the record, and only the node that accepted the original write calls the model.
Multiple @embed attributes on one type are computed concurrently.
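The write semantics above reduce to a three-way decision per write. The sketch below restates them as a hypothetical helper, embedAction, which is not part of Harper's API. It assumes that update contains only the fields present in the incoming write.

```javascript
// Hypothetical decision helper mirroring the write semantics listed above.
// `update` holds only the fields present in the write.
function embedAction(update, sourceField) {
  if (!(sourceField in update)) return 'keep'; // source untouched: vector unchanged
  if (update[sourceField] === null) return 'clear'; // source set to null: vector becomes null
  return 'compute'; // source created or updated: embed before the write commits
}

console.log(embedAction({ text: 'hello world' }, 'text')); // compute
console.log(embedAction({ title: 'renamed' }, 'text')); // keep
console.log(embedAction({ text: null }, 'text')); // clear
```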
@createdTime
Automatically assigns a creation timestamp (Unix epoch milliseconds) to the attribute when a record is created.
type Event @table {
id: Long @primaryKey
createdAt: Long @createdTime
}
@updatedTime
Automatically assigns a timestamp (Unix epoch milliseconds) each time the record is updated.
type Event @table {
id: Long @primaryKey
updatedAt: Long @updatedTime
}
@hidden (Field Directive)
Suppresses the field from MCP tool descriptors and the OpenAPI document. The attribute still exists in the table; data is still queryable through other interfaces subject to RBAC. Use this for fields that should not appear in introspectable surfaces.
type Customer @table {
id: Long @primaryKey
name: String
"""
Internal — do not surface to external consumers.
"""
creditScore: Int @hidden
}
@hidden is a metadata-visibility directive, not access control: attribute_permissions on roles remains the data-access enforcement mechanism.
Relationships
Added in: v4.3.0
The @relationship directive defines how one table relates to another through a foreign key. Relationships enable join queries and allow related records to be selected as nested properties in query results.
@relationship(from: attribute) — many-to-one or many-to-many
The foreign key is in this table, referencing the primary key of the target table.
type RealityShow @table @export {
id: Long @primaryKey
networkId: Long @indexed # foreign key
network: Network @relationship(from: networkId) # many-to-one
title: String @indexed
}
type Network @table @export {
id: Long @primaryKey
name: String @indexed # e.g. "Bravo", "Peacock", "Netflix"
}
Query shows by network name:
GET /RealityShow?network.name=Bravo
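From client code, the same query string can be assembled with the standard URL API. The host and port below are assumptions for a local instance (9926 is the default REST port mentioned in this document).

```javascript
// Build the relationship query URL with the standard URL API.
// Host and port are assumptions for a local instance.
const url = new URL('/RealityShow', 'http://localhost:9926');
url.searchParams.set('network.name', 'Bravo');
console.log(url.toString()); // http://localhost:9926/RealityShow?network.name=Bravo
```

The resulting string can be passed to any HTTP client, such as fetch.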
If the foreign key is an array, this establishes a many-to-many relationship (e.g., a show with multiple streaming homes):
type RealityShow @table @export {
id: Long @primaryKey
networkIds: [Long] @indexed
networks: [Network] @relationship(from: networkIds)
}
@relationship(to: attribute) — one-to-many or many-to-many
The foreign key is in the target table, referencing the primary key of this table. The result type must be an array.
type Network @table @export {
id: Long @primaryKey
name: String @indexed # e.g. "Bravo", "Peacock", "Netflix"
shows: [RealityShow] @relationship(to: networkId) # one-to-many
# shows like "Real Housewives of Atlanta", "The Traitors", "Vanderpump Rules"
}
@relationship(from: attribute, to: attribute) — foreign key to foreign key
Both from and to can be specified together to define a relationship where neither side uses the primary key — a foreign key to foreign key join. This is useful for many-to-many relationships that join on non-primary-key attributes.
type OrderItem @table @export {
id: Long @primaryKey
orderId: Long @indexed
productSku: Long @indexed
product: Product @relationship(from: productSku, to: sku) # join on sku, not primary key
}
type Product @table @export {
id: Long @primaryKey
sku: Long @indexed
name: String
}
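The three forms differ only in which attribute is matched on each side. The sketch below models that matching over in-memory arrays. The helper resolve is hypothetical, it assumes the primary key attribute is named id, and it does not reflect how Harper executes joins through its indexes.

```javascript
// Conceptual resolver for @relationship over in-memory arrays.
// `from` names the attribute on this record, `to` the attribute on the target;
// each falls back to the primary key (assumed to be `id`) when omitted.
function resolve(record, targets, { from = 'id', to = 'id' }) {
  const key = record[from];
  const keys = Array.isArray(key) ? key : [key]; // an array foreign key matches many targets
  return targets.filter((target) => keys.includes(target[to]));
}

const networks = [{ id: 1, name: 'Bravo' }, { id: 2, name: 'Peacock' }];
const shows = [
  { id: 10, networkId: 1, title: 'Vanderpump Rules' },
  { id: 11, networkId: 2, title: 'The Traitors' },
  { id: 12, networkId: 1, title: 'Real Housewives of Atlanta' },
];
const productList = [{ id: 100, sku: 555, name: 'Mug' }];
const orderItem = { id: 1, orderId: 7, productSku: 555 };

// from only: many-to-one
console.log(resolve(shows[1], networks, { from: 'networkId' })[0].name); // Peacock
// to only: one-to-many
console.log(resolve(networks[0], shows, { to: 'networkId' }).map((s) => s.id)); // [ 10, 12 ]
// from and to: foreign key to foreign key
console.log(resolve(orderItem, productList, { from: 'productSku', to: 'sku' })[0].name); // Mug
```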
Schemas can also define self-referential relationships, enabling parent-child hierarchies within a single table.
Computed Properties
Added in: v4.4.0
The @computed directive marks a field as derived from other fields at query time. Computed properties are not stored in the database but are evaluated when the field is accessed.
type Product @table {
id: Long @primaryKey
price: Float
taxRate: Float
totalPrice: Float @computed(from: "price + (price * taxRate)")
}
The from argument is a JavaScript expression that can reference other record fields.
Computed properties can also be defined in JavaScript for complex logic:
type Product @table {
id: Long @primaryKey
totalPrice: Float @computed
}
tables.Product.setComputedAttribute('totalPrice', (record) => {
return record.price + record.price * record.taxRate;
});
Computed properties are not included in query results by default — use select to include them explicitly.
Computed Indexes
Computed properties can be indexed with @indexed, enabling custom indexing strategies such as composite indexes, full-text search, or vector indexing:
type Product @table {
id: Long @primaryKey
tags: String
tagsSeparated: [String] @computed(from: "tags.split(/\\s*,\\s*/)") @indexed
}
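To see what the from expression in this example yields, the same split can be run in plain JavaScript. Note that the pattern is written inside an SDL string, so each backslash is doubled there; the sample value is hypothetical.

```javascript
// The SDL string "tags.split(/\\s*,\\s*/)" contains the expression tags.split(/\s*,\s*/).
// It splits on commas and discards any whitespace around them.
const tags = 'outdoor, sale ,clearance';
const tagsSeparated = tags.split(/\s*,\s*/);
console.log(tagsSeparated); // [ 'outdoor', 'sale', 'clearance' ]
```

Because the computed value is an array, each tag becomes its own index entry.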
When using a JavaScript function for an indexed computed property, use the version argument to ensure re-indexing when the function changes:
type Product @table {
id: Long @primaryKey
totalPrice: Float @computed(version: 1) @indexed
}
Increment version whenever the computation function changes. Failing to do so can result in an inconsistent index.
Vector Indexing
Added in: v4.6.0
Use @indexed(type: "HNSW") to create a vector index using the Hierarchical Navigable Small World algorithm, designed for fast approximate nearest-neighbor search on high-dimensional vectors.
type Document @table {
id: Long @primaryKey
textEmbeddings: [Float] @indexed(type: "HNSW")
}
Embedding vectors can also be computed automatically at write time from a text field with the @embed directive, which creates the HNSW index implicitly.
Query by nearest neighbors using the sort parameter:
let results = Document.search({
sort: { attribute: 'textEmbeddings', target: searchVector },
limit: 5,
});
HNSW can be combined with filter conditions:
let results = Document.search({
conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }],
sort: { attribute: 'textEmbeddings', target: searchVector },
limit: 5,
});
Filtering by Distance Threshold
To return only records whose distance to a target vector is below a threshold, place target directly on the condition (alongside comparator and value). This returns matches within the threshold without using sort:
let results = Document.search({
conditions: {
attribute: 'textEmbeddings',
comparator: 'lt',
value: 0.1,
target: searchVector,
},
});
This form is useful when you want to bound result quality by a similarity cutoff rather than ranking by similarity.
Selecting the Distance
Use the special $distance field in select to include the computed distance from the target vector in returned records:
let results = Document.search({
select: ['name', '$distance'],
sort: { attribute: 'textEmbeddings', target: searchVector },
limit: 5,
});
$distance is available in both sort-based ranking and conditions-based threshold queries.
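As a mental model for the two query shapes, the following brute-force sketch computes the same kind of results by scanning every record. It uses Euclidean distance for simplicity, and the helper names are hypothetical. An HNSW index approximates this without a full scan, so real results may differ slightly.

```javascript
// Brute-force reference for ranking and threshold queries (Euclidean distance).
// Helper names are hypothetical; this is not how an HNSW index searches.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

function withDistance(records, attribute, target) {
  return records.map((r) => ({ ...r, $distance: euclidean(r[attribute], target) }));
}

// Ranking: comparable to `sort: { attribute, target }` combined with `limit`.
function nearest(records, attribute, target, limit) {
  return withDistance(records, attribute, target)
    .sort((a, b) => a.$distance - b.$distance)
    .slice(0, limit);
}

// Threshold: comparable to a condition with `comparator: 'lt'`, `value`, and `target`.
function within(records, attribute, target, value) {
  return withDistance(records, attribute, target).filter((r) => r.$distance < value);
}

const docs = [
  { name: 'a', textEmbeddings: [0, 0] },
  { name: 'b', textEmbeddings: [3, 4] },
  { name: 'c', textEmbeddings: [1, 0] },
];
console.log(nearest(docs, 'textEmbeddings', [0, 0], 2).map((r) => r.name)); // [ 'a', 'c' ]
console.log(within(docs, 'textEmbeddings', [0, 0], 0.5).map((r) => r.name)); // [ 'a' ]
```

Ranking always returns up to limit records regardless of how far away they are, while the threshold form returns a variable number of records that all meet the cutoff.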
Per-Query Search Options
The sort descriptor (and threshold condition) accepts options that tune an individual query:
let results = Document.search({
sort: { attribute: 'textEmbeddings', target: searchVector, distance: 'dotProduct', ef: 200 },
limit: 5,
});
- distance: overrides the index's distance function for this query. Accepted values are "cosine", "euclidean", or "dotProduct" (dotProduct added in v5.1.0).
- ef (added in v5.1.0): overrides the search exploration budget for this query. Higher values improve recall at the cost of latency.
Changed in: v5.1.0 — When a query passes no ef and the index does not explicitly configure efConstructionSearch (or efConstruction), the search budget auto-scales with the size of the index, so recall holds as the table grows instead of decaying with a fixed budget.
HNSW Parameters
| Parameter | Default | Description |
|---|---|---|
| distance | "cosine" | Distance function: "cosine" (negative cosine similarity), "euclidean", or "dotProduct" (added in v5.1.0) |
| efConstruction | 100 | Max nodes explored during index construction. Higher = better recall, lower = better performance |
| M | 16 | Preferred connections per graph layer. Higher = more space, better recall for high-dimensional data |
| optimizeRouting | 0.5 | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) |
| mL | computed from M | Normalization factor for level generation |
| efConstructionSearch | auto-scaled | Max nodes explored during search. When unset, auto-scales with index size (see above); setting it (or efConstruction, which seeds it) fixes the budget |
| quantization | — | "int8" stores vectors quantized to int8 (added in v5.1.0, see below) |
Example with custom parameters:
type Document @table {
id: Long @primaryKey
textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efConstructionSearch: 100)
}
Note: this parameter was previously documented as efSearchConstruction; the option name Harper reads is efConstructionSearch.
Changed in: v5.1.0 — Changing efConstructionSearch on an existing index no longer triggers a rebuild; it only affects searches. Structural parameters (distance, M, efConstruction, quantization) still rebuild the index when changed.
Vector Quantization
Added in: v5.1.0
quantization: "int8" stores the index's vectors quantized to 8-bit integers, substantially reducing index size and memory traffic:
type Document @table {
id: Long @primaryKey
textEmbeddings: [Float] @indexed(type: "HNSW", quantization: "int8")
}
Graph navigation runs on the quantized (approximate) distances. For nearest-neighbor sort queries, Harper re-ranks the results against the full-precision vectors stored on the records, restoring exact ordering and exact $distance values. Distance-threshold (lt/le) queries currently filter on the approximate distance.
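For intuition about why quantized distances are approximate, the sketch below applies a generic symmetric scalar quantization to a small vector. Harper's actual quantization scheme is not described in this document, so treat the functions quantize and dequantize as hypothetical illustrations rather than its method.

```javascript
// Generic symmetric scalar quantization to int8, shown for intuition only.
// Harper's actual quantization scheme is not described in this document.
function quantize(vector) {
  const maxAbs = Math.max(...vector.map((v) => Math.abs(v))) || 1; // avoid dividing by zero
  const values = Int8Array.from(vector, (v) => Math.round((v * 127) / maxAbs));
  return { values, scale: maxAbs / 127 };
}

function dequantize({ values, scale }) {
  return Array.from(values, (q) => q * scale);
}

const original = [1, 0.25, -0.75];
const quantized = quantize(original);
console.log(Array.from(quantized.values)); // [ 127, 32, -95 ]
console.log(dequantize(quantized)); // close to the original, with small rounding error
```

Each component is stored in one byte, and the rounding error this introduces is the reason a re-ranking pass against full-precision vectors is useful for restoring exact ordering.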
Field Types
Harper supports the following field types:
| Type | Description |
|---|---|
| String | Unicode text, UTF-8 encoded |
| Int | 32-bit signed integer (−2,147,483,648 to 2,147,483,647) |
| Long | 54-bit signed integer (−9,007,199,254,740,992 to 9,007,199,254,740,992) |
| Float | 64-bit double precision floating point |
| BigInt | Integer up to ~300 digits. Note: distinct JavaScript type; handle appropriately in custom code |
| Boolean | true or false |
| ID | String; indicates a non-human-readable identifier |
| Any | Any primitive, object, or array |
| Date | JavaScript Date object |
| Bytes | Binary data as Buffer or Uint8Array |
| Blob | Binary large object; designed for streaming content >20KB |
Added BigInt in v4.3.0
Added Blob in v4.5.0
Arrays of a type are expressed with [Type] syntax (e.g., [Float] for a vector).
Blob Type
Added in: v4.5.0
Blob fields are designed for large binary content. Harper's Blob type implements the Web API Blob interface, so all standard Blob methods (.text(), .arrayBuffer(), .stream(), .slice()) are available. Unlike Bytes, blobs are stored separately from the record, support streaming, and do not need to be held entirely in memory. Use Blob for content typically larger than 20KB (images, video, audio, large HTML, etc.).
See Blob usage details below.
Blob Usage
Declare a blob field:
type MyTable @table {
id: Any! @primaryKey
data: Blob
}
Create and store a blob using createBlob():
let blob = createBlob(largeBuffer);
await MyTable.put({ id: 'my-record', data: blob });
Retrieve blob data using standard Web API Blob methods:
let record = await MyTable.get('my-record');
let buffer = await record.data.bytes(); // Uint8Array
let text = await record.data.text(); // string
let stream = record.data.stream(); // ReadableStream
Blobs support asynchronous streaming, meaning a record can reference a blob before it is fully written to storage. Use saveBeforeCommit: true to wait for full write before committing:
let blob = createBlob(stream, { saveBeforeCommit: true });
await MyTable.put({ id: 'my-record', data: blob });
Any string or buffer assigned to a Blob field in a put, patch, or publish is automatically coerced to a Blob.
When returning a blob via REST, register an error handler to handle interrupted streams:
export class MyEndpoint extends MyTable {
static async get(target) {
const record = await super.get(target); // await so record.data is read from the resolved record
let blob = record.data;
blob.on('error', () => {
MyTable.invalidate(target);
});
return { status: 200, headers: {}, body: blob };
}
}
Dynamic Schema Behavior
When a table is created through the Operations API or Studio without a schema definition, it follows dynamic schema behavior:
- Attributes are reflexively created as data is ingested
- All top-level attributes are automatically indexed
- Records automatically get __createdtime__ and __updatedtime__ audit attributes
Dynamic schema tables are additive — new attributes are added as new data arrives. Existing records will have null for any newly added attributes.
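The additive behavior can be modeled in a few lines. This is a conceptual sketch, not Harper's implementation: the helpers ingest and read and the sample records are hypothetical.

```javascript
// Conceptual model: the attribute set only grows, and records written before
// an attribute existed read back as null for it. Not Harper's implementation.
function ingest(table, record) {
  for (const key of Object.keys(record)) table.attributes.add(key);
  table.records.push(record);
}

function read(table, index) {
  const record = table.records[index];
  return Object.fromEntries([...table.attributes].map((key) => [key, record[key] ?? null]));
}

const table = { attributes: new Set(), records: [] };
ingest(table, { id: 1, name: 'Rex' });
ingest(table, { id: 2, name: 'Fido', breed: 'Beagle' });
console.log(read(table, 0)); // { id: 1, name: 'Rex', breed: null }
```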
Use create_attribute and drop_attribute operations to manually manage attributes on dynamic schema tables. See the Operations API for details.
OpenAPI Specification
Tables exported with @export are described via an /openapi endpoint on the main HTTP server associated with the REST service (default port 9926).
GET http://localhost:9926/openapi
This provides an OpenAPI 3.x description of all exported resource endpoints. The endpoint is a starting guide and may not cover every edge case.
Renaming Tables
Harper does not support renaming tables. Changing a type name in a schema definition creates a new, empty table — the original table and its data are unaffected.
Related Documentation
- JavaScript API: tables, databases, transaction(), and createBlob() globals for working with schema-defined tables in code
- Data Loader: Seed tables with initial data alongside schema deployment
- REST Querying — Querying tables via HTTP using schema-defined attributes and relationships
- Resources — Extending table behavior with custom application logic
- Storage Algorithm — How Harper indexes and stores schema-defined data
- Configuration — Component configuration for schemas