Jobs
Harper uses an asynchronous job system for long-running data operations. When a bulk operation is initiated — such as loading a large CSV file or exporting millions of records — Harper starts a background job and immediately returns a job ID. Use the job ID to check progress and status.
Job status values:
- IN_PROGRESS — the job is currently running
- COMPLETE — the job finished successfully
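The submit-then-poll flow can be sketched with a small helper. This is an illustrative sketch, not part of Harper: the `wait_for_job` name is made up, and `send` stands in for whatever function you use to post an operation body to the Operations API and decode the JSON response (the `get_job` operation, documented below, returns a one-element list of job records).

```python
import time

def wait_for_job(send, job_id, poll_interval=1.0, timeout=60.0):
    """Poll get_job until the job leaves IN_PROGRESS or the timeout expires.

    `send` is any callable that posts an operation body to the Operations
    API and returns the decoded JSON response.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # get_job returns a list containing one job record
        job = send({"operation": "get_job", "id": job_id})[0]
        if job["status"] != "IN_PROGRESS":
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```

For long-running loads, consider raising `poll_interval` so status checks don't add needless load.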
Bulk Operations
The following operations create jobs. All bulk operations are sent to the Operations API.
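Requests to the Operations API are plain JSON posted with HTTP Basic auth. A minimal sketch using only the Python standard library is shown below; the function names, the example URL, and the credentials are placeholders, not Harper APIs (Harper's Operations API typically listens on port 9925, but check your instance configuration).

```python
import base64
import json
import urllib.request

def build_operation_request(url, body, username, password):
    """Prepare an Operations API request: JSON body plus HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

def send_operation(req):
    """Execute a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```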
CSV Data Load
Ingests CSV data provided directly in the request body.
- operation (required) — csv_data_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- data (required) — CSV content as a string
{
  "operation": "csv_data_load",
  "database": "dev",
  "action": "insert",
  "table": "breed",
  "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n"
}
Response:
{
  "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69",
  "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69"
}
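Because data is a single CSV string, escaping matters when values contain commas or quotes. A small helper can assemble the request body from Python dicts using the standard-library csv module; the `csv_data_load_payload` name is illustrative, not a Harper API.

```python
import csv
import io

def csv_data_load_payload(table, rows, database="data", action="insert"):
    """Build a csv_data_load request body from a list of dicts.

    The CSV string is produced by csv.DictWriter, so values containing
    commas or quotes are escaped correctly.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return {
        "operation": "csv_data_load",
        "database": database,
        "table": table,
        "action": action,
        "data": buf.getvalue(),
    }
```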
CSV File Load
Ingests CSV data from a file on the server's local filesystem.
The CSV file must reside on the same machine running Harper.
- operation (required) — csv_file_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- file_path (required) — absolute path to the CSV file on the host
{
  "operation": "csv_file_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "file_path": "/home/user/imports/breeds.csv"
}
CSV URL Load
Ingests CSV data from a URL.
- operation (required) — csv_url_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- csv_url (required) — URL pointing to the CSV file
{
  "operation": "csv_url_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv"
}
Import from S3
Imports CSV or JSON files from an AWS S3 bucket.
- operation (required) — import_from_s3
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- s3 (required) — S3 connection details:
  - aws_access_key_id
  - aws_secret_access_key
  - bucket
  - key — filename including extension (.csv or .json)
  - region
{
  "operation": "import_from_s3",
  "action": "insert",
  "database": "dev",
  "table": "dog",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "dogs.json",
    "region": "us-east-1"
  }
}
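Since this operation carries AWS credentials in the request body, it's worth keeping them out of source code. One approach is to read them from the environment when building the body; this is a sketch, the `import_from_s3_payload` helper is hypothetical, and the environment variable names follow the common AWS convention rather than anything Harper mandates.

```python
import os

def import_from_s3_payload(table, bucket, key, region,
                           database="data", action="insert"):
    """Build an import_from_s3 body, reading AWS credentials from the
    environment so secrets stay out of source code."""
    return {
        "operation": "import_from_s3",
        "database": database,
        "table": table,
        "action": action,
        "s3": {
            "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "bucket": bucket,
            "key": key,
            "region": region,
        },
    }
```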
Export Local
Exports table data to a local file in JSON or CSV format.
- operation (required) — export_local
- format (required) — json or csv
- path (required) — local directory path where the export file will be written
- search_operation (required) — query to select records: search_by_hash, search_by_value, search_by_conditions, or sql
- filename (optional) — filename without extension; auto-generated from epoch timestamp if omitted

Changed in: v4.3.0 — search_by_conditions added as a supported search operation for exports
{
  "operation": "export_local",
  "format": "json",
  "path": "/data/exports/",
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.breed"
  }
}
Export to S3
Exports table data to an AWS S3 bucket in JSON or CSV format.
Changed in: v4.3.0 — search_by_conditions added as a supported search operation
- operation (required) — export_to_s3
- format (required) — json or csv
- s3 (required) — S3 connection details (same fields as Import from S3, plus key for the output object name)
- search_operation (required) — search_by_hash, search_by_value, search_by_conditions, or sql
{
  "operation": "export_to_s3",
  "format": "json",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "exports/dogs.json",
    "region": "us-east-1"
  },
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }
}
Delete Records Before
Deletes records older than a given timestamp from a table. Operates only on the local node — clustered replicas retain their data.
Restricted to super_user roles.
- operation (required) — delete_records_before
- schema (required) — database name
- table (required) — table name
- date (required) — records with __createdtime__ before this timestamp are deleted. Format: YYYY-MM-DDThh:mm:ss.sZ
{
  "operation": "delete_records_before",
  "date": "2024-01-01T00:00:00.000Z",
  "schema": "dev",
  "table": "breed"
}
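A common use is pruning records older than a retention window, which means computing the cutoff at call time in the expected format. A minimal sketch (the `cutoff_iso` name is made up for illustration) using only the standard library:

```python
from datetime import datetime, timedelta, timezone

def cutoff_iso(days_ago, now=None):
    """Return a UTC timestamp `days_ago` days in the past, formatted
    with millisecond precision and a trailing Z, e.g.
    2024-01-01T00:00:00.000Z."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_ago)
    # strftime has no millisecond code, so truncate microseconds by hand
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.") + f"{cutoff.microsecond // 1000:03d}Z"
```

The resulting string can be placed directly in the `date` field of a delete_records_before body.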
Managing Jobs
Get Job
Returns status, metrics, and messages for a specific job by ID.
- operation (required) — get_job
- id (required) — job ID
{
  "operation": "get_job",
  "id": "4a982782-929a-4507-8794-26dae1132def"
}
Response:
[
  {
    "__createdtime__": 1611615798782,
    "__updatedtime__": 1611615801207,
    "created_datetime": 1611615798774,
    "end_datetime": 1611615801206,
    "id": "4a982782-929a-4507-8794-26dae1132def",
    "job_body": null,
    "message": "successfully loaded 350 of 350 records",
    "start_datetime": 1611615798805,
    "status": "COMPLETE",
    "type": "csv_url_load",
    "user": "HDB_ADMIN",
    "start_datetime_converted": "2021-01-25T23:03:18.805Z",
    "end_datetime_converted": "2021-01-25T23:03:21.206Z"
  }
]
Search Jobs by Start Date
Returns all jobs started within a time window.
Restricted to super_user roles.
- operation (required) — search_jobs_by_start_date
- from_date (required) — start of the search window (ISO 8601 format)
- to_date (required) — end of the search window (ISO 8601 format)
{
  "operation": "search_jobs_by_start_date",
  "from_date": "2024-01-01T00:00:00.000+0000",
  "to_date": "2024-01-02T00:00:00.000+0000"
}
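A typical query is "all jobs from the last N hours", which requires producing both window bounds in a matching ISO 8601 shape. A sketch, assuming the millisecond-precision `+0000`-offset format shown in the example above (the `jobs_window` name is illustrative):

```python
from datetime import datetime, timedelta, timezone

def _iso_with_offset(dt):
    """Format an aware datetime with millisecond precision and a
    numeric UTC offset, e.g. 2024-01-01T00:00:00.000+0000."""
    ms = f"{dt.microsecond // 1000:03d}"
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + ms + dt.strftime("%z")

def jobs_window(hours_back, now=None):
    """Build a search_jobs_by_start_date body covering the last
    `hours_back` hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours_back)
    return {
        "operation": "search_jobs_by_start_date",
        "from_date": _iso_with_offset(start),
        "to_date": _iso_with_offset(now),
    }
```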
Related Documentation
- Data Loader — Component-based data loading as part of deployment
- Operations API — Sending operations to Harper
- Transaction Logging — Recording a history of changes made to tables