Jobs
Harper uses an asynchronous job system for long-running data operations. When a bulk operation is initiated — such as loading a large CSV file or exporting millions of records — Harper starts a background job and immediately returns a job ID. Use the job ID to check progress and status.
Job status values:
- IN_PROGRESS — the job is currently running
- COMPLETE — the job finished successfully
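The submit-then-poll flow can be sketched with a small helper. This is an illustrative sketch, not part of Harper: the `wait_for_job` name is made up, and `send` stands in for whatever function you use to post an operation body to the Operations API and decode the JSON response (the `get_job` operation, documented below, returns a one-element list of job records).

```python
import time

def wait_for_job(send, job_id, poll_interval=1.0, timeout=60.0):
    """Poll get_job until the job leaves IN_PROGRESS or the timeout expires.

    `send` is any callable that posts an operation body to the Operations
    API and returns the decoded JSON response.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # get_job returns a list containing one job record
        job = send({"operation": "get_job", "id": job_id})[0]
        if job["status"] != "IN_PROGRESS":
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```

For long-running loads, consider raising `poll_interval` so status checks don't add needless load.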
Bulk Operations
The following operations create jobs. All bulk operations are sent to the Operations API.
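Requests to the Operations API are plain JSON posted with HTTP Basic auth. A minimal sketch using only the Python standard library is shown below; the function names, the example URL, and the credentials are placeholders, not Harper APIs (Harper's Operations API typically listens on port 9925, but check your instance configuration).

```python
import base64
import json
import urllib.request

def build_operation_request(url, body, username, password):
    """Prepare an Operations API request: JSON body plus HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

def send_operation(req):
    """Execute a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```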
CSV Data Load
Ingests CSV data provided directly in the request body.
- operation (required) — csv_data_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- data (required) — CSV content as a string
{
  "operation": "csv_data_load",
  "database": "dev",
  "action": "insert",
  "table": "breed",
  "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n"
}
Response:
{
  "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69",
  "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69"
}
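Because data is a single CSV string, escaping matters when values contain commas or quotes. A small helper can assemble the request body from Python dicts using the standard-library csv module; the `csv_data_load_payload` name is illustrative, not a Harper API.

```python
import csv
import io

def csv_data_load_payload(table, rows, database="data", action="insert"):
    """Build a csv_data_load request body from a list of dicts.

    The CSV string is produced by csv.DictWriter, so values containing
    commas or quotes are escaped correctly.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return {
        "operation": "csv_data_load",
        "database": database,
        "table": table,
        "action": action,
        "data": buf.getvalue(),
    }
```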
CSV File Load
Ingests CSV data from a file on the server's local filesystem.
The CSV file must reside on the same machine running Harper.
- operation (required) — csv_file_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- file_path (required) — absolute path to the CSV file on the host
{
  "operation": "csv_file_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "file_path": "/home/user/imports/breeds.csv"
}
CSV URL Load
Ingests CSV data from a URL.
- operation (required) — csv_url_load
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- csv_url (required) — URL pointing to the CSV file
{
  "operation": "csv_url_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv"
}
Import from S3
Imports CSV or JSON files from an AWS S3 bucket.
- operation (required) — import_from_s3
- database (optional) — target database; defaults to data
- table (required) — target table
- action (optional) — insert, update, or upsert; defaults to insert
- s3 (required) — S3 connection details:
  - aws_access_key_id
  - aws_secret_access_key
  - bucket
  - key — filename including extension (.csv or .json)
  - region
{
  "operation": "import_from_s3",
  "action": "insert",
  "database": "dev",
  "table": "dog",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "dogs.json",
    "region": "us-east-1"
  }
}
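Since this operation carries AWS credentials in the request body, it's worth keeping them out of source code. One approach is to read them from the environment when building the body; this is a sketch, the `import_from_s3_payload` helper is hypothetical, and the environment variable names follow the common AWS convention rather than anything Harper mandates.

```python
import os

def import_from_s3_payload(table, bucket, key, region,
                           database="data", action="insert"):
    """Build an import_from_s3 body, reading AWS credentials from the
    environment so secrets stay out of source code."""
    return {
        "operation": "import_from_s3",
        "database": database,
        "table": table,
        "action": action,
        "s3": {
            "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "bucket": bucket,
            "key": key,
            "region": region,
        },
    }
```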
Export Local
Exports table data to a local file in JSON or CSV format.
- operation (required) — export_local
- format (required) — json or csv
- path (required) — local directory path where the export file will be written
- search_operation (required) — query to select records: search_by_hash, search_by_value, search_by_conditions, or sql
- filename (optional) — filename without extension; auto-generated from epoch timestamp if omitted

Changed in: v4.3.0 — search_by_conditions added as a supported search operation for exports
{
  "operation": "export_local",
  "format": "json",
  "path": "/data/exports/",
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.breed"
  }
}
Export to S3
Exports table data to an AWS S3 bucket in JSON or CSV format.
Changed in: v4.3.0 — search_by_conditions added as a supported search operation
- operation (required) — export_to_s3
- format (required) — json or csv
- s3 (required) — S3 connection details (same fields as Import from S3, plus key for the output object name)
- search_operation (required) — search_by_hash, search_by_value, search_by_conditions, or sql
{
  "operation": "export_to_s3",
  "format": "json",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "exports/dogs.json",
    "region": "us-east-1"
  },
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }
}
Delete Records Before
Deletes records older than a given timestamp from a table. Operates only on the local node — clustered replicas retain their data.
Restricted to super_user roles.
- operation (required) — delete_records_before
- schema (required) — database name
- table (required) — table name
- date (required) — records with __createdtime__ before this timestamp are deleted. Format: YYYY-MM-DDThh:mm:ss.sZ
{
  "operation": "delete_records_before",
  "date": "2024-01-01T00:00:00.000Z",
  "schema": "dev",
  "table": "breed"
}
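A common use is pruning records older than a retention window, which means computing the cutoff at call time in the expected format. A minimal sketch (the `cutoff_iso` name is made up for illustration) using only the standard library:

```python
from datetime import datetime, timedelta, timezone

def cutoff_iso(days_ago, now=None):
    """Return a UTC timestamp `days_ago` days in the past, formatted
    with millisecond precision and a trailing Z, e.g.
    2024-01-01T00:00:00.000Z."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_ago)
    # strftime has no millisecond code, so truncate microseconds by hand
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.") + f"{cutoff.microsecond // 1000:03d}Z"
```

The resulting string can be placed directly in the `date` field of a delete_records_before body.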
Managing Jobs
Get Job
Returns status, metrics, and messages for a specific job by ID.
- operation (required) — get_job
- id (required) — job ID
{
  "operation": "get_job",
  "id": "4a982782-929a-4507-8794-26dae1132def"
}
Response:
[
  {
    "__createdtime__": 1611615798782,
    "__updatedtime__": 1611615801207,
    "created_datetime": 1611615798774,
    "end_datetime": 1611615801206,
    "id": "4a982782-929a-4507-8794-26dae1132def",
    "job_body": null,
    "message": "successfully loaded 350 of 350 records",
    "start_datetime": 1611615798805,
    "status": "COMPLETE",
    "type": "csv_url_load",
    "user": "HDB_ADMIN",
    "start_datetime_converted": "2021-01-25T23:03:18.805Z",
    "end_datetime_converted": "2021-01-25T23:03:21.206Z"
  }
]
Search Jobs by Start Date
Returns all jobs started within a time window.
Restricted to super_user roles.
- operation (required) — search_jobs_by_start_date
- from_date (required) — start of the search window (ISO 8601 format)
- to_date (required) — end of the search window (ISO 8601 format)
{
  "operation": "search_jobs_by_start_date",
  "from_date": "2024-01-01T00:00:00.000+0000",
  "to_date": "2024-01-02T00:00:00.000+0000"
}
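A typical query is "all jobs from the last N hours", which requires producing both window bounds in a matching ISO 8601 shape. A sketch, assuming the millisecond-precision `+0000`-offset format shown in the example above (the `jobs_window` name is illustrative):

```python
from datetime import datetime, timedelta, timezone

def _iso_with_offset(dt):
    """Format an aware datetime with millisecond precision and a
    numeric UTC offset, e.g. 2024-01-01T00:00:00.000+0000."""
    ms = f"{dt.microsecond // 1000:03d}"
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + ms + dt.strftime("%z")

def jobs_window(hours_back, now=None):
    """Build a search_jobs_by_start_date body covering the last
    `hours_back` hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours_back)
    return {
        "operation": "search_jobs_by_start_date",
        "from_date": _iso_with_offset(start),
        "to_date": _iso_with_offset(now),
    }
```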
Related Documentation
- Data Loader — Component-based data loading as part of deployment
- Operations API — Sending operations to Harper
- Transaction Logging — Recording a history of changes made to tables