Version: v4

Jobs

Harper uses an asynchronous job system for long-running data operations. When a bulk operation is initiated — such as loading a large CSV file or exporting millions of records — Harper starts a background job and immediately returns a job ID. Use the job ID to check progress and status.

Job status values:

  • IN_PROGRESS — the job is currently running
  • COMPLETE — the job finished successfully
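The submit-then-poll flow above can be sketched in a short client. This is an illustrative sketch, not official Harper client code: the endpoint URL, credentials handling, and the `post_operation`/`wait_for_job` helper names are assumptions; only the `get_job` payload shape and the status values come from this page. The `send` parameter is injectable so the polling logic can be exercised without a live server.

```python
import json
import time
import urllib.request

HARPER_URL = "http://localhost:9925"  # assumed Operations API endpoint


def post_operation(payload, send=None):
    """POST an operation payload to the Operations API and parse the JSON reply."""
    if send is None:
        def send(body):
            req = urllib.request.Request(
                HARPER_URL,
                data=body.encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return resp.read().decode()
    return json.loads(send(json.dumps(payload)))


def wait_for_job(job_id, send=None, interval=1.0):
    """Poll get_job until the job leaves IN_PROGRESS; return the job record."""
    while True:
        result = post_operation({"operation": "get_job", "id": job_id}, send=send)
        job = result[0]  # get_job returns a one-element list (see below)
        if job["status"] != "IN_PROGRESS":
            return job
        time.sleep(interval)
```

In practice you would submit a bulk operation, read `job_id` from the response, and hand it to `wait_for_job`.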

Bulk Operations

The following operations create jobs. All bulk operations are sent to the Operations API.

CSV Data Load

Ingests CSV data provided directly in the request body.

  • operation (required) — csv_data_load
  • database (optional) — target database; defaults to data
  • table (required) — target table
  • action (optional) — insert, update, or upsert; defaults to insert
  • data (required) — CSV content as a string

{
  "operation": "csv_data_load",
  "database": "dev",
  "action": "insert",
  "table": "breed",
  "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n"
}

Response:

{
  "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69",
  "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69"
}
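Building the `data` string by hand is error-prone once values contain commas or quotes. A sketch using Python's standard `csv` module to serialize rows into the string the operation expects (the `rows_to_csv_string` helper is ours, not part of any Harper SDK):

```python
import csv
import io


def rows_to_csv_string(fieldnames, rows):
    """Serialize dict rows into the CSV string expected in the `data` field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


payload = {
    "operation": "csv_data_load",
    "database": "dev",
    "action": "insert",
    "table": "breed",
    "data": rows_to_csv_string(
        ["id", "name", "country"],
        [
            {"id": 1, "name": "Labrador", "country": "Canada"},
            {"id": 2, "name": "Poodle", "country": "France"},
        ],
    ),
}
```

`DictWriter` quotes fields automatically when needed, so the payload stays valid for arbitrary cell values.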

CSV File Load

Ingests CSV data from a file on the server's local filesystem. The file must reside on the same machine running Harper.

  • operation (required) — csv_file_load
  • database (optional) — target database; defaults to data
  • table (required) — target table
  • action (optional) — insert, update, or upsert; defaults to insert
  • file_path (required) — absolute path to the CSV file on the host

{
  "operation": "csv_file_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "file_path": "/home/user/imports/breeds.csv"
}

CSV URL Load

Ingests CSV data from a URL.

  • operation (required) — csv_url_load
  • database (optional) — target database; defaults to data
  • table (required) — target table
  • action (optional) — insert, update, or upsert; defaults to insert
  • csv_url (required) — URL pointing to the CSV file

{
  "operation": "csv_url_load",
  "action": "insert",
  "database": "dev",
  "table": "breed",
  "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv"
}

Import from S3

Imports CSV or JSON files from an AWS S3 bucket.

  • operation (required) — import_from_s3
  • database (optional) — target database; defaults to data
  • table (required) — target table
  • action (optional) — insert, update, or upsert; defaults to insert
  • s3 (required) — S3 connection details:
    • aws_access_key_id
    • aws_secret_access_key
    • bucket
    • key — filename including extension (.csv or .json)
    • region

{
  "operation": "import_from_s3",
  "action": "insert",
  "database": "dev",
  "table": "dog",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "dogs.json",
    "region": "us-east-1"
  }
}
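Since the `key` must end in .csv or .json, a client-side pre-flight check can fail fast before the job is even created. A minimal sketch (the `validate_s3_key` helper is hypothetical, not a Harper API):

```python
import os

ALLOWED_EXTENSIONS = {".csv", ".json"}  # the extensions import_from_s3 accepts


def validate_s3_key(key):
    """Raise ValueError if the S3 object key lacks a supported extension."""
    ext = os.path.splitext(key)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension {ext!r}; expected .csv or .json")
    return key
```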

Export Local

Exports table data to a local file in JSON or CSV format.

Changed in: v4.3.0 — search_by_conditions added as a supported search operation for exports

  • operation (required) — export_local
  • format (required) — json or csv
  • path (required) — local directory path where the export file will be written
  • search_operation (required) — query to select records: search_by_hash, search_by_value, search_by_conditions, or sql
  • filename (optional) — filename without extension; auto-generated from epoch timestamp if omitted

{
  "operation": "export_local",
  "format": "json",
  "path": "/data/exports/",
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.breed"
  }
}
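A small builder can keep these nested payloads consistent across export calls. This is an illustrative sketch only: `build_export_local` is our helper, and the epoch-timestamp fallback merely mirrors, client-side, the default filename behavior the server applies on its own when filename is omitted.

```python
import time


def build_export_local(path, sql, fmt="json", filename=None):
    """Assemble an export_local payload with an sql search_operation."""
    payload = {
        "operation": "export_local",
        "format": fmt,
        "path": path,
        "search_operation": {"operation": "sql", "sql": sql},
    }
    if filename is None:
        # mirrors the documented default: a name derived from the epoch timestamp
        filename = str(int(time.time()))
    payload["filename"] = filename
    return payload
```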

Export to S3

Exports table data to an AWS S3 bucket in JSON or CSV format.

Changed in: v4.3.0 — search_by_conditions added as a supported search operation

  • operation (required) — export_to_s3
  • format (required) — json or csv
  • s3 (required) — S3 connection details (same fields as Import from S3, plus key for the output object name)
  • search_operation (required) — search_by_hash, search_by_value, search_by_conditions, or sql

{
  "operation": "export_to_s3",
  "format": "json",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "exports/dogs.json",
    "region": "us-east-1"
  },
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }
}

Delete Records Before

Deletes records older than a given timestamp from a table. Operates only on the local node — clustered replicas retain their data.

Restricted to super_user roles.

  • operation (required) — delete_records_before
  • schema (required) — database name
  • table (required) — table name
  • date (required) — records with __createdtime__ before this timestamp are deleted. Format: YYYY-MM-DDThh:mm:ss.sZ

{
  "operation": "delete_records_before",
  "date": "2024-01-01T00:00:00.000Z",
  "schema": "dev",
  "table": "breed"
}
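A common use is a retention cutoff such as "everything older than 90 days". A sketch producing the required timestamp format in UTC (the `cutoff_timestamp` helper is ours, not part of Harper):

```python
from datetime import datetime, timedelta, timezone


def cutoff_timestamp(days_old, now=None):
    """Format a UTC cutoff `days_old` days in the past as YYYY-MM-DDThh:mm:ss.sssZ."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_old)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.") + f"{cutoff.microsecond // 1000:03d}Z"


payload = {
    "operation": "delete_records_before",
    "date": cutoff_timestamp(90),
    "schema": "dev",
    "table": "breed",
}
```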

Managing Jobs

Get Job

Returns status, metrics, and messages for a specific job by ID.

  • operation (required) — get_job
  • id (required) — job ID

{
  "operation": "get_job",
  "id": "4a982782-929a-4507-8794-26dae1132def"
}

Response:

[
  {
    "__createdtime__": 1611615798782,
    "__updatedtime__": 1611615801207,
    "created_datetime": 1611615798774,
    "end_datetime": 1611615801206,
    "id": "4a982782-929a-4507-8794-26dae1132def",
    "job_body": null,
    "message": "successfully loaded 350 of 350 records",
    "start_datetime": 1611615798805,
    "status": "COMPLETE",
    "type": "csv_url_load",
    "user": "HDB_ADMIN",
    "start_datetime_converted": "2021-01-25T23:03:18.805Z",
    "end_datetime_converted": "2021-01-25T23:03:21.206Z"
  }
]
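The *_datetime fields are epoch milliseconds, with *_converted counterparts already rendered as ISO strings. If you need to convert the raw values yourself (for example, to compute a job's duration), a sketch with our own `epoch_ms_to_iso` helper:

```python
from datetime import datetime, timezone


def epoch_ms_to_iso(ms):
    """Convert epoch milliseconds to the ISO form used by the *_converted fields."""
    secs, msec = divmod(ms, 1000)
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{msec:03d}Z"
```

Using the response above, `epoch_ms_to_iso(1611615798805)` reproduces `start_datetime_converted`, and `end_datetime - start_datetime` gives the job's runtime in milliseconds.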

Search Jobs by Start Date

Returns all jobs started within a time window.

Restricted to super_user roles.

  • operation (required) — search_jobs_by_start_date
  • from_date (required) — start of the search window (ISO 8601 format)
  • to_date (required) — end of the search window (ISO 8601 format)

{
  "operation": "search_jobs_by_start_date",
  "from_date": "2024-01-01T00:00:00.000+0000",
  "to_date": "2024-01-02T00:00:00.000+0000"
}
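A typical query is "all jobs started in the last 24 hours". A sketch building that window in the explicit +0000 offset style shown in the example (the `last_24h_window` and `fmt_ts` helpers are ours):

```python
from datetime import datetime, timedelta, timezone


def fmt_ts(dt):
    """Format a UTC datetime as YYYY-MM-DDThh:mm:ss.sss+0000."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}+0000"


def last_24h_window(now=None):
    """Return (from_date, to_date) strings covering the past 24 hours."""
    now = now or datetime.now(timezone.utc)
    return fmt_ts(now - timedelta(hours=24)), fmt_ts(now)


from_date, to_date = last_24h_window()
payload = {
    "operation": "search_jobs_by_start_date",
    "from_date": from_date,
    "to_date": to_date,
}
```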