Jobs-API
The Jobs API is a tool for logging the outcome of ETL scripting jobs. It is designed to support the task of "incremental" data loads—i.e., loading only new or updated records from a source database to a destination database.
The typical ETL workflow using the Job API is:
At the beginning of your ETL job, query the Job API to retrieve the timestamp of the last time your job ran successfully.
Use this timestamp as a temporal filter to extract new/modifed records from the source database.
At the completion of your ETL job, register the successful outcome with the Job API.
Querying the API
The postgREST-provided jobs API is available at , and it can be queried using, as described in the . VPN access is required. You can also query the api.jobs table directly from our Postgres read replica.
Field Definitions:
We've written a Python utility to interact with the jobs interface. It's included in our pacakge. tdutils requires Python 3 and can be installed with $ pip install tdutils.
Begin by importing the jobutil library and creating a new job instance:
Assuming you've run this job successfully in the past, you can use the most_recent method to return a unix timestamp from the last successful job run:
Register your new job with job.start(). The API returns a JSON representation of newly created job record:
We have successfuly registered a new job. Note that the status was automatically set to in_progress. If we were to call job.most_recent() again, the results are the same as before, because the status of current job is not success:
From here, continue your ETL process as needed. When your ETL is complete, use job.result() to update your job status accordingly. Again, a JSON representation of the job record is returned:
Note that the end_date was populated automatically. Because the job was successful, calling job.most_recent() now returns the timestamp of our current job:
You should also register job failures. The errror result is availabe for such purposes: