What is a good approach to sync remote API data to a database?

Software Engineering Stack Exchange

Problem statement

We have a remote API that can be queried to fetch arbitrary objects (we expect around 500,000 records). Our application needs faster access to this data than the API can provide, so we use a relational database as a read-only cache.

The question is how best to keep the DB in sync with the remote API so that the data in the DB is never more than a few minutes old. Our application is an authorization engine that reads attributes from the DB to make Permit/Deny decisions. When an attribute changes at the remote API, our auth engine must learn of it within minutes to prevent unauthorized access.

Possible approaches

  1. Write a program that periodically fetches data from the API, compares it to the data in the DB, and does some clever diffing to update only what has changed.
    1. I'm worried about the complexity of implementing and testing this.
    2. It seems inefficient to compare every record on both sides.
  2. Write a program that fetches data from the API and simply replaces everything in the DB without diffing.
    1. To avoid discrepancies, the replacement would have to be an atomic transaction, but I worry about how long that transaction will take (and how long it will block queries to the DB), especially if we run it frequently.
  3. Use an existing API-to-DB syncing tool. Do you know of any?
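For reference, approach 1 may be simpler than it sounds if each record is reduced to a fingerprint before comparing: store a hash alongside each cached row, and the diff becomes a set comparison of (id, hash) pairs rather than a field-by-field comparison. A minimal sketch, using SQLite and a hypothetical `attrs` table and record shape (both are illustrative assumptions, not part of the question):

```python
import hashlib
import json
import sqlite3


def record_hash(record):
    # Stable fingerprint of a record; sort_keys makes it order-independent.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def sync(conn, remote_records):
    """Diff remote records against the local cache and apply only the changes.

    remote_records is assumed to be a list of dicts, each with an "id" key.
    """
    local = dict(conn.execute("SELECT id, hash FROM attrs"))  # id -> stored hash
    remote = {r["id"]: r for r in remote_records}

    # Upsert rows that are new or whose fingerprint changed.
    for rid, rec in remote.items():
        h = record_hash(rec)
        if local.get(rid) != h:
            conn.execute(
                "INSERT OR REPLACE INTO attrs (id, data, hash) VALUES (?, ?, ?)",
                (rid, json.dumps(rec), h),
            )

    # Delete rows that no longer exist remotely.
    stale = set(local) - set(remote)
    conn.executemany("DELETE FROM attrs WHERE id = ?", ((i,) for i in stale))
    conn.commit()
```

Only changed rows are written, so each sync transaction stays small; the per-run cost is dominated by fetching from the API and hashing, not by DB writes.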
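On approach 2, one common way to avoid a long blocking transaction is to load the fresh data into a staging table (slow, but invisible to readers) and then swap the tables with renames (fast). A sketch of the idea, again using SQLite and an assumed `attrs` table for illustration:

```python
import json
import sqlite3


def full_replace(conn, remote_records):
    """Rebuild the cache in a staging table, then swap it in via renames.

    The slow bulk load happens in attrs_staging while readers keep using
    attrs; only the renames at the end need to be brief and atomic.
    """
    conn.execute("DROP TABLE IF EXISTS attrs_staging")
    conn.execute("CREATE TABLE attrs_staging (id TEXT PRIMARY KEY, data TEXT)")
    conn.executemany(
        "INSERT INTO attrs_staging (id, data) VALUES (?, ?)",
        ((r["id"], json.dumps(r)) for r in remote_records),
    )

    # The swap itself is two cheap renames plus a drop.
    conn.execute("DROP TABLE IF EXISTS attrs_old")
    conn.execute("ALTER TABLE attrs RENAME TO attrs_old")
    conn.execute("ALTER TABLE attrs_staging RENAME TO attrs")
    conn.execute("DROP TABLE attrs_old")
    conn.commit()
```

In PostgreSQL the same pattern works with `ALTER TABLE ... RENAME TO` inside a transaction, so readers see either the old table or the new one, never a half-loaded state.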