We use SymmetricDs to transfer data from Azure to Redshift. For a number of unglamorous reasons, we have a scheduled full reload 3 times a day to bring our Redshift db into line with Azure. One issue we have is knowing when the full reload has finished (and hence when to merge staging into the main tables).
To solve this I wrote a state machine in F# which keeps track of the state of SymmetricDs. The main states are:
- Nothing happening
All of which have fairly natural and obvious interpretations. The main transition that we care about is from running to finished. Note that the speed of the process means that a run that didn’t have this path isn’t possible. With a faster process we’d need to be carefully about Started->Finished or even the process running in between nothing happening polls and not being picked up at all.
The nice thing about symmetricDS is that it exposes its internals nicely in the database. To get the running info that’s needed, we just count distinct load_ids from the outgoing batches table filtered onto reload channels. If you don’t filter onto the reload channel then you risk picking up noise from the heartbeat channel.
SymmetricDs initial reloads definitely aren’t my favourite ETL setup, but being able to programmatically link them to other steos in this manner makes them a lot more friendly.