I used to run a service solo that processed data from multiple external sources, and as you said, you need to program defensively when dealing with input.
I handled it with a pipeline that did the following: 1. validate the data, 2. transform it if needed, 3. load it. If validation failed, the data would get "quarantined" and I would get an email notification, or a Slack notification if it was urgent.
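Roughly, the validate / transform / load flow with quarantine could be sketched like this. It is only an illustration, not my actual code: the record fields (`id`, `amount`), the function names, and the `notify` callback (standing in for the email or Slack alert) are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Record = dict[str, Any]


@dataclass
class PipelineResult:
    loaded: list[Record] = field(default_factory=list)
    # Each quarantined entry keeps the raw record plus the reason it failed.
    quarantined: list[tuple[Record, str]] = field(default_factory=list)


def validate(record: Record) -> Optional[str]:
    """Return an error message, or None if the record is acceptable."""
    if not isinstance(record.get("id"), int):
        return "id must be an integer"
    amount = record.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float, str)):
        return "amount is missing or has an unsupported type"
    try:
        float(amount)
    except ValueError:
        return f"amount is not numeric: {amount!r}"
    return None


def transform(record: Record) -> Record:
    """Normalize a record that has already passed validation."""
    return {"id": record["id"], "amount": round(float(record["amount"]), 2)}


def run_pipeline(
    records: list[Record], notify: Callable[[str], None]
) -> PipelineResult:
    """Validate, transform, and 'load' each record; quarantine failures."""
    result = PipelineResult()
    for record in records:
        error = validate(record)
        if error is not None:
            # Bad input is set aside, never loaded, and someone is told.
            result.quarantined.append((record, error))
            notify(f"quarantined {record!r}: {error}")
            continue
        # 'Loading' is just an in-memory list here; a real system
        # would write to a database or queue at this point.
        result.loaded.append(transform(record))
    return result
```

The point of keeping the raw record next to the error is that you can fix the validator or the source later and replay the quarantined data instead of losing it.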
I don't generally write a lot of unit tests, but all my validation and transformation logic had 100% coverage, because things always change and you need to make sure future updates don't break the system.
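For a sense of what those tests can look like, here is a small table-driven sketch. The two functions are hypothetical stand-ins (a deliberately crude email check, not something to use for real validation); the idea is just that every branch gets a case, so a later change that breaks one shows up immediately.

```python
def normalize_email(raw: str) -> str:
    """Hypothetical transformation: trim whitespace and lowercase."""
    return raw.strip().lower()


def is_valid_email(value: str) -> bool:
    """Crude hypothetical check: exactly one '@' with text on both sides."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and bool(domain) and "@" not in domain


# (raw input, expected normalized value, expected validity)
CASES = [
    ("  Alice@Example.COM ", "alice@example.com", True),
    ("no-at-sign", "no-at-sign", False),
    ("@example.com", "@example.com", False),
    ("alice@", "alice@", False),
    ("a@b@c", "a@b@c", False),
    ("", "", False),
]


def test_normalize_and_validate() -> None:
    for raw, expected, expected_valid in CASES:
        cleaned = normalize_email(raw)
        assert cleaned == expected, raw
        assert is_valid_email(cleaned) is expected_valid, raw
```

A test runner such as pytest would pick up `test_normalize_and_validate` by name; adding a new edge case is one more row in `CASES`.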