Testing in the big data area has its typical challenges.
A big factor is the quality of the data ingested.
The analysis results depend heavily on the quality of the data ingested (obviously).
What often happens is inconsistency across the data from downstream sources, missed records, missing data within records, and other data quality issues.
Unless these issues are flagged at the lower levels, the problems creep upward and start showing up in the analytics results.
While having automated data checks on massive data stores might not be an easy job, it certainly is worth implementing.
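
As a rough illustration of what such an automated check could look like, here is a minimal sketch in plain Python. It looks for the three problems mentioned above: missed records, missing data within records, and inconsistency in the form of duplicate keys. The function name `check_batch`, the field names and the idea of an expected count reported by the source are all assumptions made for the example, not part of any particular tool. On a real data store the same checks would more likely run inside the processing engine itself.

```python
from collections import Counter


def check_batch(records, required_fields, expected_count=None, key="id"):
    """Return a list of human-readable data quality issues for one batch.

    All names here are hypothetical; adapt them to the actual schema.
    """
    issues = []

    # Missed records: compare with the count the source claims to have sent.
    if expected_count is not None and len(records) != expected_count:
        issues.append(
            f"record count mismatch: expected {expected_count}, got {len(records)}"
        )

    # Missing data within records: required field absent, None, or empty string.
    for i, rec in enumerate(records):
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            issues.append(f"record {i}: missing {', '.join(missing)}")

    # Inconsistency: the same key value appears in more than one record.
    counts = Counter(
        rec.get(key) for rec in records if rec.get(key) not in (None, "")
    )
    for value, n in counts.items():
        if n > 1:
            issues.append(f"duplicate {key}: {value} appears {n} times")

    return issues


if __name__ == "__main__":
    batch = [
        {"id": 1, "amount": 10.0},
        {"id": 2, "amount": None},
        {"id": 2, "amount": 5.0},
    ]
    for issue in check_batch(batch, ["id", "amount"], expected_count=4):
        print(issue)
```

Returning the issues as a list instead of raising on the first one means a single pass can report everything wrong with a batch, and the caller decides whether to reject the batch or only log a warning.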