One of the biggest challenges in #BigData projects is ensuring the quality of data.

Debugging anomalies across #DataPipelines can be a nightmare. Some teams end up spending more time debugging than actually building new features.

That's usually because there are no automated data quality checks in place to catch issues, so they have to trace anomalies back through huge amounts of data, sifting through complex ETL processes.

In other development projects, behavior is more predictable because inputs into the system are homogeneous.

In #BigData projects, there is no guarantee that the data we ingest, or the way it gets processed, will always be accurate – the input is NOT homogeneous.

The solution: have #automated data quality checks running in #production across the data pipeline.

#RedefiningSoftwareQuality #BigData #Testing #Automation
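As a minimal sketch of what such automated checks might look like, here is a hypothetical example that validates each batch of records flowing through a pipeline stage. All field names, thresholds, and the sample records are illustrative assumptions, not from the original post:

```python
# Hypothetical data-quality checks run against each pipeline batch.
# Field names ("user_id", "age") and thresholds are illustrative only.

def check_not_null(records, field):
    """Return indices of records where `field` is missing or None."""
    return [i for i, r in enumerate(records) if r.get(field) is None]

def check_in_range(records, field, lo, hi):
    """Return indices of records where `field` is present but out of [lo, hi]."""
    return [i for i, r in enumerate(records)
            if r.get(field) is not None and not (lo <= r[field] <= hi)]

def run_checks(records):
    """Run all checks; return {check_name: offending row indices} for failures."""
    failures = {
        "user_id_not_null": check_not_null(records, "user_id"),
        "age_in_range": check_in_range(records, "age", 0, 120),
    }
    return {name: rows for name, rows in failures.items() if rows}

batch = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},   # fails the not-null check
    {"user_id": 3, "age": 180},     # fails the range check
]
print(run_checks(batch))  # {'user_id_not_null': [1], 'age_in_range': [2]}
```

In production this kind of check would typically run after each ingestion or transformation step, with failures routed to alerting rather than printed, so anomalies are caught at the stage that produced them instead of being traced back later.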