Data quality quick list
Data quality is one of the biggest problems in data science projects.
I’ll be talking about these issues at the #AutomationGuild. Here’s a quick list:
– Accuracy. Is the data accurate in the context where it will be used?
– Validity. Is the data fresh enough, still valid?
– Consistency. Does data from different sources / time frames match?
– Completeness. Are any parts of the data truncated / missing?
– Uniqueness. Is there enough data to uniquely identify each record?
– Timeliness. Is data collected at the right time & processed in a timely fashion (efficient enough)?
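A few of these checks can be sketched in code. The Python below is only a minimal illustration, assuming records are plain dictionaries. The field names (`id`, `email`, `updated`), the 7-day freshness limit, and all function names are hypothetical and chosen just for this example. Accuracy and timeliness are left out because they depend on the business context and the pipeline rather than on the records alone.

```python
from datetime import datetime, timedelta


def completeness(records, required):
    """Fraction of records where every required field is present and non-empty."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    return ok / len(records)


def duplicate_keys(records, key_fields):
    """Key tuples that appear more than once (uniqueness check)."""
    seen, dupes = set(), set()
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


def stale_records(records, ts_field, max_age, now):
    """Records older than max_age (freshness / validity check)."""
    return [r for r in records if now - r[ts_field] > max_age]


def mismatched(source_a, source_b, key, field):
    """Keys present in both sources whose field values differ (consistency check)."""
    b_by_key = {r[key]: r for r in source_b}
    return sorted(
        r[key]
        for r in source_a
        if r[key] in b_by_key and b_by_key[r[key]].get(field) != r.get(field)
    )


# Hypothetical sample data, for illustration only.
now = datetime(2024, 1, 10)
records = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 1, 9)},
    {"id": 2, "email": "", "updated": datetime(2024, 1, 8)},
    {"id": 2, "email": "b@example.com", "updated": datetime(2023, 12, 1)},
    {"id": 3, "email": "c@example.com", "updated": datetime(2024, 1, 10)},
]
other_source = [
    {"id": 1, "email": "a@example.com"},
    {"id": 3, "email": "c@other.example"},
]

completeness_score = completeness(records, ["id", "email"])
dupes = duplicate_keys(records, ["id"])
stale = stale_records(records, "updated", timedelta(days=7), now)
conflicts = mismatched(records, other_source, "id", "email")
```

With this sample data, one record has an empty email, id 2 appears twice, one record is older than a week, and id 3 disagrees between the two sources. Real projects would usually set thresholds for each check and fail a test run when a threshold is crossed.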
More on the conference here:
http://amp.gs/Dydu
#QsDaily #BigData #DataScience #Testing