Data quality is one of the biggest problems in data science projects.

I’ll be talking about these at the #AutomationGuild. Here’s a quick list:

– Accuracy. Is the data accurate in the context in which it will be used?

– Validity. Is the data fresh enough to still be valid?

– Consistency. Data from different sources / time frames matches

– Completeness. No parts of the data are truncated or missing

– Uniqueness. There is enough data to uniquely identify each record

– Timeliness. Data is collected at the right time and processed efficiently enough to still be useful
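As a rough illustration, a few of these dimensions can be turned into automated checks. The Python sketch below assumes records arrive as plain dictionaries; the sample data, field names, 30-day freshness window, and helper functions are all hypothetical. Accuracy and timeliness are left out because they depend on domain context and pipeline timing rather than on the records alone.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; field names are illustrative assumptions.
RECORDS = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 1, 10)},
    {"id": 2, "email": None, "updated": datetime(2024, 1, 12)},
    {"id": 2, "email": "c@example.com", "updated": datetime(2023, 6, 1)},
]


def completeness(records, fields):
    """Completeness: fraction of records where every required field is present and non-empty."""
    if not records:
        return 1.0
    complete = sum(all(r.get(f) not in (None, "") for f in fields) for r in records)
    return complete / len(records)


def duplicate_keys(records, key):
    """Uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes


def stale_records(records, field, now, max_age):
    """Validity/freshness: records whose timestamp is older than max_age."""
    return [r for r in records if now - r[field] > max_age]


def inconsistent_keys(source_a, source_b, key, field):
    """Consistency: keys present in both sources whose field values disagree."""
    b_by_key = {r[key]: r[field] for r in source_b}
    return {r[key] for r in source_a if r[key] in b_by_key and b_by_key[r[key]] != r[field]}


if __name__ == "__main__":
    now = datetime(2024, 1, 15)  # assumed "current" time for the freshness check
    other_source = [  # hypothetical second source to compare against
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ]
    print("completeness:", completeness(RECORDS, ["id", "email"]))
    print("duplicate ids:", duplicate_keys(RECORDS, "id"))
    print("stale ids:", [r["id"] for r in stale_records(RECORDS, "updated", now, timedelta(days=30))])
    print("inconsistent ids:", inconsistent_keys(RECORDS, other_source, "id", "email"))
```

In practice thresholds such as the freshness window and the required fields would come from the context the data is used in, which is exactly why accuracy and validity are framed as context-dependent questions above.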

More on the conference here:

http://amp.gs/Dydu

#QsDaily #BigData #DataScience #Testing