The subject of big data fascinates people and businesses. While the term may sound like just a buzzword, big data is immensely helpful for unearthing important information, and in the 21st century, information is power.
In this post, I’ll summarize my talk at OnlineTestConf2019, titled “Why we call it big data and how to test it”. The talk was also recorded and is available on YouTube.

Katjya drew a very good sketch summarizing the talk, which she shared in a tweet.

An intro to big data

I start with a little history of big data and the factors that fueled growth and innovation in this industry.

Next, we put “big” into perspective to help understand the sheer size of the data and the challenge of processing it.

Defining big data

When does a project qualify as big data? Is it only the size of the data? This slide explains the different ways we have tried to classify it and the most common method used.

The Hadoop platform

Hadoop is the most widely used big data platform, and it is also open source. I talk about its widely used MapReduce process and the different components within it, such as HDFS, HBase and Hive’s query language, HiveQL.

The Data Pipeline

All we are really doing in a big data project is collecting data from different sources, combining it into meaningful big tables, and generating insights from it. There are three main phases you might have in a big data project.

Testing stages

At the end, we quickly skim through the different types of tests we perform across the pipeline. At each stage, the type of tests depends on the activities being performed there.

Summary

Due to lack of time, I couldn’t go into detail on the quality dimensions and the sample tests for these dimensions across the pipeline. My talk at AutomationGuild 2020 explains that in more detail.
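The MapReduce process mentioned in the Hadoop section can be illustrated without a cluster. The sketch below is a local, single-process simulation in plain Python of the classic word-count job on a toy input. It is not the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. It only shows the three steps a MapReduce job goes through: mappers emit key/value pairs, the framework groups values by key, and reducers combine each group.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper (hypothetical): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, mimicking the shuffle/sort step."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer (hypothetical): combine all values for one key into one result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines, in memory."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Reduce each key group; sorting keys mirrors the sorted input reducers receive.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is power"]))
    # {'big': 2, 'data': 2, 'is': 2, 'power': 1}
```

On a real Hadoop cluster, the map and reduce tasks run in parallel, with map tasks typically scheduled close to the HDFS blocks they read, and the framework performs the shuffle over the network. The in-memory grouping here only mimics that step.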
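To make the testing stages more concrete, here is a minimal sketch of a kind of check that is commonly run when data moves from one pipeline stage to the next. It reconciles row counts, optionally compares an order-independent checksum, and verifies that a key column has no nulls or duplicates. This is a general data-testing technique, not the specific set of tests from the talk. `check_stage`, `checksum` and the `id` key column are hypothetical, and rows are assumed to be JSON-serializable dicts with hashable key values.

```python
import hashlib
import json
from typing import Any

Row = dict[str, Any]  # hypothetical row shape: column name -> value


def checksum(rows: list[Row]) -> str:
    """Order-independent fingerprint of a collection of rows."""
    # Hash each row with sorted keys so column order does not matter,
    # then sort the row hashes so row order does not matter either.
    row_hashes = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()


def check_stage(source: list[Row], target: list[Row], key: str,
                expect_identical: bool = False) -> list[str]:
    """Compare a stage's input with its output and return any problems found."""
    problems: list[str] = []

    # Completeness: no rows lost or added between stages.
    if len(source) != len(target):
        problems.append(f"row count mismatch: {len(source)} in, {len(target)} out")

    # Integrity of the key column in the target.
    keys = [row.get(key) for row in target]
    if any(k is None for k in keys):
        problems.append(f"null values in key column '{key}'")
    non_null = [k for k in keys if k is not None]
    if len(set(non_null)) != len(non_null):
        problems.append(f"duplicate values in key column '{key}'")

    # Accuracy: for pure copy stages, the content must match exactly.
    if expect_identical and checksum(source) != checksum(target):
        problems.append("checksum mismatch: content changed between stages")

    return problems


if __name__ == "__main__":
    source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    landed = list(reversed(source))  # same rows, different order
    print(check_stage(source, landed, key="id", expect_identical=True))  # []

    broken = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": None, "v": "c"}]
    for problem in check_stage(source, broken, key="id"):
        print(problem)
```

The checksum comparison only makes sense for stages that should copy data unchanged, such as landing raw data from a source. Stages that transform or aggregate data need rule-based checks instead, which is why `expect_identical` defaults to `False` in this sketch.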