{"id":15084,"date":"2020-02-15T17:12:50","date_gmt":"2020-02-15T12:12:50","guid":{"rendered":"http:\/\/quality-spectrum.com\/?p=15084"},"modified":"2020-02-16T20:27:30","modified_gmt":"2020-02-16T15:27:30","slug":"automation-guild-2020-big-data-101-importance-of-automation","status":"publish","type":"post","link":"https:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/","title":{"rendered":"Automation Guild 2020 \u2013 Big data 101 & Importance of Automation"},"content":{"rendered":"

Of all the online conferences I attend, Automation Guild is my favorite for automation, and this year it was a pleasure to speak at it again. If you work in automation, I think it is a must-attend conference. In this post I\u2019ll give a brief overview of my talk.<\/p>\n

The subject of big data is exciting, but I\u2019ve felt there is a general lack of testing maturity in the space, probably because the industry itself is comparatively new and still evolving. The talk was meant to share some basics about big data and how testing and automation work in this field.<\/p>\n<\/p>\n

About big data<\/h2>\n<\/p>\n

The move to big data has been fueled by technologies that make it easy to process large volumes of data at high speed and, most importantly, to act on the resulting insights very quickly. We briefly discussed these factors, which are summarized in this image:<\/p>\n<\/div><\/span>

<\/h2>\n

What are big data projects all about<\/h2>\n<\/p>\n

The objective of big data projects is to gather insights \/ analytics that help us understand and solve problems. To run those analytics, data from one or many sources may be needed. Acquiring the data is usually not the hard part; the challenge is getting it into a structure where it all makes sense collectively.<\/span><\/p>\n<\/p>\n

That\u2019s where the concept of a data pipeline comes in. The data passes through different stages of \u2018transformation\u2019 \/ ETL (extract, transform, load), and each stage makes it more usable for the problem at hand.<\/p>\n<\/div><\/span>
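To make the ETL idea concrete, here is a minimal Python sketch of a single pipeline step that uses only the standard library. The sample records, the `sales` table and the `extract`, `transform` and `load` function names are all hypothetical. A real pipeline would read from and write to its actual source and target systems instead of in-memory CSV lines and an in-memory SQLite database.

```python
import csv
import sqlite3

# Hypothetical raw export: stray whitespace, mixed casing, amounts stored as text.
RAW_LINES = [
    'id,country,amount',
    '1, pk ,10.50',
    '2,US,7',
    '3, uk,3.25',
]


def extract(lines):
    # Extract: read raw rows from the source (here, CSV lines held in memory).
    return list(csv.DictReader(lines))


def transform(rows):
    # Transform: clean and type each field so it can be combined with other sources.
    return [
        (int(r['id']), r['country'].strip().upper(), float(r['amount']))
        for r in rows
    ]


def load(records, conn):
    # Load: write the cleaned records into the (hypothetical) target table.
    conn.execute('CREATE TABLE sales (id INTEGER PRIMARY KEY, country TEXT, amount REAL)')
    conn.executemany('INSERT INTO sales VALUES (?, ?, ?)', records)
    conn.commit()


conn = sqlite3.connect(':memory:')
load(transform(extract(RAW_LINES)), conn)
print(conn.execute('SELECT id, country, amount FROM sales ORDER BY id').fetchall())
```

Each function maps to one letter of ETL. That separation is what later allows a data quality check to be attached after any individual step.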

<\/h2>\n

Testing in big data<\/h2>\n<\/p>\n

Just as web applications have a set of standard tests, big data projects have some common tests of their own. However, they are nothing like the ones we run for web applications.<\/span><\/p>\n<\/p>\n

In data projects, everything we deal with is \u2018data\u2019: data in and data out. The challenge is transforming the data as expected and building models that actually solve our problems. Therefore, most testing in this industry revolves around \u2018Data Quality\u2019.<\/p>\n<\/div><\/span>

Each of the three stages of the data pipeline can contain many ETL activities. For each ETL, it is important to decide what type of data quality checks are needed. In the talk we walked through a basic process for determining that.<\/p>\n<\/p>\n

Automating tests<\/h2>\n<\/p>\n

Because of the kind of tests we have in the big data space, automation also works quite differently. It is mostly about fetching sets of data and checking whether the right logic \/ business rules were applied. Some data platforms provide built-in capabilities for this. Where they don\u2019t, the same technologies used to build the ETL flows are also used to test them.<\/p>\n
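As an illustration of checking that a business rule was applied, the sketch below fetches the data set an ETL step produced and re-derives the expected value independently. The `orders` table, its columns and the rule itself (total equals quantity times unit price, rounded to two decimals) are assumptions made for the example. One row is deliberately wrong, and SQLite stands in for whatever data store the pipeline actually uses.

```python
import sqlite3

# Hypothetical target table written by an ETL step.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (id INTEGER, quantity INTEGER, unit_price REAL, total REAL)')
conn.executemany(
    'INSERT INTO orders VALUES (?, ?, ?, ?)',
    [(1, 2, 5.0, 10.0), (2, 3, 1.5, 4.5), (3, 4, 2.5, 9.0)],  # order 3 breaks the rule on purpose
)


def rule_violations(conn, tolerance=0.005):
    # Fetch the data set the ETL produced.
    rows = conn.execute('SELECT id, quantity, unit_price, total FROM orders ORDER BY id').fetchall()
    bad = []
    for order_id, qty, price, total in rows:
        # Re-apply the assumed business rule independently of the ETL code.
        expected = round(qty * price, 2)
        # A small tolerance avoids false alarms from floating-point representation.
        if abs(expected - total) > tolerance:
            bad.append((order_id, expected, total))
    return bad


print(rule_violations(conn))
```

Re-deriving the value in the test, rather than reusing the ETL's own code, means a bug in the transformation logic cannot silently pass its own check.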

As for languages, Python is widely used because of its data processing capabilities, and Python scripts are run within workflows to perform the required validations. The most common validation is checking whether all data has been copied from point A to point B. When data moves from one place to another, files or records sometimes get missed or truncated, among other problems. This is just one of the six data quality dimensions.<\/p>\n<\/p>\n
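Here is a minimal sketch of the A-to-B completeness check. It assumes both sides can be fetched as lists of records that share a key column, here a hypothetical `id`. The check compares record counts and lists keys that went missing or appeared unexpectedly. It also flags records whose content changed in transit, which is how a truncated value would show up. The sample records and the `reconcile` function name are illustrative only.

```python
# Hypothetical source (point A) and target (point B) extracts, keyed by 'id'.
source = [
    {'id': 1, 'name': 'alpha'},
    {'id': 2, 'name': 'bravo'},
    {'id': 3, 'name': 'charlie'},
]
target = [
    {'id': 1, 'name': 'alpha'},
    {'id': 3, 'name': 'char'},  # truncated on the way from A to B
]


def reconcile(source, target, key='id'):
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    # Records that were lost between A and B.
    missing = sorted(src.keys() - tgt.keys())
    # Records that appear only in B.
    unexpected = sorted(tgt.keys() - src.keys())
    # Records present on both sides whose content differs (e.g. truncated values).
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {
        'count_match': len(source) == len(target),
        'missing': missing,
        'unexpected': unexpected,
        'changed': changed,
    }


print(reconcile(source, target))
```

A bare count comparison would miss a case where one record is dropped and another duplicated. Comparing keys and content catches that case as well.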

Data quality across the pipeline<\/h2>\n<\/p>\n

In the talk we walked through a sample pipeline, explaining the kinds of tests that can be done and how they would be executed. The image below summarizes all the checks discussed. The data pipeline was also expanded to show the activities within each of the three stages and how they are tested.<\/p>\n<\/div><\/span>

<\/p>\n
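Beyond completeness between two points, other data quality dimensions can be checked at each stage in the same way. The sketch below assumes three commonly cited dimensions: completeness of a field, uniqueness of a key, and validity against an allowed set of values. The stage names, sample records and check assignments are placeholders, not the pipeline or checks from the talk.

```python
def completeness(rows, column):
    # Indexes of records with no value for the column.
    return [i for i, r in enumerate(rows) if r.get(column) in (None, '')]


def uniqueness(rows, column):
    # Indexes of records whose key was already seen earlier in the set.
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        if r[column] in seen:
            dupes.append(i)
        seen.add(r[column])
    return dupes


def validity(rows, column, allowed):
    # Indexes of records whose value is outside the allowed set.
    return [i for i, r in enumerate(rows) if r.get(column) not in allowed]


# Hypothetical outputs of three pipeline stages.
stages = {
    'ingested': [{'id': 1, 'status': 'new'}, {'id': 2, 'status': ''}],
    'transformed': [{'id': 1, 'status': 'NEW'}, {'id': 1, 'status': 'NEW'}],
    'published': [{'id': 1, 'status': 'NEW'}, {'id': 2, 'status': 'UNKNOWN'}],
}

# Which checks run after which stage (an assumption for the example).
checks = {
    'ingested': [('completeness', lambda rows: completeness(rows, 'status'))],
    'transformed': [('uniqueness', lambda rows: uniqueness(rows, 'id'))],
    'published': [('validity', lambda rows: validity(rows, 'status', {'NEW', 'CLOSED'}))],
}


def run_checks(stages, checks):
    failures = []
    for stage, rows in stages.items():
        for name, check in checks.get(stage, []):
            bad = check(rows)
            if bad:
                failures.append((stage, name, bad))
    return failures


for failure in run_checks(stages, checks):
    print(failure)
```

Keeping the checks in a per-stage mapping makes it easy to record, for each ETL activity, which checks apply. It also makes it easy to extend the list as more dimensions are covered.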

If you did not attend Automation Guild, you can still get access, since all the talks and Q&A sessions are recorded. This talk should serve those who want to get into, or grow within, the big data field from a testing perspective.<\/p>\n<\/div>

<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":3,"featured_media":15089,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"yoast_head":"\nAutomation Guild 2020 \u2013 Big data 101 & Importance of Automation - Quality Spectrum<\/title>\n<meta name=\"description\" content=\"An introduction to big data, data pipelines, data quality dimensions testing & automation in the big data space\" \/>\n<meta name=\"robots\" content=\"index, follow\" \/>\n<meta name=\"googlebot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta name=\"bingbot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automation Guild 2020 \u2013 Big data 101 & Importance of Automation - Quality Spectrum\" \/>\n<meta property=\"og:description\" content=\"An introduction to big data, data pipelines, data quality dimensions testing & automation in the big data space\" \/>\n<meta property=\"og:url\" content=\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/\" \/>\n<meta property=\"og:site_name\" content=\"Quality Spectrum\" \/>\n<meta property=\"article:published_time\" content=\"2020-02-15T12:12:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-02-16T15:27:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/quality-spectrum.com\/wp-content\/uploads\/2020\/02\/Pic-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1366\" \/>\n\t<meta property=\"og:image:height\" content=\"728\" \/>\n<meta name=\"twitter:card\" 
content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@aali_khalid\" \/>\n<meta name=\"twitter:site\" content=\"@aali_khalid\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/quality-spectrum.com\/#organization\",\"name\":\"Quality Spectrum\",\"url\":\"https:\/\/quality-spectrum.com\/\",\"sameAs\":[\"https:\/\/www.linkedin.com\/in\/alikhalid\/\",\"https:\/\/www.youtube.com\/c\/QualitySpectrum\",\"https:\/\/twitter.com\/aali_khalid\"],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/quality-spectrum.com\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/quality-spectrum.com\/wp-content\/uploads\/2019\/11\/QS-logo-mobile-e1574510459832.png\",\"width\":40,\"height\":40,\"caption\":\"Quality Spectrum\"},\"image\":{\"@id\":\"https:\/\/quality-spectrum.com\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/quality-spectrum.com\/#website\",\"url\":\"https:\/\/quality-spectrum.com\/\",\"name\":\"Quality Spectrum\",\"description\":\"Redefining software quality\",\"publisher\":{\"@id\":\"https:\/\/quality-spectrum.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/quality-spectrum.com\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/quality-spectrum.com\/wp-content\/uploads\/2020\/02\/Pic-1.png\",\"width\":1366,\"height\":728},{\"@type\":\"WebPage\",\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#webpage\",\"url\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/\",\"name\":\"Automation Guild 2020 \\u2013 Big data 101 & Importance of 
Automation - Quality Spectrum\",\"isPartOf\":{\"@id\":\"https:\/\/quality-spectrum.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#primaryimage\"},\"datePublished\":\"2020-02-15T12:12:50+00:00\",\"dateModified\":\"2020-02-16T15:27:30+00:00\",\"description\":\"An introduction to big data, data pipelines, data quality dimensions testing & automation in the big data space\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/\"]}]},{\"@type\":\"Article\",\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#article\",\"isPartOf\":{\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#webpage\"},\"author\":{\"@id\":\"https:\/\/quality-spectrum.com\/#\/schema\/person\/4805a00d7139e111ea9430e17cc8f28c\"},\"headline\":\"Automation Guild 2020 \\u2013 Big data 101 & Importance of Automation\",\"datePublished\":\"2020-02-15T12:12:50+00:00\",\"dateModified\":\"2020-02-16T15:27:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#webpage\"},\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/quality-spectrum.com\/#organization\"},\"image\":{\"@id\":\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#primaryimage\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"http:\/\/quality-spectrum.com\/automation-guild-2020-big-data-101-importance-of-automation\/#respond\"]}]},{\"@type\":[\"Person\"],\"@id\":\"https:\/\/quality-spectrum.com\/#\/schema\/person\/4805a00d7139e111ea9430e17cc8f28c\",\"name\":\"Ali 
Khalid\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/quality-spectrum.com\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/70cbf539f218f275a77959dd2e56bddb?s=96&d=mm&r=g\",\"caption\":\"Ali Khalid\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/posts\/15084"}],"collection":[{"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/comments?post=15084"}],"version-history":[{"count":6,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/posts\/15084\/revisions"}],"predecessor-version":[{"id":15095,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/posts\/15084\/revisions\/15095"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/media\/15089"}],"wp:attachment":[{"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/media?parent=15084"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/categories?post=15084"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/quality-spectrum.com\/wp-json\/wp\/v2\/tags?post=15084"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}