Debugging the Sherlock Style

Debugging is something people generally don’t like, particularly problems that span days and drain all your brain power. In a previous article I discussed how to keep problems from emerging in your UI automation framework. Today I will go over some tips on making debugging easier for your automation.

The Detective, the investigator and the ‘crime scene(s)’

Visiting a crime scene and trying to figure out what happened is what a detective does. Automation debugging also entails studying the ‘crime scene’ to figure out what went wrong. All investigative work follows the same steps: gather data, plot the variables and evidence, and deduce what might have happened. Of course, our automation scripts do not try to cover their tracks, which makes our work far easier!

Picking up minute details that others miss, and an in-depth understanding of how the world works that helps with deductive analysis: these are perhaps the two most portrayed characteristics of the famous fictional detective Sherlock Holmes. Investigating automation failures might not be as difficult, but it needs the same basics as a crime scene investigation.

Debugging an issue requires these steps:

1) Gather data on what was going on in the program when the error occurred

2) Deduce what might have caused it

3) Replay the scene of crime, or as we call it ‘reproduce the problem’

4) Fix the issue

5) Test the fix

Sherlock might have to battle his way through a lot of bad guys. We are fortunate not to have such occupational hazards, so I will skip any reference to gun battles here.

Gather data – the automation ‘black box’

Simply put, the ‘black box’ is the recording equipment on each aircraft, comprising the flight data recorder and the cockpit voice recorder. It is the most important piece of equipment in an air crash investigation because it holds the data. Sherlock, too, manages to pick up at least one minute detail at every crime scene that others have not thought of. My point: data is the most important aspect of an investigation.

Report logs are generally used just to record whether a script passed or failed, or at most which step failed. Turn your automation logs or test report into a black box. Record all debug data in there, the way a safety-critical device keeps logging all vital data at intervals and on events.

I stress adding a lot of debug data to the logs: control flow details, specific data around important events such as page loads, important data being written or read, and so on. The don’ts include taking too many images, cluttering the log for other users, and recording data that is not important.
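As a rough illustration of the ‘black box’ idea, here is a minimal sketch using Python’s standard logging module. The helper names (make_black_box, log_event), the logger name and the sample events are my own assumptions for illustration, not part of any particular framework.

```python
import io
import logging


def make_black_box(stream, name="automation.blackbox"):
    """Return a logger that writes timestamped DEBUG records to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep debug noise out of the root logger
    return logger


def log_event(logger, step, event, **data):
    """Record one control-flow event together with the data around it."""
    details = " ".join(f"{key}={value!r}" for key, value in sorted(data.items()))
    logger.debug("step=%s event=%s %s", step, event, details)


# Usage: an in-memory stream stands in for the real report file here.
buffer = io.StringIO()
black_box = make_black_box(buffer)
log_event(black_box, 3, "page_loaded", url="/login", load_ms=840)
log_event(black_box, 4, "data_written", field="username", value="test_user")
print(buffer.getvalue())
```

Keeping these records at DEBUG level means the regular report can stay uncluttered for other users, while the full detail is still available when a failure needs investigating.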

Deduction – Fitting in the variables

This part comes from a deep understanding of the system and from exposure to the different types of issues you have fixed. It can take experience to become quick at debugging problems, but you can speed up the process. This is certainly the area where our detective and our automation engineer spend most of their time, putting the pieces together.

Knowing the control flow of the system is key. With that knowledge you can trace the main events and make sure they occur one after another in the expected order. Armed with your ‘black box’ recording of the data around the main events, and knowledge (or detailed debug info) of what the control flow should be, isolating the problem becomes a piece of cake.
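To make the idea of tracing main events concrete, here is a small sketch that compares the events recorded in the log against the expected control flow and reports the first place they diverge. The function name and the event names are hypothetical; a real script would parse the events out of its own log.

```python
def first_deviation(expected, recorded):
    """Return (index, expected_event, recorded_event) at the first mismatch.

    Returns None when every expected event appears in order. A recorded
    event of None means the log ended before that point was reached.
    """
    for index, expected_event in enumerate(expected):
        recorded_event = recorded[index] if index < len(recorded) else None
        if recorded_event != expected_event:
            return index, expected_event, recorded_event
    return None


# Hypothetical flows: the dashboard never finished loading in the failed run.
expected_flow = ["open_login", "submit_credentials", "dashboard_loaded", "open_report"]
recorded_flow = ["open_login", "submit_credentials", "open_report"]

print(first_deviation(expected_flow, recorded_flow))
# (2, 'dashboard_loaded', 'open_report')
```

The first mismatch does not prove where the bug is, but it gives the educated starting point that a randomly placed breakpoint never will.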

In most cases folks don’t have a good recording mechanism (or don’t want to read the data that is available), so they don’t start with data and skip this step altogether. Instead they move to the next step, re-running the script in debug mode. They take a wild guess and place a breakpoint somewhere, then go back to multi-tasking on something else while the debug run is in progress. At the breakpoint they look at a few variables, and if these do not fit the hypothesis, they set another breakpoint and re-run. This is the WORST POSSIBLE way to debug, IMHO.

I read in Brian Tracy’s book –

Some people think, some people think they think and majority would rather die than think.

Let’s not wish death on our project by being allergic to using the brain muscle. The more you exercise it, the stronger it gets.

Replaying the crime scene – use stubs

Fortunately, we can re-create our crime scene in real time instead of relying on imagination. This is the step we call ‘reproduce the issue’. Automation scripts can have long run times, and with the ‘winging it’ debug style of randomly chosen breakpoints, each attempt takes a while.

First, you should have an educated guess from the data; hopefully the majority of issues will not require more than two debug tries. If it is taking more than that, don’t run the whole thing. Use stubs to bypass functionality you are not interested in testing, and get straight to, or near, the problem area.

This is going to be very subjective. Generally, the two common things I do are quickly re-creating the data I need and skipping execution steps that have no effect. In rare cases I might add a few new lines of test code to bypass a larger execution. For instance, say a script with 15 test steps has a suspected problem on step 12. Try to start your debug run from step 10 or 11. Create the data you need before you run, and try to prevent it from being used up or deleted. Alternatively, create copies of that prerequisite data and use a new copy on every debug try.
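Here is a minimal sketch of both tricks: starting the run at a later step with a fresh copy of the prerequisite data, and stubbing out a slow step with unittest.mock. The Steps class, run_script and the data layout are hypothetical stand-ins for a real script, shortened to three steps.

```python
import copy
from unittest import mock


class Steps:
    """Hypothetical stand-in for a script's step library."""

    def login(self, state):
        state["session"] = "real-session"  # imagine a slow UI login here

    def create_order(self, state):
        state["order"] = {"id": 1, "items": []}  # imagine slow data setup

    def add_item(self, state):
        state["order"]["items"].append("widget")  # the step under suspicion


def run_script(steps, state, start_at=0):
    """Run the steps in order, optionally skipping the first `start_at`."""
    sequence = [steps.login, steps.create_order, steps.add_item]
    for step in sequence[start_at:]:
        step(state)
    return state


# Trick 1: prepare the prerequisite data once, copy it for every debug try.
PREREQ_DATA = {"session": "stub-session", "order": {"id": 99, "items": []}}


def debug_try():
    state = copy.deepcopy(PREREQ_DATA)  # earlier tries cannot consume it
    return run_script(Steps(), state, start_at=2)


first = debug_try()
second = debug_try()

# Trick 2: replace the slow login with a stub, run everything else for real.
steps = Steps()
fake_login = lambda state: state.update(session="stub-session")
with mock.patch.object(steps, "login", side_effect=fake_login):
    stubbed = run_script(steps, {})

print(first["order"]["items"], second["order"]["items"], stubbed["session"])
```

The deep copy matters: a shallow copy would share the nested order between tries, so the second debug run would start with data already changed by the first.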

Fixing – There is nothing called ‘isolated change’

I dislike the term ‘isolated change’. I’ve seen it used when a nasty patch is being pushed through the release process at the last minute, and guess what: 90% of the time the isolated change comes back to haunt everyone.

The reason is that nothing is isolated, especially in today’s software. Each line of code has an impact somewhere else; even if it affects nothing else, it adds to the execution time of the automation. Try to find out which areas your change will affect and what impact it will have on them. This should be very easy if you have what I call a ‘layered architecture’, which makes the control flow simple and predictable. You can also search through your project for calls to the method being changed.

With automation, execution time is something you must watch as well. Recently my team was discussing adding a check, taking approximately 200 ms per call, to a common routine. After doing the calculation, it turned out that this single change would add another THREE HOURS to the batch run time!
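The arithmetic is worth doing before any change to a common routine. The sketch below works backwards from the two figures above; the number of calls per batch is not something I have stated, so the value here is only what those figures imply, and the function names are my own.

```python
def added_batch_hours(delay_ms_per_call, calls_per_batch):
    """Extra batch run time, in hours, from a per-call delay in milliseconds."""
    return delay_ms_per_call * calls_per_batch / 1000 / 3600


def implied_calls(delay_ms_per_call, added_hours):
    """Number of calls per batch that would produce the given extra hours."""
    return added_hours * 3600 * 1000 / delay_ms_per_call


# 3 hours = 10,800 s, and 10,800 s / 0.2 s per call = 54,000 calls.
print(implied_calls(200, 3))           # 54000.0
print(added_batch_hours(200, 54_000))  # 3.0
```

In other words, a delay that looks negligible in a single run becomes hours once the routine is called tens of thousands of times in a batch.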

Lastly, and most importantly, the solution should be scalable. Changes are mostly made with only the current problem in mind. We all know of a problem that dev or automation has fixed multiple times, and still the fix does not sit well with everyone. Writing code that stands the test of time is a real skill to learn. Also make sure the change is maintainable, reusable and robust (the pillars of framework design).

Testing – Do it in a controlled environment

Once the problem has been identified and the fix made, it’s time to test whether it works as expected. For complex changes, if you have done your homework you know which areas the change will affect and can test them; that’s a no-brainer.

The mistake people sometimes make is testing the change in the main project workspace. For complex changes I always check out a separate copy of the project and do the fixing and testing there. Once tested, I copy the change over to my main working copy and then push it.

The reason for isolation here is to avoid any potential mishaps caused by unintended changes and mixing up multiple fixes at the same time. Also, it’s easier to trace back the changes and isolate variables that have changed.

Lastly, it’s a great idea to run the fixed script in a batch run. Some fixes (especially those involving delays) can work fine in single runs and expose problems in batch runs, which makes this step crucial.