I recently made the good old ‘one liner change’ in our automation framework having a 22-hour batch run with one permutation only. We discovered a week before regression my ‘small change’ had caused a straight 25% false failures. When my awesome team suggested to revert the change, I realized I did not recognize the demon at that time. This demon goes by many names, ‘one liner change’, ‘small change’, ‘localized change’ and so on (and will be referred as demon from here on in letter and spirit).
Cause of the trap
Have you ever sat with a friend driving a car and you feel a bit unsafe, like he / she does not have as much control over the car? And how is the driver feeling in that moment, pretty confident for the most part. I feel a similar case happens when one person on the team feels about the ‘small change’ in code. In the moment of making the change, it feels like we were just putting a bandage on a little bunny’s scratch. Till later on it turns out that was a lion we just missed fired a pump action at and now are running for our lives.
The demon effect
The demon is lethal mainly because it does not look like will create much of a problem. Often teams end up making such changes close to regression / release giving less time to react. Eventually the decision of releasing with bugs or delaying the release has to be taken which no one likes. I would equate this with “Putting good money after bad money” (adding more time to salvage the situation) hardly ever works except for gifting with more frustration (my published thesis on the subject here).
Small and ‘not so small’ change
While small changes or localized changes do exist, sometimes the demon is misunderstood for one. What defines a demon then? A formula I like to use is if the change is encapsulated within a small area of the application, a low-level module or one class affecting functionality of just that module or class. The mistake I have seen made is a functionality which is not encapsulated within a confined class / module but we feel it’s used only in one or few specific areas being confused with a localized change. Unless there is no way this change can be accessed or affect directly anything outside its scope, it’s not a localized change.
Quarantine the demon
The change must be done no doubt, but can be done in time of peace, not on the 11th hour a day before regression. Implement it in the early sprints of a release, so there is time to react and adjust before the deadline hits. Firstly, while implementing you hopefully will not be in a rush and can think through. Secondly there would be time to test issues around just that change and would be easier to identify the cause of bugs we see.
A general best practice is to make changes in increments rather than doing the whole big change at once. If you have automation running on UI or API level, that makes things easier. As you push the big change in increments, you get to know right away any outright effects it might have. And if you don’t have automation, I would say first think of doing that, otherwise you can focus manual testing as much possible towards testing the ripple.
Big decisions take time
Like in life big decisions take time not just to research on, but to think about and reflect upon. Similarly, I feel taking time for architectural changes to ‘sink in’ or internalize takes time. By doubling the resources working on it, it would not get done in half the time. For big changes, I usually start thinking about them ahead of time with my team, so when we do start doing it we have things in perspective and can make a better judgement.
Care to share what else would you would do to make an architectural change transition smoothly?