How to intelligently manage your technical debt with behavioral code analysis
What is technical debt? The best explanation of this term, coined by Ward Cunningham, was provided by Martin Fowler. We are unable to evolve IT systems perfectly: every added or modified piece of functionality introduces small deficiencies in software quality. These deficiencies accumulate over time and make it harder to sustain the pace at which we deliver our systems to production.
Consider a system with technical debt. Without any debt, it would take us 4 days to implement a given piece of functionality; because of the debt, it takes 6 days. The additional 2 days are the interest we have to pay. If fixing the code and removing the technical debt takes 3 days, the whole change costs 7 days and, for a single change, it does not pay off. But if we know that this is not the only change we will have to make in the given software component, spending the extra 3 days quickly pays off. Also, be warned that technical debt tends to accumulate: future changes to an indebted component only increase the debt, and the time required to develop new features rises from 6 to 10 or 15 days.
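A minimal sketch of this break-even arithmetic, using the illustrative numbers from the example above (the variable names are mine):

```python
# Break-even for repaying the debt, using the numbers from the example above.
clean_effort = 4        # days per change once the debt is removed
indebted_effort = 6     # days per change in the current, indebted code
repayment_cost = 3      # one-off days needed to fix the code

interest = indebted_effort - clean_effort   # 2 days of interest per change
print(f"Interest paid on every change: {interest} day(s)")

for n_changes in range(1, 4):
    keep_debt = indebted_effort * n_changes
    repay_first = repayment_cost + clean_effort * n_changes
    verdict = "pays off" if repay_first < keep_debt else "does not pay off"
    print(f"{n_changes} change(s): keep debt {keep_debt}d, "
          f"repay first {repay_first}d -> repaying {verdict}")
```

For a single change, repaying costs 7 days versus 6; from the second comparable change onward, repaying is already cheaper.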
What are the reasons for technical debt?
Let's consider Lehman's laws of software evolution:
- Law of continuous change: “A system must be continually adapted or it becomes progressively less satisfactory”
- Law of increasing complexity: "As a system evolves, its complexity increases unless work is done to maintain or reduce it."
On the one hand, we have to add new functions and modify existing ones for the system to stay relevant and useful. On the other, our activities increase complexity, and unless we take action, this complexity will eventually make system evolution impossible due to missed deadlines and excessive development costs. As we all know, software development is a learning experience: at the start of a project we have very little knowledge about the actual problem we are trying to solve and about the business domain we work in. We learn along the way, and the assumptions and architectural solutions we envisioned at the beginning quickly become outdated. Business requirements change and deadlines loom, so we often sacrifice quality to push code into production as fast as we can. Technology choices are often biased and less than optimal. All of these and many other factors produce technical debt.
Managing technical debt
So how can we deal with technical debt? In my opinion, there are three key elements to taming the beast: system architecture, team, and process organization. As we know from Conway's law (or rather from its consequences), all of them must work together; otherwise, we are doomed to failure.
Architecture at every level, from the division of a system into components and modules down to the internal structure of packages and classes, should separate things that have different reasons to change or that change at different rates. As we will see in the course of this post, this is crucial. Mixing different responsibilities in the same component increases the probability of defects and further reinforces the downward trend in quality.
Architecture must also be compatible with the way teams are organized, to minimize the process loss caused by additional communication and coordination effort and to eliminate costly code merges.
We are unable to create an "ideal" architecture and organization that will stay adequate for the whole lifecycle of a project. This means that we must constantly monitor the state of our architecture, code quality, communication patterns, and team organization, so that we can react properly and plan improvements in the areas where they will bring the biggest gain.
Many managers ignore the need for such activity. They tend to blame quality deficiencies on developers alone and to question their skills and commitment. That is unfair: the accumulation of technical debt is a natural process, and all project stakeholders share responsibility for it.
Proper monitoring and evolution of the system architecture and team organization are especially important in projects using agile methodologies. In such projects, we strive to get our code to production as quickly as possible, and teams often forget that this should be done without sacrificing code quality. Once we go to production, our code becomes subject to maintenance, and as recent research shows, 40 to 80% of an IT project's cost is spent on maintenance. Not acting on increased system complexity has consequences for all project stakeholders. For management, it means missed deadlines, increased lead times, and an unpredictable development process. For users, it means lots of bugs that make their daily work harder and more frustrating, decreasing business efficiency. For the whole organization, it means additional cost and a loss of competitive advantage.
There are many tools and techniques that most organizations are currently using to fight technical debt. Let’s enumerate the most popular ones and their deficiencies:
- Tests: unit tests, integration tests, e2e tests all help us find bugs before we go to production. They are also very important because, without them, large-scale system development is nearly impossible. The ability to execute automated tests allows us to change the structure of our systems to make development and maintenance easier and reduce technical debt; without tests, you are doomed to the high cost of manual regression testing of the whole system. But automated tests are code, which means they can accumulate technical debt too. Poorly designed and maintained tests will quickly become more of a burden than a help.
- Static code analysis: it can discover many violations of best practices and patterns in our codebases. However, it does not take any context into account, so it cannot tell us how severe a given issue is in terms of cost for further system development. It can tell us how much technical debt we have, but not which part of it poses an actual risk.
- Complexity metrics: many of them are programming-language dependent and costly to calculate. They also ignore context and social information, which, as we will show later, are crucial to understanding whether a given piece of complex code is actually a problem for us.
- Code reviews: a costly manual process that involves our senior engineers, but a very important one, as it promotes knowledge sharing and early problem detection. In practice, however, reviews tend to focus on the change introduced by a developer, not on the state of the code after the change.
Behavioral code analysis
All of these tools are useful, but we need more information to manage technical debt properly. In our R&D team, we found a very interesting new approach developed by Adam Tornhill: behavioral code analysis. He described his concept in two books, "Your Code as a Crime Scene" and "Software Design X-Rays". His approach broadens static analysis with two new dimensions: time and people. Just as a doctor cannot successfully diagnose a patient based only on the current state and symptoms, but needs history and activity information (medical history, genetic risk factors, physical activity, nutrition habits, etc.), to diagnose our system we need information about how the code evolves over time and how developers interact with it. Luckily for us, we have been gathering all the required information for years in our source control systems, such as git or svn.
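To make the later analyses concrete, here is a minimal sketch of how that raw behavioral data can be pulled out of git. Everything that follows only needs (commit, author, date, file) records; the `read_changes` helper and its field names are my own illustration, not part of Adam's tooling, and the later sketches in this post reuse it.

```python
# A minimal sketch: turn the git log into (commit, author, date, file) records.
import subprocess
from dataclasses import dataclass

@dataclass
class Change:
    commit: str
    author: str
    date: str
    path: str

def read_changes(repo_dir: str, since: str = "12 months") -> list[Change]:
    """Parse `git log` into one record per (commit, file) pair."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only",
         "--pretty=format:--%h;%an;%ad", "--date=short"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    changes, header = [], None
    for line in log.splitlines():
        if line.startswith("--"):
            header = line[2:].split(";", 2)    # commit hash, author, date
        elif line.strip() and header:
            changes.append(Change(*header, path=line.strip()))
    return changes

# changes = read_changes(".")   # run inside a git repository
```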
Let’s review what insights we can get from our repositories using behavioral code analysis and how it helps us manage technical debt.
Hotspot analysis
The first step we need to take to properly deal with technical debt is finding places in code, where improvements will give us the greatest payoff.
In large projects with hundreds of thousands of lines of code, written in several programming languages, this is not an easy task with traditional tools. Hotspot analysis is here to help. A hotspot is an element of our codebase (a file) that is both complex and frequently changed in the analyzed period. As a measure of complexity, we will use the simplest possible one: the number of lines of code. As a measure of change frequency, we will use the number of commits changing the given file. One possible visualization draws each element as a circle: the larger the circle, the more complex the element; the redder the circle, the higher its change frequency.
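Independently of how it is visualized, the underlying data is cheap to compute. Here is a minimal sketch reusing the `read_changes` helper from above; ranking by the product of commits and lines is my own simple scoring choice, not CodeScene's implementation.

```python
# A minimal hotspot sketch: change frequency from the git log, complexity as the
# current number of lines per file, ranked by the product of the two.
from collections import Counter
from pathlib import Path

def hotspots(repo_dir: str, top: int = 10):
    revisions = Counter(c.path for c in read_changes(repo_dir))
    scored = []
    for path, n_commits in revisions.items():
        f = Path(repo_dir) / path
        if not f.is_file():                  # skip files deleted since the commit
            continue
        with f.open(errors="ignore") as fh:
            loc = sum(1 for _ in fh)
        scored.append((path, n_commits, loc, n_commits * loc))
    return sorted(scored, key=lambda s: s[3], reverse=True)[:top]

# for path, commits, loc, _score in hotspots("."):
#     print(f"{path}: {commits} commits, {loc} lines")
```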
Can such a simple metric work? Research based on hundreds of projects of different sizes, developed in different technologies and with different development processes, shows the same distribution of change frequencies in every case.
Only a small fraction of the code is changed frequently; most of the code is stable and rarely touched. Statistics show that stable, unmodified code is much less error-prone than recently changed code. For example, the probability of a defect in code that has been stable for a year is about a third lower than in recently changed code. This means that stable but complex code is not our concern (unless we know that planned work will require changes to it). Other research confirms the correlation between change frequency and declining quality. Statistically, the combination of the number of lines of code and change frequency proved to be a better predictor of defect rates than any combination of more complex metrics.
I can only confirm that we observed the same distribution in projects we develop and maintain here at Altkom Software & Consulting.
Hotspot analysis helps us eliminate more than 90% of the code from further investigation. But we need to dig deeper, as being a hotspot does not necessarily mean a problem. A complex component with a high change frequency is most likely a problem, but the changes might also be the result of improvements that have already started.
As already said, in most cases a hotspot means we have a problem, or at least something that needs more investigation. A big, frequently changed file is probably accumulating many responsibilities and should be split into smaller, more cohesive components.
To decide whether a given hotspot is a problem or not, we need to dive into its details. The first heuristic is very simple: check the name of the class marked as a hotspot. Nondescriptive or overly general names, and names with generic suffixes, are a clue that we should analyze the case carefully. Names like State, Helper, Service, or Manager are usually a sign that a given class has more than one responsibility and should probably be divided into smaller, more cohesive parts. Naming things in code is crucial to readability and ease of maintenance. Code is read much more often than it is modified, and the working memory of our brains is limited, so it is much easier for humans to navigate code whose names are descriptive and capture the purpose of each class or function. Research has shown that we humans start building a mental model of code from the names of classes and functions; nondescriptive or too general names are an obstacle to understanding the code.
The next step is checking the complexity trend of our hotspot. This time we calculate complexity in a slightly more elaborate way: we count indentation in the code. This lets us easily detect code with a complex structure, complex logical expressions, and multi-branch control structures. Such constructs are usually the source of many errors; complex logical expressions alone are responsible for roughly 20% of all errors. Looking at the complexity trend, we can check whether developers have already started taking care of the technical debt and complexity has begun to decrease. If complexity keeps increasing, especially if it grows faster than the total number of lines, we have a situation we need to deal with.
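A minimal sketch of indentation-based complexity in the spirit of this technique; the tab width of 4 and the returned total/mean pair are my assumptions.

```python
# Count leading indentation per non-blank line as a cheap, language-agnostic
# proxy for structural complexity.
def indentation_complexity(source: str, tab_size: int = 4) -> tuple[int, float]:
    """Return the total and mean indentation depth of a piece of source code."""
    depths = []
    for line in source.splitlines():
        if not line.strip():                    # blank lines carry no complexity
            continue
        expanded = line.expandtabs(tab_size)
        leading = len(expanded) - len(expanded.lstrip(" "))
        depths.append(leading // tab_size)      # one unit per indentation level
    total = sum(depths)
    return total, (total / len(depths) if depths else 0.0)

# To get a complexity *trend*, run this over historical revisions of a hotspot
# (e.g. the output of `git show <rev>:<path>`) and plot the totals over time.
```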
We need to look inside the given class and find the areas we have to take care of. We can apply hotspot analysis at the function level; this tells us which functions are the most complex and which are changed most frequently.
We take the same approach as with file-level hotspots. If we have a complex function that has not been changed for months, we do not need to perform any actions. But finding a complex function that is often modified is a sign that it needs refactoring. Maybe the function is too big and can be modularized. Maybe it contains some complex expressions that can be refactored into separate functions. Hotspot analysis can also show us which functions tend to change together. That might be a sign of copy and paste or a missing abstraction.
After inspecting the suspected functions, we can plan our improvements.
Hotspot analysis helps us make informed decisions based on data and scientific research, and manage technical debt intelligently.
As we can see, hotspot analysis scales: we can apply it at the file/class level and go deeper to analyze the functions inside a given file or class, but we can also apply it at the architectural level and search for hotspots among the components of our system architecture. This is extremely useful if you are building microservice-based systems or any kind of modular system. With hotspot analysis, we can find modules (services) that are too big and change frequently. Again, in most cases the reason is that the code in a given module takes on too many responsibilities.
We should analyze such modules and try to divide them into smaller, more cohesive elements, just as at the class level. As we mentioned at the beginning of this post, it is extremely important for system health to separate code that changes for different reasons and at different rates. As the change frequency distribution shows, separating stable code from volatile code shrinks the area where the accumulation of technical debt is most dangerous.
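Lifting the file-level sketch to the architectural level is mostly a matter of mapping paths to modules. In the sketch below, reusing `read_changes`, I assume that the top-level directory stands for a module or service; in a real system you would encode your own architectural boundaries.

```python
# A minimal sketch of module-level change frequency.
from collections import Counter

def module_of(path: str) -> str:
    # Assumption: the top-level directory represents an architectural module/service.
    return path.split("/", 1)[0]

def module_revisions(repo_dir: str):
    return Counter(module_of(c.path) for c in read_changes(repo_dir)).most_common()

# Combine these counts with module size (total lines per directory), exactly as
# on the file level, to spot modules or services that are both large and volatile.
```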
Temporal coupling
Another important concept we can analyze with our new tools is temporal coupling (also known as change coupling). Temporal coupling means that two components X and Y tend to change together, and we can measure how frequently that happens. With temporal coupling data, we can look for unexpected dependencies between system elements and for dependencies that grow too strong.
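A minimal sketch of the calculation, again reusing `read_changes`. The degree formula (shared commits divided by the average number of revisions of the two files) and the noise thresholds are assumptions chosen for illustration, not the exact formula used by any particular tool.

```python
# A minimal temporal-coupling sketch: how often do two files appear in the same commit?
from collections import Counter, defaultdict
from itertools import combinations

def temporal_coupling(repo_dir: str, min_shared: int = 5, max_commit_size: int = 30):
    files_per_commit = defaultdict(set)
    revisions = Counter()
    for c in read_changes(repo_dir):
        files_per_commit[c.commit].add(c.path)
        revisions[c.path] += 1

    shared = Counter()
    for files in files_per_commit.values():
        if len(files) > max_commit_size:     # skip mass changes (renames, reformatting)
            continue
        for a, b in combinations(sorted(files), 2):
            shared[(a, b)] += 1

    coupled = []
    for (a, b), n in shared.items():
        if n < min_shared:
            continue
        degree = n / ((revisions[a] + revisions[b]) / 2)   # share of the average revisions
        coupled.append((a, b, n, round(degree * 100)))
    return sorted(coupled, key=lambda c: c[3], reverse=True)

# for a, b, n, pct in temporal_coupling("."):
#     print(f"{a} <-> {b}: {n} shared commits ({pct}% coupling)")
```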
In some cases temporal coupling is natural and there is nothing wrong with it. For example, when we add a new function to a component, we usually add a test to its test module; this is a desirable form of temporal coupling. But if we observe that many changes to a component's internal implementation details also force changes in its test code, the tests probably depend on those implementation details, which is a very bad sign.
If two components that should be independent of each other show signs of high temporal coupling, we need to investigate: find which classes and functions tend to change together. There may be a missing abstraction, or a whole new, separate component may need to be extracted. Maybe it is high time to revise our architectural assumptions and draw new boundaries between components to better support further system evolution.
Temporal coupling is also strongly related to defect probability: a high level of temporal coupling usually results in a high defect rate. Research shows that temporal coupling predicts defects better than any of the complexity metrics.
Knowledge of temporal coupling helps us plan refactorings, and it also helps us plan tests. Knowing that changes to functionality X cause changes in code related to Y, we can plan regression tests of both modules whenever either of them changes.
The best place to start temporal coupling analysis is the modules with the highest change frequency.
Like hotspot analysis, temporal coupling analysis can move from the module level to the class level, and then down to the function level.
Temporal coupling extends our analysis from individual components to the relations between them and helps us find code that, if left unattended, can lead to unexpected and hard-to-diagnose bugs. We all know the situation where someone adds functionality to module A, module A is tested and works perfectly, and then we go to production only to find out that modules B and C have unexpectedly stopped working.
Hotspots, Temporal Coupling, Code Ownership, and Communication Patterns
We all remember Brooks's law from "The Mythical Man-Month": "adding manpower to a late software project makes it later". This happens because the additional cost of communication and coordination is higher than the value of the work created by the new team members. Another factor is that system architecture limits the number of developers who can effectively work on the project. Too many developers, or developers not properly assigned to architectural components, will result in concurrent work on the same piece of code by different developers and teams. There is a correlation between the number of authors concurrently modifying the same component and the number of defects in that component. Research on the Linux source code shows that modules with more than 9 authors have 16 times more defects than the rest of the codebase.
We can use behavioral code analysis to detect such issues. We have already seen hotspot analysis; in projects with suboptimal team organization, there is a strong overlap between hotspots and modules modified by many developers. Using data from our source control, we can check how many developers modified a given module in the analyzed period. If many developers modify hotspot code, chances are it has too many responsibilities, which, combined with frequent modifications by different people, will likely result in lower-quality code, inconsistencies, defects, and time lost on costly merges. Once again we see the importance of the single responsibility principle.
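A minimal sketch of this author-count view, reusing `read_changes`: count the distinct commit authors per file in the analyzed period and cross-check the busiest files against the hotspot list.

```python
# Count how many different developers touched each file.
from collections import defaultdict

def authors_per_file(repo_dir: str):
    authors = defaultdict(set)
    for c in read_changes(repo_dir):
        authors[c.path].add(c.author)
    return sorted(((path, len(names)) for path, names in authors.items()),
                  key=lambda item: item[1], reverse=True)

# for path, n_authors in authors_per_file(".")[:10]:
#     print(f"{path}: modified by {n_authors} developers")
```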
In his books, Adam Tornhill presents three typical patterns of code ownership and describes their consequences.
In the first case, all or most of the code in a given component was developed by one person. Here we do not have any additional costs related to communication and coordination. The component’s code is consistent and its quality is mostly related to the author’s technical skills.
In the second case, we have many developers working on a component, but one of them has done most of the work. Here, the percentage of code created by the main contributor is a good quality predictor: research shows that the higher the percentage of code contributed by the main author, the fewer the bugs. There is an even stronger relationship between bugs and the number of other contributors: the number of bugs grows with the number of additional contributors.
The last case is high fragmentation: many authors, each contributing a small amount of changed code. Such code requires careful analysis and testing, as it is likely to be a source of many bugs.
We can also combine information about authors with temporal coupling. First, we need to find the main author of each module. We can do this by finding the developer who added the most code to the module, but often a better heuristic is to look at the number of deleted lines; this way we will probably find the developer who takes care of a given component, cleaning it up and reducing complexity through refactoring. We can then combine the list of main authors per module or component with the results of the temporal coupling analysis. If temporally coupled modules have the same main author, or their authors work closely together (for example, they are on the same small team), there is no problem. If not, we have probably found a costly communication and coordination pattern, where developers from different teams have to work on tightly coupled code.
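A minimal sketch of these ownership heuristics. It parses `git log --numstat` directly to get added and deleted lines per author and file; the function name and the `by` flag are my own, and the output can be joined with the temporal coupling results above.

```python
# Find the "main author" per file, either by lines added or by lines deleted.
import subprocess
from collections import defaultdict

def main_authors(repo_dir: str, by: str = "added"):
    log = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:--%an"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    contributions = defaultdict(lambda: defaultdict(int))   # path -> author -> lines
    author = None
    for line in log.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif "\t" in line and author:
            added, deleted, path = line.split("\t", 2)
            if added == "-":                                 # binary files report "-"
                continue
            contributions[path][author] += int(added) if by == "added" else int(deleted)
    return {path: max(authors, key=authors.get)              # largest contributor wins
            for path, authors in contributions.items() if sum(authors.values()) > 0}

# owners = main_authors(".", by="deleted")   # "deleted" favours whoever cleans the code up
```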
Summary
As stated at the beginning of this post, technical debt is not a problem of developers only; it affects the effectiveness of the whole organization. If we do not fight technical debt, the results are increased costs, an unpredictable delivery process, and systems with high defect rates that will not help you achieve your business goals.
Technical debt should be constantly monitored and dealt with on a daily basis. We should manage it based on actual data and experts’ opinions.
Technical debt lives in the code of our systems, but to take care of it we should look beyond the code itself. System architecture and team organization are crucial; we should monitor both and adjust them to the project's needs. Architecture and organization must evolve to provide optimal support for system evolution.
Traditional tools help us a lot and bring valuable information about technical issues, but they do not help us find places where investments will quickly pay off. They also won’t help us find organizational issues. We should add techniques from behavioral code analysis to our toolbelts to manage technical debt intelligently.
This post presents some of the behavioral code analysis techniques described by Adam Tornhill: hotspots, temporal coupling, and code ownership. There are more of them: we can use knowledge maps to monitor the knowledge of our system, code churn to analyze the way developers interact with the code, and many more. There are tools that automate the process. One of them is CodeScene, a commercial product created by Adam based on his books. We use it at Altkom Software and Consulting on a daily basis; it is integrated into our CI/CD pipelines and helps us create better systems in many ways. If you want to try these new techniques, I highly recommend starting with Adam's books and playing with Code Maat, a free open-source tool that accompanies them.