Measuring performance within engineering teams
Julian Colina is the CEO and Founder of Haystack and formerly Director of Engineering at IRIS.tv. Haystack promises to deliver insights from your GitHub data that help you experiment faster, ship reliably and prevent burnout.
To find out more, Glyn Roberts, CTO of Digital Solutions at iTechArt, sat down with Julian for a conversation about metrics, getting the team on board and identifying the most useful data sets for your business.
Glyn: Why do businesses want to measure the performance of engineering teams?
Julian: Engineering leaders are often flying blind. Even the simplest evaluation questions can sometimes be really hard to answer: Are we improving? What can we do to improve? What is it that we’re improving? If you’re a CTO or you’re managing a team, it can be frustrating not to have those answers. That’s where metrics come in.
At Haystack, we’ve identified four really good reasons to use metrics:
- To prompt action and create meaningful communication with the team
- To create a feedback loop for people on the team
- To express your engineering goals to other stakeholders in the business
- To advocate for the team and future investment
Glyn: Are there any metrics that are not appropriate ways of measuring the performance within an engineering team?
Julian: Yes. Sadly there is quite a lot of bad practice with metrics. There’s actually a bit of stigma around the use of metrics to measure performance and that’s because a lot of mistakes have been made historically.
Many of these bad practices came about from setting the wrong goals at the outset. Metrics have often been used as a way of comparing or ranking engineers, or as the basis for bonuses and promotion structures.
At Haystack, we try to stay away from ranking or comparing individual engineers. We focus on the team and we’ve purposefully made it hard to use our metrics to measure individual performance.
Another problem is the use of metrics that are flawed because they attempt to measure the output of an engineering team, but output can be hard to define in software engineering. One common example of this is measuring ‘lines of code’. There’s also ‘commit frequencies’, which is how often individuals or the team are committing. ‘Velocity points’ also belong in this category. They’re great as an estimation tool but when they’re used as performance metrics they don’t tell us much. I’ve also recently seen a metric called ‘impact scores’ becoming more popular. This is essentially lines of code with a few extra qualifications factored in, such as lines of code touched.
The problem with all of these output metrics is that they end up incentivising the team to increase output in whatever way possible: more lines of code, more commits, more story points. This can end up damaging the team as they focus on quantity over quality. Instead, we suggest that you always begin the measurement process with really clear goals and spend time working out what it is you’re trying to capture.
Glyn: So what metrics should we be looking at?
Julian: Focus on measuring process instead. Not only is that what the developers actually care about, it’s also the area where you can make the most actionable changes.
Before identifying the specific metrics you’re going to use, identify your goals. Ultimately, most of us want a productive team. We want to drive improvement and demonstrate achievements to our stakeholders in quantitative ways that are easy to understand. When I imagine a productive and successful team I don’t imagine a team who are competing to write more lines of code than each other. I imagine a team that is responsive to each other, who are collaborating effectively and who have efficient processes in place. That’s what we want to incentivise and capture by using metrics.
There are a lot of different process metrics that we could be measuring for software engineering, but there are only a few that we should be measuring.
At Haystack we’ve identified two categories of process metrics that we like to use. Firstly there are North Star metrics, which are the ones that guide the team in what they’re doing. North Star metrics can help you understand the development process from a really high level, which will allow you to identify areas to improve. This could include metrics like cycle times, which look at how long it takes you from first commit all the way through to merging code. That can then be broken down into how long we’re spending in the development cycle, the review cycle and on reworking code. North Star metrics can give you a really good high-level view of how the team is progressing through the whole software development life cycle.
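As an illustration of the cycle time breakdown Julian describes, here is a minimal Python sketch. It is not Haystack's implementation. The `PullRequest` record, its field names and the sample timestamps are all hypothetical, and it assumes you have already exported three timestamps per pull request. Rework time is not modelled separately here; it is folded into the review phase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    """Hypothetical record; field names are assumptions, not a real API schema."""
    first_commit_at: datetime
    review_requested_at: datetime
    merged_at: datetime


def cycle_time(pr: PullRequest) -> timedelta:
    # Full cycle: first commit through to merge.
    return pr.merged_at - pr.first_commit_at


def development_time(pr: PullRequest) -> timedelta:
    # Development phase: first commit until review is requested.
    return pr.review_requested_at - pr.first_commit_at


def review_time(pr: PullRequest) -> timedelta:
    # Review phase (including any rework): review requested until merge.
    return pr.merged_at - pr.review_requested_at


def median_hours(durations) -> float:
    # The median is less sensitive than the mean to one unusually slow PR.
    return median(d.total_seconds() / 3600 for d in durations)


# Made-up sample data for a whole team, not for any individual.
SAMPLE_PRS = [
    PullRequest(datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 15)),
    PullRequest(datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 16), datetime(2024, 1, 4, 10)),
    PullRequest(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 12), datetime(2024, 1, 5, 20)),
]

if __name__ == "__main__":
    print("cycle (h):", median_hours(cycle_time(p) for p in SAMPLE_PRS))
    print("development (h):", median_hours(development_time(p) for p in SAMPLE_PRS))
    print("review (h):", median_hours(review_time(p) for p in SAMPLE_PRS))
```

Reporting the median across the team's pull requests, rather than a figure per engineer, keeps the number in line with the team-level focus discussed above.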
The second category is Indicator Metrics. These help you monitor and assess whether something is going wrong, and they can give you really early warning signs that something needs to be changed or addressed. My personal favourite Indicator Metric is deployment frequency, which looks at how frequently we’re deploying the code we’re writing. This can give you a high-level view of how much work your team is completing from sprint to sprint. It can be a fantastic indicator for things like burnout and overwork because you can see spikes and dips. It’s also very easy to action changes based on the results.
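To make the idea of spotting spikes and dips in deployment frequency concrete, here is a rough Python sketch. It assumes you already have a list of deployment dates. The function names, the 50% tolerance and the sample dates are illustrative assumptions, not something Haystack prescribes.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO (year, week). Weeks with no deployments are absent."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


def flag_unusual_weeks(weekly_counts, tolerance=0.5):
    """Return weeks whose count differs from the mean by more than tolerance * mean.

    The tolerance is an arbitrary starting point; tune it to your own team.
    """
    if not weekly_counts:
        return []
    mean = sum(weekly_counts.values()) / len(weekly_counts)
    return [
        week
        for week, count in weekly_counts.items()
        if abs(count - mean) > tolerance * mean
    ]


# Made-up deployment dates: a busy first week followed by two quiet ones.
SAMPLE_DEPLOYS = [
    date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5),
    date(2024, 1, 9),
    date(2024, 1, 17),
]

if __name__ == "__main__":
    weekly = deployments_per_week(SAMPLE_DEPLOYS)
    print(weekly)
    print(flag_unusual_weeks(weekly))
```

A flagged week is a prompt for a conversation with the team, not a verdict. The sketch only tells you that something changed, not why.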
Glyn: Could you go one step further and tie your metrics to actual outcomes?
Julian: I think there’s a really subtle difference between product metrics and engineering metrics. Product metrics are largely geared towards product usage, whereas for engineering teams, we’re really trying to measure process. We’re asking questions like: do we have a healthy process internally? Are we reviewing code efficiently? Are there bottlenecks that are affecting the team daily?
With product metrics, it’s much more difficult to get to actionable solutions that can be used on a day-to-day or a sprint-to-sprint level. Using product metrics and process metrics together allows you to create beautiful stories around how you improved and how that affected the product metric.
Glyn: How do you collect the information you need for process metrics?
Julian: We’ve come up with two rules of thumb at Haystack:
- Keep your metrics simple
- Automate where you can
Focus on 3-4 North Star metrics and don’t create processes that mean you spend a lot of time pulling different data from different platforms.
Start where the data is most available to you. In most cases this will be GitHub. Jira also offers good data, but we’ve found it to be slightly less accurate because it requires employees to manually move tickets. Both GitHub and Jira have an API and Haystack can also help you to pull that data.
Haystack plugs directly into GitHub and we provide North Star metrics and some Indicator Metrics, so as soon as you plug Haystack in you can gather your cycle time and throughput and also get alerts when things start to go wrong.
Glyn: How do software metrics and people metrics relate to each other?
Julian: Software metrics aren’t really measuring people and we don’t believe they should be. People are complex and they shouldn’t be tied to any one metric. The number of Jira story points that one person achieves doesn’t tell us if they are a good or bad engineer. Metrics give only a very narrow view of how an individual engineer is doing.
Unlike sales or marketing, which have revenue and qualified leads, software engineering has very varied definitions of success. There are a lot of different ways that software engineers can be effective. They could create a system design that lasts for 10 years or simply share knowledge and experience that spreads to the rest of the team and enhances everyone’s effectiveness. To measure individual success there’s no better way than to take a more humanised appraisal approach.
Glyn: There are things that a business can do to support their developers. Transparency of expectations is important. Also, having standard operating procedures such as regular stand-ups and retrospectives. I also think it’s good practice to include engineers in meetings and conversations to ensure they have a clear understanding of what’s happening and what the goals are.
Julian: Software engineering is a team sport and the metrics should reflect that. We’re not looking at individual performance, we’re looking at the team as a whole and we’re measuring whether the changes we’re making are having a positive impact and making us more efficient. We need to be able to measure whether we are heading in the right direction and then prove it to our clients and the wider company.
Glyn: How difficult is it to define the decision criteria from metrics?
Julian: Historically it has felt difficult because there were a lot of bad options to choose from, but I think things are getting better.
There are three important steps towards defining the correct decision criteria for metrics. Ask yourself, does the data you are collecting enable you to:
- Identify any areas to improve
- Take action on that data
- Fuel improvement and remove bottlenecks
If you run the metric ‘lines of code’ through these criteria, you would see that it doesn’t meet them. It doesn’t tell you why things are going wrong or what to do about it. Metrics have to give actionable data or they aren’t going to be successful.
Glyn: How do you roll your new metrics out to the engineering teams without them having concerns about their individual performance being monitored?
Julian: Be upfront about your goals. Don’t just walk in with a new set of metrics without talking them through with the team. Talk about what you are measuring and also why you are measuring it and what you hope to achieve. Make sure they understand that these metrics are to help the whole team improve. It’s important that they understand that the data is for the team, not against the team.
When you use the data to present to managers and senior staff, it should be used to support the story of your team and the work you’re doing. It’s not a replacement for that story. Craft the story of how you’re working as a team and then use the data to demonstrate that.
When it comes to using the data to talk to your team about how things are going, always remember to come in with questions and not conclusions. There is a big difference between saying ‘our cycle time is bad this week’ and saying ‘does our cycle time look the way we want it to?’
Glyn: Would you expect engineering teams to set their own metrics?
Julian: I think it’s a process that the team should be involved with. We would never recommend forcing a metric on your team. Historically, that is how lines of code became one of the most hated metrics. We always suggest testing out metrics before committing to them and seeing how they work for your team.
Glyn: How do you communicate your data across the rest of the business on a regular basis and ensure that it has context and meaning?
Julian: At Haystack we have a weekly retrospective and we bring up some of the data from the previous week. We look at the outliers and the overall picture. Some weeks the data looks great and we can see that we are moving quickly and communicating effectively, and some weeks we seem to get a bit slower. We use the data to look for the outlier – what was it that enabled us to do well, or what slowed us down? From that we come up with a few action points. Our general rule is that we come up with a maximum of three action points that we can put in place this week or next and then we use the data to see if they were effective.
Glyn: So in summary, what do team leaders need to look out for?
Julian: Stop focussing on assessing individual performance and start thinking about metrics that can help your team keep improving and understand where the bottlenecks are arising.
As a first step, come up with a few North Star metrics that your team can really rally around. If your team is moving from sprint to sprint with no idea whether they’re improving or if their process is effective then you’re flying blind and your team could be getting demoralised.