Simon Sinek said, “you can’t incentivize performance, you can only incentivize behavior.”
Elegantly put, I do believe this to be true.
You can reinforce behaviours through incentives. This may lead to positive outcomes — we hope that the right inputs lead to the right outputs, which hopefully lead to the right outcomes. But this is not always the case.
There’s a phenomenon referred to as “regression towards the mean” where extreme outliers tend to be followed by a more average result.
Amazing players follow a great game with a less impressive one.
A winning streak inevitably comes to an end.
The opposite is also true.
Struggling students tend to improve.
A bad week is followed by a better one and seemingly cursed sports teams suddenly start to win games again.
None of these events are new or interesting on their own.
However, the human mind likes causality. We have trouble accepting that sometimes things just happen, either for no reason or for irrational ones, and that luck and randomness play a part.
We therefore look to fill that void — explain the unexplainable.
Our minds like coherence. So much so that we even fabricate it at times.
We infer that the great basketball player who started missing must be stressed or jinxed.
A terrible student must have started to study more.
A die that rolls four sixes in a row must be rigged.
But from a probability point of view, all of this is possible without any intervention, even if improbable.
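To put a rough number on the dice example, here is a minimal sketch that assumes a fair six-sided die and independent rolls. The function names are made up for illustration.

```python
from fractions import Fraction


def prob_all_sixes(rolls: int) -> Fraction:
    """Chance that a fair die shows a six on every one of `rolls` independent rolls."""
    return Fraction(1, 6) ** rolls


def prob_someone_gets_streak(people: int, rolls: int = 4) -> float:
    """Chance that at least one of `people` independent rollers gets all sixes."""
    nobody = (1 - prob_all_sixes(rolls)) ** people
    return float(1 - nobody)


print(prob_all_sixes(4))               # 1/1296
print(prob_someone_gets_streak(1000))  # a little over one half
```

Any single streak is rare, but give enough people enough attempts and somebody will produce one without any rigging at all.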
The danger here is that, as leaders, this can lead us to punish high performers after a bad stint and promote lower performers when they show improvement.
But what if it was inevitable? What if the high performers would eventually have a mishap and the lower eventually an improvement, regardless of whether anything changed or not? What if it was a ‘regression toward the mean’?
Ramifications of ‘regression towards the mean’ for leaders
In the book ‘Thinking, Fast and Slow’, Daniel Kahneman shares his personal experience working with flight instructors in the Israeli Air Force. One instructor had observed that whenever he praised a good performance, the cadet’s next attempt declined, whereas when he reprimanded a poor performance, the next attempt improved.
He therefore concluded that criticism was superior to praise for training, when what he was really observing was ‘regression toward the mean’.
“I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.” This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.” — Daniel Kahneman, ‘Thinking Fast and Slow’
Another example is John, in ‘Seeking Wisdom’ by Peter Bevelin. John was dissatisfied with the performance of new employees and placed them in a ‘skill-enhancing program’. Upon completion, John measured their performance again and concluded that the program was a success.
“Their scores are now higher than they were on the first test. John’s conclusion: “The skill-enhancing program caused the improvement in skill.” This isn’t necessarily true. Their higher scores could be the result of regression to the mean. Since these individuals were measured as being on the low end of the scale of skill, they would have shown an improvement even if they hadn’t taken the skill-enhancing program. And there could be many reasons for their earlier performance — stress, fatigue, sickness, distraction, etc. Their true ability perhaps hasn’t changed.” — Peter Bevelin, ‘Seeking Wisdom’
In both of these cases, these leaders viewed the data in isolation. They fell for the cause-and-effect trap.
They inferred that the outcome was the result of their actions: positive reinforcement = decline and negative reinforcement = improvement.
But if they took a step back and viewed the data as a trend it would have told a different story.
The top-performing pilots probably still performed above average regardless of the type of feedback they received. Perhaps, given the research behind it, positive reinforcement would even have improved their average performance over time.
This is the same trap that I see many new teams and product people fall into when it comes to discovery. Just because one customer had an extreme experience doesn’t mean that it’s true for all. You need to view the data as a whole and not in isolation.
Outcomes are biased, Behaviours less so.
Assessing your team’s performance, and whether someone is good at their role, gets really messy when you start to consider biases like attribution bias alongside ‘regression towards the mean’.
Attribution bias is when we put more weight on one’s personality or character to explain their behavior, rather than considering external factors.
In other words, we put more emphasis on the individual being the cause rather than the context, environment, and other factors many of which are often out of their control (such as randomness or luck).
An example of attribution bias is assuming that someone who is overweight is lazy before considering potential medical or health factors.
This becomes messy to navigate as a leader as our attribution bias also shifts depending on whether we are the actor (the one affected) or an observer (observing someone else).
From an actor’s point of view, we tend to exaggerate our own abilities over the environment when we have a positive outcome (“…hard work paying off”) and do the opposite when we have a negative one (“if it wasn’t raining that day…”).
Conversely, we do the opposite when we become an observer. When we observe others we tend to over-exaggerate the environment when we see them achieve a positive outcome — “they just got lucky!” — and over exaggerate their character/personality when they have a negative one — “if they worked harder they would have succeeded.”
This form of attribution bias is referred to as the ‘actor/observer difference’ and is one of the major reasons why we can be misled by ‘regression towards the mean’ when it comes to incentives and assessing performance.
Truth is performance will always fluctuate because it is one part dependent on your abilities but another part dependent on external factors that you cannot control.
Putting too much weight on one over the other can lead us to create poor incentive programs and ineffective performance-based programs, and to give misguided promotions.
“Our performance always varies around some average true performance. Extreme performance tends to get less extreme the next time. Why? Testing measurements can never be exact. All measurements are made up of one true part and one random error part. When the measurements are extreme, they are likely to be partly caused by chance. Chance is likely to contribute less on the second time we measure performance.
If we switch from one way of doing something to another merely because we are unsuccessful, it’s very likely that we do better the next time even if the new way of doing something is equal or worse.” — Peter Bevelin, ‘Seeking Wisdom’
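The “one true part and one random error part” idea can be sketched as a small simulation. This is an illustrative model, not anything from Bevelin or Kahneman: each hypothetical person has a fixed true ability, every measurement adds fresh random noise, and nobody receives praise, punishment, or training between the two measurements. All names and parameter values are assumptions.

```python
import random
from statistics import mean


def simulate_two_measurements(n_people=10_000, noise_sd=1.0, seed=0):
    """Measure everyone twice with no intervention and compare the extremes."""
    rng = random.Random(seed)
    # Assumed model: observed score = fixed true ability + fresh random noise.
    ability = [rng.gauss(0, 1) for _ in range(n_people)]
    first = [a + rng.gauss(0, noise_sd) for a in ability]
    second = [a + rng.gauss(0, noise_sd) for a in ability]

    # Pick the top and bottom 10% based on the FIRST measurement only.
    ranked = sorted(range(n_people), key=lambda i: first[i])
    k = n_people // 10
    bottom, top = ranked[:k], ranked[-k:]

    return {
        "top_first": mean(first[i] for i in top),
        "top_second": mean(second[i] for i in top),
        "bottom_first": mean(first[i] for i in bottom),
        "bottom_second": mean(second[i] for i in bottom),
    }


for group, score in simulate_two_measurements().items():
    print(f"{group}: {score:+.2f}")
```

In this model the top group scores lower the second time yet stays above the overall average, and the bottom group scores higher yet stays below it, even though nothing about anyone’s ability changed.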
Taking this one step further, outcomes in an organisational context are seldom an individual thing. They’re almost exclusively a collective effort.
Thus, measuring outcomes for performance reviews or incentive bonuses is misleading, as outcomes are heavily influenced by external factors and represent the collective, not the individual.
This is why team-based outcomes, like OKRs and other forms of group goals, work well. They tend to break down when organisations either impose them at an individual level or set competing individual KPIs.
Behaviours for Individuals, Outcomes for Teams
The best model I’ve seen work is using outcomes at the collective level but focusing on behaviours for individuals.
Since outcomes are partly out of our control it doesn’t make sense to hold individuals accountable to them.
For me, as a product person, I see this as a problematic aspect of our industry. One of the primary measures I see companies interview Product Managers (including myself) against is what outcomes they achieved.
But this is problematic, as we only have so much influence over outcomes at the end of the day. For more senior folk, yes, trends and average performance are a good indication. But looking at a single role in isolation and deciding that someone is not a good hire because they didn’t have many positive outcomes is falling for the same trap that the Israeli Air Force instructor and John did.
However, what is in your control is how you go about achieving those outcomes, aka your behaviours.
So rather than focusing on whether the pilots were performing good or bad maneuvers, the instructors should focus on what the pilots are doing: are they doing the right things, are they piloting the plane well? Over time this should lead to more above-average maneuvers.
And for those poor performances, you’d likely find aspects of what they are doing that can be improved. Focus on improving them and over time the outcomes will likely come through. But remember, not always! Rolling four sixes in a row is possible, if improbable.
For example, I once worked with a company that had great success without ever doing any customer discovery. They’re very much a feature factory: the cofounder set what they wanted to build and the team built it. But we know that is ill-advised, and they’re unlikely to sustain these superior outcomes long-term.
Similarly, I know Product Managers who have hailed from one of the pinnacle FAANG companies only to land in a different business, context, etc, and struggle.
So as leaders (and as individuals) focus on what you can control. Focus on your behaviours, the inputs you are putting into the puzzle. Are you doing the right things?
As a runner, I cannot expect to set a PR every time I go out for a run. Rather, I’ll have good days and not-so-good ones. My performance will fluctuate. If I got upset and reprimanded myself, or told myself that I should switch to cycling because “I suck” at running the first time I had a setback, then I’d never see any significant outcomes come to fruition.
Rather, I focus on my behaviours: am I eating well? Am I running regularly? Am I recovering enough? Stretching? Warming up? Cooling down? How is my technique? And when looking at the data and outcomes, I look for trends. I don’t look at the last time I ran a 10km, I look at the last 3 or 4: how were they? Are they trending up or down? How’s my pace over the last 10 runs? Am I improving? Declining?
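Looking for a trend instead of reacting to the latest run can be sketched like this. The pace figures are made up for illustration, and `pace_trend` is a hypothetical helper that compares the average of the most recent runs with the average of the runs just before them.

```python
from statistics import mean


def pace_trend(paces, window=3):
    """Mean pace of the latest `window` runs minus the mean of the `window` before.

    Paces are in minutes per km, so a negative result means getting faster.
    """
    if len(paces) < 2 * window:
        raise ValueError("need at least two full windows of runs")
    recent = mean(paces[-window:])
    previous = mean(paces[-2 * window:-window])
    return recent - previous


# Made-up paces for six runs, oldest first (minutes per km).
paces = [5.50, 5.45, 5.40, 5.30, 5.25, 5.45]

print(paces[-1] - paces[-2])  # positive: the latest run was slower than the one before
print(pace_trend(paces))      # negative: the recent block is faster overall
```

Judged on the last run alone, this runner got worse. Judged on the trend, they are improving, which is the view worth acting on.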
This is an important lesson for leaders, as I see too many tie outcomes to individual performance and, worse, try to incentivize them as if we had absolute control over them.
This may seem like it’s working, just as John thought his performance program was working and the Israeli instructor thought he had the formula for training. But correlation does not imply causation.
Rather, I’ve seen and had the most success focusing on behaviours at the individual level and reserving outcomes for teams.