Is NPS Helpful?
The Making of a Metric with Gibson Biddle
The Net Promoter Score (NPS) catches a lot of flak. But should it?
For a long time, I was among those who dismissed its utility. A recent data analysis project I took part in with Gibson Biddle, however, may have changed my mind.
This essay tells the story of how that came to be, and shares a few reflections on where NPS might fit into your analytics strategy.
NPS as Proxy Metric for Content Quality
In the world of business metrics, few subjects are as divisive as the NPS. Some people swear by it, some people really dislike it, and just about everyone (… or at least everyone who cares a lot about metrics) has an opinion.
Developed by Bain loyalty consultant Fred Reichheld and first introduced in 2003 (The One Number You Need to Grow, Harvard Business Review), the NPS boils down to one question:
On a scale of 0 to 10, how likely is it that you would recommend [Organization A / Product B / Service C] to a friend or colleague?
For those unfamiliar with the system and how NPS is calculated, see more details on its official site.
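In brief: the score is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (0 through 6), with passives (7 and 8) counting toward the total but toward neither group. As a minimal illustrative sketch of that arithmetic in Python:

```python
def net_promoter_score(ratings):
    """Compute NPS from a collection of 0-10 ratings.

    Promoters (9-10) count for, detractors (0-6) count against,
    and passives (7-8) count only toward the total.
    """
    total = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / total


# Five responses: 2 promoters, 1 detractor, 2 passives -> (2 - 1) / 5 = +20
print(net_promoter_score([10, 9, 8, 7, 4]))  # 20.0
```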
Since its introduction, the NPS has been widely adopted across the business world. Two-thirds of the Fortune 1,000 make use of it, and many have tied NPS performance to their bonus programs.
One of NPS’ supporters in the Product space is Gibson Biddle (who’s also on Medium @Gibson Biddle). While NPS was only a secondary metric during his time as VP Product at Netflix, it played a larger part in his tenure at Chegg. There, a high NPS — and the company’s ability to demonstrably raise it through product innovation — helped to secure early funding from investors. Today, Chegg is a public company with a market cap of $11B.
Now an advisor, teacher, and speaker, Gib closes each of his talks, workshops, and exec events with a request for audience members to provide feedback through an NPS survey. The resulting scores, he feels, serve as a proxy metric for the overall quality of his work. Critically, he augments the question on respondents’ likelihood to recommend with two follow-up prompts aimed at capturing qualitative insights:
- What did you like best about this [talk/event/essay]?
- What would make it better?
In recent months, Gib has turned his attention to the Ask Gib newsletter, in which he responds to readers’ questions on product strategy, product leadership, and building strong organizational cultures. True to form, every issue ends with a link to provide feedback.
One night, I got a message from Gib with an invitation to a spreadsheet and the question:
“What do you see?”
The document contained data from the first several months of Ask Gib’s operations. Naturally, this included the NPS for each essay, along with a host of other details. Some months prior, Gib and I analyzed qualitative reader feedback on a few select early essays, but it was great to see aggregate data as the newsletter’s reach continued to expand.
Thinking he might simply need confirmation that some customized file privacy settings were working correctly, I responded in two parts: 1) a report on what I saw (including screenshots!); and 2) a request to clarify whether his inquiry was, in fact, a bit less literal.
His response was affirmative:
“Which essays are ‘best?’”
Where NPS Falls Short
Around the time Gib was considering this question, he had just completed a Twitter survey asking whether his readers saw NPS as a useful tool when building a product or service.
In the end, only 31% of respondents expressed confidence in the utility of NPS.
My vote, at the time, was not among them.
As ubiquitous as NPS has become — or, perhaps, because of it — the system has no shortage of outspoken opponents. Within the Product/UX space, Jared Spool has published a thorough rebuke, arguing that NPS is not only misguided, but actively detrimental; Jeff Gothelf takes a moderately softer stance, but still characterizes NPS as a “waste of time”. Academia, too, questions the system’s validity, with several studies highlighting its shortcomings.
Among the criticisms of NPS are claims that the system is:
- Gameable: Whether by incentivizing participation or optimizing the survey’s timing in the customer journey, NPS can be made to present a false picture of true customer sentiment.
- Overly broad: By design, NPS asks a fairly generic question. As a result, responses are likely influenced by factors wholly outside of your business’s control.
- Inconsistent: With such a wide scale, one person’s “4” may express the same sentiment as another’s “6”.
- Opaque: Given that responses of 7 or 8 count toward neither promoters nor detractors, respondents are not given full transparency on the impact of their rating.
Most critical of all, though, is that NPS measures intent rather than behaviour.
Without a means to track tangible follow-up actions, there is no way to establish a meaningful correlation between someone’s stated willingness to recommend a product and their actually doing so.
Now, don’t get me wrong, I’ve dabbled in NPS in my earlier years. Maybe even tried to optimize for higher scores from time to time, if I’m being honest. But I’ve never been in a position where I’ve seen NPS have any significant correlation with growth, as Reichheld claims, and generally dismiss it in favour of something more targeted like the Customer Satisfaction Score.
The Making of a Metric
Identifying meaningful product metrics is challenging — doubly so for non-monetized products, where revenue can’t provide a simple baseline. Gib, for his part, believes in four sources of consumer insight, a view that matches my own experience. These are:
- Existing quantitative data, showing past and present behavioural trends
- Qualitative feedback — focus groups, usability testing, ethnographic studies — to learn how people feel about the work under consideration
- Surveys, including NPS
- A/B tests — and similar experimental methods — to test hypotheses informed by the above data
Before I began my analysis, Gib focused his core inquiry:
“If you were writing Substack essays, which source of data would you rely on?”
Unfortunately, Substack’s analytics capabilities are rather limited. A/B testing, too, is not supported. And while qualitative data exists by way of the extended NPS surveys, written sentiments rarely reflect the depth of insights gained from interviews — which have not yet happened for Ask Gib. Functionally, this left relatively few areas for investigation.
Gib also shared one of the goals underlying his interest in reviewing the data:
“trying to help folks to see how challenging it is to develop a proxy metric and explore the value of NPS.”
As an NPS-skeptic, however, those last few words gave me a great deal of concern. I was fully prepared to find nothing corroborating his claim, and telling one of your mentors that the metric he’s used for much of his career is, y’know, just not that great… well, I wasn’t looking forward to it.
Still, I was willing to give the numbers an unbiased review and hoped I wouldn’t have to be the bearer of bad news.
Analysis
Hypothesis and process
My early hypothesis was that the proxy metric for essay quality would involve the number of likes (“hearts”) or shares. Given Substack’s limited analytics, these are the only two trackable user actions that correspond with reader engagement. Beyond that, though, I was unsure of what the correlating data might be.
With the dataset’s small size, I opted for an exploratory approach to analysis over formalizing any processes in advance.
When analyzing any dataset, it’s essential to determine constraints and identify what’s relevant. With the average email open rate holding steady around 50% since the start of the newsletter, open rate offered little variance to work with and was quickly deprioritized.
This left six stats for each issue:
- Total number of subscribers as of its publication date
- New subscriptions after 1 day
- Hearts
- Shares
- NPS
- Number of NPS responses
From these stats, I created sets of relative values and normalized them based on the size of the email subscriber base at the time of publication. For example: per 1,000 email subscribers who had received the newsletter, how many had hearted, shared, or responded to the NPS survey?
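The analysis itself lived in a spreadsheet, but the normalization is straightforward to express in code. Below is a minimal pandas sketch, with hypothetical column names and made-up figures standing in for the real data:

```python
import pandas as pd

# Hypothetical column names and invented figures, standing in for the real sheet.
issues = pd.DataFrame({
    "essay": ["A", "B", "C", "D", "E"],
    "subscribers_at_publication": [650, 900, 1400, 2100, 2600],
    "hearts": [12, 31, 40, 55, 48],
    "shares": [4, 15, 23, 35, 26],
    "nps": [62, 87, 74, 88, 71],
    "nps_responses": [9, 22, 35, 41, 30],
})

# Normalize each engagement stat to a per-1,000-subscriber rate so that issues
# sent to audiences of different sizes can be compared directly.
for col in ["hearts", "shares", "nps_responses"]:
    issues[f"{col}_per_1k"] = issues[col] / issues["subscribers_at_publication"] * 1000

print(issues[["essay", "hearts_per_1k", "shares_per_1k", "nps_responses_per_1k"]])
```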
Next, I applied a colour scale to each column. By using colour to interpolate between a column’s minimum and maximum values, it becomes easier to spot patterns and irregularities between number sets. When using this technique, it’s helpful to choose two “neutral” colours (here, magenta and blue) to avoid subconscious bias. Like it or not, many of us are conditioned to associate green with a good result, red with bad, and yellow/orange as somewhere in between.
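In the spreadsheet this was ordinary conditional formatting; continuing the sketch above, a rough pandas equivalent might look like the following (background_gradient needs matplotlib installed, and the "cool" colormap runs from cyan to magenta rather than red to green):

```python
# Per-column colour scale, interpolating each column between its own min and max.
styled = (
    issues[["nps", "hearts_per_1k", "shares_per_1k", "nps_responses_per_1k"]]
    .style.background_gradient(cmap="cool", axis=0)
)

# Render in a notebook, or write out as standalone HTML.
styled.to_html("issues_colour_scaled.html")
```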
The colour-scaling made it evident that, given the small audience of the early issues, these numbers were more noise than signal. As such, I factored out all data from before January 1, 2021, when the newsletter had around 650 subscribers.
Of all the data relationships, one emerged as being of particular interest: NPS and the number of shares per 1,000 email subscribers.
And it was here that my assumptions were challenged.
Insight: NPS correlates with share rate
When someone shares an article, they are, in effect, recommending it to a friend or colleague (rage shares notwithstanding). It stands to reason, then, that share rate would correlate with Net Promoter Score.
After sorting the articles by their NPS, this hypothesis seemed to bear out.
It’s immediately apparent that several highly shared essays correspond with high NPS scores. For example:
- What leader(s) over your product career truly changed how you approach product management? — Share rate: 10.47; NPS: 88
- What do you think about project-based roadmaps versus outcome-based (metric-based) roadmaps? — Share rate: 16.6; NPS: 87
- What are your top tips for a Product Leader resume/CV? — Share rate: 10.32; NPS: 74
This relationship seems readily discernible from the numbers alone when the two columns are viewed in isolation. Amidst a multitude of other data, however, I found the colour-scaling helpful for spotting the pattern at a quick glance.
Visualizing the data in a scatter plot made the results clearer:
Notice how, as NPS increases, so does the average number of shares.
It’s not a perfect slam-dunk, of course. A few of the essays with the highest NPS have amongst the lowest share rates. The model’s R-squared value, as well, is far too low at 0.017 to represent a good fit; however, if we remove scores calculated from fewer than 10 responses, the R-squared value increases to a more confidence-inspiring 0.146.
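For anyone wanting to reproduce that kind of check, here is a rough continuation of the earlier pandas sketch, fitting shares per 1,000 subscribers against NPS and reporting the fit’s R-squared. The figures are invented, so the values it prints won’t match the ones above:

```python
from scipy.stats import linregress

def shares_vs_nps_r_squared(frame, min_responses=0):
    """Regress shares-per-1k on NPS and return the fit's R-squared,
    optionally ignoring essays whose NPS rests on too few responses."""
    subset = frame[frame["nps_responses"] >= min_responses]
    fit = linregress(subset["nps"], subset["shares_per_1k"])
    return fit.rvalue ** 2

# All essays, then only those with at least 10 NPS responses
# (the article's dataset gave roughly 0.017 and 0.146, respectively).
print(shares_vs_nps_r_squared(issues))
print(shares_vs_nps_r_squared(issues, min_responses=10))
```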
Limitations
Substack’s analytics tooling can only track shares initiated directly through the platform. This excludes all shares made via dark social media, forwarding the email, or simply copy/pasting the article’s URL into a chat. I’m unsure of how this would influence the data, but suspect it would correspond to a higher number of shares for each well-scored article.
Why not hearts?
Besides shares, hearts represent the only other user action that can be associated with a given essay in Substack’s analytics. In this case, correlating hearts per 1,000 subscribers with NPS produces a higher R-squared value (0.047) than the equivalent calculation with shares (0.017).
So, why not use hearts as the proxy metric?
Unlike shares, hearts are not publicly attributable to individual readers. As well, shares require a minimum of three constituent actions to be registered by Substack — and often include a fourth:
- Clicking “Share”
- Choosing a platform
- Writing thoughts or commentary (optional)
- Confirming the post
Hearts, by comparison, need only a single click. With this disparity in the number of discrete actions needed to trigger a traceable event, shares provide a more reliable indicator of reader engagement and endorsement.
Implications
Given that this analysis reflects a small dataset, I’m reluctant to say its implications can be broadly generalized. Preliminarily, though, the results indicate that:
- NPS and shares can each serve as a reasonable proxy metric for content quality
- Correlating both content shares and NPS can improve confidence that well-performing content is not being “rage shared”
- NPS may, in fact, uphold its fundamental premise of serving as a predictor of growth
Reflection and Conclusion
It’s rare, in my experience, to have data that includes both how many people say they will recommend your product and how many people actually do.
So, which articles are best?
Though I didn’t expect it, the results of this analysis show a clear correlation between NPS and the number of article shares. And if sharing an article does, in fact, serve as an act of recommendation, then it stands to reason that NPS is a reasonable proxy metric for overall essay quality.
With that said, the number of article shares unambiguously captures reader action, rather than intent. For this reason, I recommended shares over NPS as the newsletter’s primary proxy metric, and Gib agrees. You can read his take in an Ask Gib post here.
Further research would be needed to make the argument more conclusive, of course. It’ll be interesting, as well, to see if this trend bears out over time with Gib’s newsletter. For now, though, I’m willing to re-evaluate my biases.
It turns out that with the right supporting data, NPS may be helpful, after all.
I know. I’m as surprised as you are.