Graham's Blog: The Devil in the Denominator

At the start of my career, I was a retail sales rep. for Shell in Northern Ireland. I had to look after the Shell petrol stations in part of the province. It was a fun job.

One of the many lessons I took away from Belfast was that small businesses come in many shapes and sizes. I had a few big stations, usually on land owned by Shell, where there were several employees and even things like dedicated accountants and managers.

At the other extreme, I remember a station on a side road off a side road off a side road, operated by a charming old woman of eighty or more. My monthly sales report showed that she sold almost nothing, but she stayed open, running the place herself, alerted by a bell in her living room that was tripped whenever a rare customer showed up. There was no reason for her to close. She basically had no costs, and nor did we. So long as it remained safe for the Shell tanker to deliver fuel to her, the business made sense to us and it made sense to her. I called in for a social chat every six weeks or so – she always had great homemade cakes.

I often recalled this woman in later years when I was in fancy head offices devising strategies for whole countries. Senior managers always wanted key data to demonstrate that everybody understood the business that they were in charge of and to produce KPIs that would drive strategy.

Fairly early on, I remember standing in a room with the head of the retail business for the whole UK, arguing over some set of statistics. There was much debate, because no indicator seemed stable. At one point, he shouted, exasperated, “How can we pretend to know what we are doing when we don’t even know how many sites we have?”

I immediately thought of my old lady up country in Northern Ireland. Was she still alive? Did she still sell fuel? And, even if she did, should her station actually count? The senior manager wanted to use statistics to understand his business, but every ratio seemed to involve something that he really shouldn’t care about. It shouldn’t matter to him whether my old lady was pumping gas or not, but it did, because if she stopped, the national volume per site would increase.
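To put rough numbers on that effect, here is a minimal Python sketch. The site volumes are invented purely for illustration (they are not Shell data), and the function name is hypothetical.

```python
def volume_per_site(volumes):
    """Average volume per site: total volume divided by the number of sites."""
    return sum(volumes) / len(volumes)

# ASSUMPTION: made-up monthly volumes in thousands of litres.
# The final entry stands in for a tiny station that sells almost nothing.
network = [400, 350, 300, 250, 2]

with_tiny_site = volume_per_site(network)
without_tiny_site = volume_per_site(network[:-1])

# How much total volume is lost if the tiny site closes.
total_change = sum(network[:-1]) / sum(network) - 1

print(f"Volume per site, tiny site counted:  {with_tiny_site:.1f}")
print(f"Volume per site, tiny site excluded: {without_tiny_site:.1f}")
print(f"Change in total volume: {total_change:.2%}")
```

With these invented figures, dropping the tiny site lifts volume per site from 260.4 to 325.0, even though total volume barely moves. The jump comes entirely from the denominator.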

Volume per site often became a key indicator in other contexts, with more being seen as better. I remember being told by some fancy consultant that Shell should make a bid for the Elf network, because it had the highest volume per site. The reality was that Elf had all its sites in big cities and on motorways, but that was also where price competition was fiercest and rents highest, so volume per site was only a partial guide.

There was usually a good reason to segment further. We could just look at Shell-owned sites, because in those we bore most of the fixed costs and kept most of the revenue. We could exclude motorway stations because of the unique government fees they carried. But what about truckstops, with huge volumes but tiny margins? Supermarket sites? We even had some dual-branded sites, for goodness sake.

I notice that most major retailers nowadays use an indicator of same-store sales growth to report and analyse. I like it. It is not a ratio, but a comparison between two meaningful numbers. It strips out network changes. And it doesn’t really care about stores like that of my old lady; her volume would not influence the data much at all.
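As a rough sketch of how such an indicator could be computed, the Python below compares only the stores that traded in both periods. The store names and sales figures are invented, and the function is a hypothetical illustration rather than any retailer's actual definition.

```python
def same_store_growth(last_year, this_year):
    """Growth in total sales across stores that traded in both periods."""
    common = last_year.keys() & this_year.keys()
    if not common:
        raise ValueError("no stores traded in both periods")
    before = sum(last_year[store] for store in common)
    after = sum(this_year[store] for store in common)
    return after / before - 1

# ASSUMPTION: made-up sales by store. "village" closed and "new_site" opened.
last_year = {"city": 400, "motorway": 350, "village": 2}
this_year = {"city": 420, "motorway": 357, "new_site": 300}

growth = same_store_growth(last_year, this_year)

# Naive comparison of network totals, which mixes in openings and closures.
naive = sum(this_year.values()) / sum(last_year.values()) - 1

print(f"Same-store growth: {growth:.1%}")
print(f"Total network growth: {naive:.1%}")
```

In this made-up example the same-store figure is 3.6%, while the network total grows by about 43% simply because a large site opened. The closure of the tiny village site plays no part in the same-store number.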

All of these thoughts about ratios and KPIs have come to mind recently in the context of coronavirus. Everybody is trying to publish meaningful statistics and trends, but most of them lead to more questions than answers. Usually, the reason is my old lady in Northern Ireland.

Start with the mortality rate of the disease. At the beginning this was one way the epidemiologists tried to convey the seriousness of Covid-19: flu killed 0.1% of those infected, but this virus killed maybe 2%.

But the 2% figure has not really survived contact with reality. If you compare the ratio of deaths to confirmed cases across cities and countries, it is all over the place. Just now, in New York, we have 10,000 deaths and 100,000 confirmed cases, but nobody is claiming a mortality rate of 10%. In theory, it could be even worse, because deaths lag behind cases: some of those already counted as cases have yet to die.

It is obvious where the problem lies. The deaths number might be reasonably accurate, though even it has challenges, due to folk dying at home undiagnosed or dying with a range of possible causes. But the denominator, the number of cases, is even less reliable than Shell’s count of its stations.
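The sensitivity to the denominator can be sketched in a few lines of Python. The deaths and confirmed cases are the round New York figures quoted above; the multipliers for true infections per confirmed case are purely hypothetical, chosen only to show how far the ratio moves.

```python
deaths = 10_000            # round figure quoted in the post for New York
confirmed_cases = 100_000  # round figure quoted in the post for New York

def fatality_rate(deaths, cases):
    """Deaths divided by cases, whatever 'cases' is taken to mean."""
    return deaths / cases

naive = fatality_rate(deaths, confirmed_cases)
print(f"Deaths / confirmed cases: {naive:.1%}")

# ASSUMPTION: hypothetical numbers of true infections per confirmed case.
# Nobody knows the real multiplier without random sampling.
implied = {}
for multiplier in (1, 5, 10, 20):
    implied[multiplier] = fatality_rate(deaths, confirmed_cases * multiplier)
    print(f"{multiplier:>2} infections per confirmed case -> {implied[multiplier]:.1%}")
```

The numerator never changes, yet the implied rate runs from 10% down to 0.5% depending on which guess about untested infections one picks.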

New York has finally got its testing somewhat up to speed, but still hardly anybody is being tested. If you are an essential worker you have some chance of being tested, maybe even frequently. If you are actually admitted into hospital you will be tested. The rest of us have to sit at home and wonder. It is possible that my wife and I have both had ultra-mild cases of Covid-19, but we have no current way of knowing, and perhaps, at the end of the day, it doesn’t really matter, unless there are some implications for immunity.

This problem might be called the long tail challenge or the devil in the denominator. Often, what it tells us is that we are measuring the wrong thing in the first place. In a case like this, the only way to get a good handle on the infection rate may be random sampling. Even that would only work if fully recovered cases still showed up in some way. But in the end, maybe we just should not care.

As a mathematician, I have been trying to follow coronavirus statistics carefully. It has not been easy, and the dodgy denominators have not been the only reason. Perhaps reasonably, there has been some censoring going on, usually not for political reasons but to avoid spreading panic.

To his credit, Andrew Cuomo has been fairly open and fairly consistent with sharing data for New York State. Just like same-store sales for a retailer, ICU admissions seem to work as a meaningful trend indicator, and intubations and deaths serve as somewhat reliable lagging indicators.

But even these numbers are hard to compare across states and countries. Different places will have different policies for who gets admitted to hospital (do nursing homes with specialist equipment count?), and some will be more robust in their data collection. For emerging countries, even the death toll is likely to remain a colossal undercount. In the end, comparing the total death rate during the pandemic with the same time period last year will probably be the best indicator.

On balance, I think statistics help us in times like these. They can support public messaging, and eventually help our morale, once things start to head in a less horrific direction. I prefer the facts of Cuomo to the bluster of Trump: “We have sent out millions of masks”; “Nobody has told me about anybody unable to get a test”. We need data: timely, accurate, well-defined data.

The Economist, as ever, has been wonderful with its data and charts. They have been offering trend lines, with a log scale of cases or deaths plotted against the number of days since a place saw its tenth death or hundredth case: brilliant and enlightening.
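For anyone curious how that kind of chart is prepared, here is a small Python sketch of the alignment step and the log transform. The cumulative counts are invented, the function name is hypothetical, and this is not a description of how The Economist actually builds its charts.

```python
import math

def align_from_threshold(cumulative, threshold=10):
    """Return the series starting on the first day the count reaches the threshold."""
    for day, value in enumerate(cumulative):
        if value >= threshold:
            return cumulative[day:]
    return []

# ASSUMPTION: made-up cumulative death counts by calendar day.
place_a = [0, 1, 3, 10, 20, 40, 80]
place_b = [0, 0, 0, 0, 2, 12, 24, 48]

# Index 0 of each aligned series is "day zero": the day of the tenth death.
aligned_a = align_from_threshold(place_a)
aligned_b = align_from_threshold(place_b)

# On a log scale, steady doubling shows up as equal-sized steps.
log_a = [round(math.log10(value), 2) for value in aligned_a]

print(aligned_a)
print(aligned_b)
print(log_a)
```

Aligning on a threshold lets places that were hit at different calendar dates be compared at the same stage of their outbreak, and the log scale turns steady exponential growth into a straight line, so a bend in the line is easy to spot.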

But even The Economist has struggled with mortality rate, and recently it seems to have given up trying. I think that is a good move, as I remember the lessons from my old lady somewhere west of Ballymena. I remember her cakes, too.

Wednesday, April 15, 2020
