2020-04-17

Coronavirus days: the IHME model is worthless

James half said this a bit ago in Dumb and dumber but on reflection he only half said it. But people keep on saying, effectively, "well the IHME model isn't very good is it" without ever bothering to look at exactly what it is. James said "some sort of fancy curve fitting that doesn't seem to make much use of what is known about disease dynamics" and I think that's true though I'm not sure how much it deserves the "fancy". I've been drifting along on the stream of all this modelling and not bothering to peer into the murky depths much, but I was very struck by this IHME "prediction" that James Twat; I've inlined it. If you look at it, there are - as James said - a number of obviously very strange things about it:

* the uncertainty range immediately leaps up on the first day of prediction to completely implausible levels (as well as having an implausible lower limit too);
* the model has an implausible level of certainty that the whole thing will be over by the end of the first week in May;
* it's all a bit Gaussian looking.

That's taken from https://covid19.healthdata.org/united-kingdom, if you want to look for yourself.

I finally dragged myself out of my lethargy to read their paper and discovered that they don't go out of their way to tell you what their methods are. But if you read it, it's fairly plain: "The cumulative death rate for each location is assumed to follow a parametrized Gaussian error function." So, that's their "modelling". But that's worthless, because epidemics don't follow a Gaussian, especially if they've got a lock-down in the middle of the data, whereupon fitting a Gaussian goes from being a bad idea to a cretinous one.

I'm guessing (though I haven't looked) that this explains their uncertainty bounds too: all they've done is taken the mean and fuzzed it, so the uncertainty is proportional to the value. Which is also worthless.

This also explains why the model goes to zero when it does: since we happen to look like we've got to the "top" of the Gaussian, it's simply predicting a mirror-image of itself as a decline. Also worthless.
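To make the mirror-image point concrete, here is a minimal sketch of a scaled error-function curve and its derivative. The parameter names (`p`, `alpha`, `beta`) and their values are my own assumptions for illustration, not numbers or a parameterisation checked against the IHME paper. If cumulative deaths follow an error function, daily deaths are a Gaussian bump, so the decline after the peak is forced to mirror the rise before it.

```python
import math

def cumulative_deaths(t, p, alpha, beta):
    """Scaled Gaussian error function: rises from 0 to p, inflection at t = beta."""
    return 0.5 * p * (1.0 + math.erf(alpha * (t - beta)))

def daily_deaths(t, p, alpha, beta):
    """Time derivative of cumulative_deaths: a Gaussian bump centred on beta."""
    return p * alpha / math.sqrt(math.pi) * math.exp(-(alpha * (t - beta)) ** 2)

# Hypothetical parameters, chosen only for illustration (not fitted to any data).
P, ALPHA, BETA = 20000.0, 0.08, 30.0

for k in (0, 7, 14, 21):
    before = daily_deaths(BETA - k, P, ALPHA, BETA)
    after = daily_deaths(BETA + k, P, ALPHA, BETA)
    print(f"{k:2d} days from peak: before={before:8.1f}  after={after:8.1f}")
```

Whatever values are chosen, the "before" and "after" columns agree: the functional form has no way to represent a slow decline after a fast rise.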

But then we get people like Nate Silver Twitting "There are some good critiques of the IHME model in here IMO" and... it's all too wishy-washy. Yes, there is one good point in that: According to a critique by researchers at the London School of Hygiene & Tropical Medicine and Imperial College London, published this week in Annals of Internal Medicine, the IHME projections are based “on a statistical model with no epidemiologic basis.” (my bold). And yet despite all this no-one can actually be bothered to read their paper and say what's wrong with it.

14 comments:

Tom said...

It would appear to this layperson that infection-ness (R whatever), sickness and mortality are so heavily dependent on our short term reactions to the phenomenon that single line projections are not likely to be useful.

Again, to the untrained eye the various curves shown for individual countries do not look similar enough to say much of anything useful. But neither do they look different enough to say for example, that Sweden's approach is better or worse than Denmark's or France's.

Finally, policies for classifying non-hospital deaths as CV19 or not vary enough from jurisdiction to jurisdiction that it seems possible that we will not have useful data any time soon.

PaulS said...

> Again, to the untrained eye the various curves shown for individual countries do not look similar enough to say much of anything useful

Comparing absolute numbers may have significant problems but I think you can make some reasonable inferences from relative comparisons between curves over time, on the basis that reporting idiosyncrasies within countries are not changing.

There's a lot of focus on Sweden, but Sweden's data are particularly noisy and appear to have a very large weekly cycle. So when the figures go down at weekends we're seeing stories about how great everything is there, but when the figures go up in the middle of the week we see stories about how it's all gone wrong.

There is a fairly simple solution though - apply a 7-day running mean. Removes the weekly cycle as well as reducing random noise. I tried this with Sweden plus a few other European countries and the United States. European countries other than Sweden clearly have declining death rates at this point, since around 10 April. In the US there is a plateau, but with a slight incline rather than decline, though presumably this is an aggregate of very different situations in different states.
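A 7-day running mean of this kind can be sketched as below. The daily counts are invented purely to show the effect (a flat underlying level with low weekend reporting); they are not Swedish data, and the function name is just an illustrative choice.

```python
def running_mean(values, window=7):
    """Mean over each run of `window` consecutive values.

    The result has len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means

# Invented daily counts: weekdays over-report, weekends under-report.
one_week = [120, 130, 125, 115, 110, 50, 50]
daily = one_week * 3

smoothed = running_mean(daily, window=7)
print(smoothed)
```

Because every 7-day window contains exactly one of each weekday, the weekly reporting cycle cancels and only the underlying trend is left.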

In Sweden there is, as yet, no sign of even a slowdown in the growing death rate. But my understanding, which may be wrong, is that this is supposed to be the plan for Sweden's approach: they're aiming for a herd immunity effect, which means they want a faster spread of infection than other countries and that necessarily means a faster near-term growth in death rate. So even if we had perfect data, looking solely at near-term death rates would not necessarily be a good indicator for how well different countries' approaches are "working" since they will have different definitions of "working".

I've seen people try to have their cake and then eat it with Sweden, based on their particular biases. Playing with stats to claim that Sweden's voluntary approach is not resulting in clearly greater near-term prevalence than seen in other European countries with mandatory lockdowns, but then also claiming that Sweden's more relaxed approach will damp the longer-term effects, such as rebound peaks, due to herd immunity. But that's nonsensical - if Sweden's actions have somehow resulted in the same near-term suppression efficacy as mandatory lockdown in other countries then the long-term implications for spread of the virus in Sweden are necessarily exactly the same as those other countries.

I've also seen it reported that Sweden's chief health adviser has claimed the situation is stabilising in Sweden. This is a bit of a puzzle to me, unless I'm wrong about the goals of the Swedish approach. According to this Harvard modelling study looking at effects of different social distancing mitigation strengths, a Sweden-like approach aiming for ~ 40% reduction in R0 (versus 60% reduction in other countries) should expect to see continuing quite strong growth in critical cases for the next few months. That paper does appear to have quite nicely predicted the difference between, say, Sweden and the UK to-date looking at death rates.

The paper also models a possible seasonal effect in infection spread, and that modeling does suggest Swedish critical cases should be stabilised right now. Trouble is, actual death rate statistics indicate that this seasonal effect isn't happening. I wonder if the Swedish health authorities gambled on a seasonal influence, in which case they may have fucked up big time.

Graeme said...

This comment has been removed by a blog administrator.

William M. Connolley said...

[Spammed. I haven't had anyone bother to post drivel here for ages, I thought people had given up in favour of more remunerative targets - W]

William M. Connolley said...

> According to this Harvard modelling study...

Which of course I didn't do more than skim. But I think it's just using various values for R0 as its "different social distancing mitigation strengths". What I'm rather more interested in is some way of calculating changes in R0, from different actions. For example, in an office environment where everyone wears masks and uses handsan, what effect does that have on R0, vs staying at home? What if you went in every other day? And so on.

Tom said...

I should think R0 is a variable linked closely to population density and utilization of public transport.

William M. Connolley said...

I was rather surprised to hear on the news that the Underground was still being run. I'd have thought shutting it down would be an excellent idea.

Marco said...

Yep, shut it down. Healthcare personnel can get to work by...eh...bike? Walk? Taxi? Buses?

Tom said...

Gee, why not government vehicles tasked for the purpose? Is it really that complicated?

wereatheist said...

Tanks? Army Lorries? I'm positively pissed that the U-Bahn in Berlin ran on a reduced schedule, meaning that the essential workers could not keep distance. This is sabotage.

wereatheist said...

BoJo initially wanted to let the useless die, but reconsidered. Because he liked his head on his shoulders, instead of on a spike. But I think his head would look just perfect on a spike.

Marco said...

Tom, yes, really that complicated. You'd need chauffeurs for many of these people. I've seen some being offered cars to use, and they said no, for the simple reason that they did not dare to drive after a 12-hr shift running around all the time. Driving home in the dark, very tired, yeah...ICU places are already limited.

Also, not everyone has a driver's license, and there are enough "essential workers" that you'll still run into very busy roads and congestion. Add the problem of now having to make room for lots of extra cars in the hospital car parks, and having to find a way to make sure these people do not need to pay all kinds of charges (London has a congestion charge, car parks generally demand payment) and it is indeed really complicated. Not something you just do.

Maybe a good lesson for the future...

Paul Kelly said...

Can you compare the Imperial College and Stanford models?

William M. Connolley said...

I don't even know what the S model is... is it https://covid-measures.github.io/ ? If so it looks like an STD.model, not the curve-fitting of IHME. It's a small model (with 11 compartments, so a ponced-up SEIR model?), can be run interactively, designed to test the effectiveness of interventions? But in a non-interesting way, as you have to specify those interventions as % of present. Imperial (https://github.com/ImperialCollegeLondon/covid19model) appears to say it is an SIR model (https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology; not SEIR?) and I don't know enough to know if the two are essentially the same, or if the number of compartments makes a significant difference. Based on JA's expostulations, I'd guess the model fitting is a more important consideration.
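For what the SIR/SEIR distinction amounts to, here is a toy sketch using forward-Euler integration. It is textbook compartmental modelling, not the Imperial or Stanford code, and the rates (`beta`, `gamma`, `sigma`) are hypothetical values picked for illustration. The only structural difference is the extra "exposed" compartment, which delays people becoming infectious.

```python
def simulate(beta, gamma, sigma=None, days=300, dt=0.1, i0=1e-4):
    """Forward-Euler SIR (sigma=None) or SEIR (sigma = 1 / latent period).

    Returns a list of (t, s, e, i, r) rows, compartments as population fractions.
    """
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    rows = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        if sigma is None:
            # SIR: the newly infected are infectious straight away.
            s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        else:
            # SEIR: the newly infected wait in E before becoming infectious.
            onsets = sigma * e * dt
            s, e, i, r = (s - new_infections, e + new_infections - onsets,
                          i + onsets - recoveries, r + recoveries)
        rows.append((n * dt, s, e, i, r))
    return rows

def peak_day(rows):
    """Time at which the infectious fraction is largest."""
    return max(rows, key=lambda row: row[3])[0]

# Hypothetical rates: R0 = beta / gamma = 2.5, 5-day infectious period, 3-day latency.
sir = simulate(beta=0.5, gamma=0.2)
seir = simulate(beta=0.5, gamma=0.2, sigma=1.0 / 3.0)
print("SIR peak day: ", round(peak_day(sir), 1))
print("SEIR peak day:", round(peak_day(seir), 1))
```

With the same R0 the extra compartment mainly slows the early growth and pushes the peak later, which fits the guess that how the model is fitted to data matters more than the exact number of compartments.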