Quiz: to be taken before you get to the end. Who is this, and why is he relevant?
The story so far: Imperial have some kind of epidemic model that was used to predict, errm, stuff[1]. After a bit, people, predictably (arf!) enough, said "where's your source code?" and Imperial said "errm, it's a bit of a mess actually, hang on a mo", and after a fair while, and heavy massaging from folk in the private sector that turned an unreadable, unmaintainable 15k-line single-file model bearing a powerful resemblance to S+C's MSU code into something that could be seen in public without too much embarrassment, they put it onto GitHub[2]. I even poked around in there for a bit, before realising I didn't know how to look at GitHub, and getting bored.
But! Other people have looked, and are unimpressed. This is no great shock, I think. The people I found were "Lockdown Sceptics" who, despite their sensible but derivative motto *Stay sane. Protect the economy. Save livelihoods*, are probably a bunch of nutters; they also have a dreadfully slow website. They have a post (arch) called "Code Review of Ferguson's Model".
Most of the post is about non-determinism. This is interesting (though probably no great flaw) and I'll get to it in a moment, but first a few other things they pick out. The first is the absence of any kind of tests, which seems a fair point. Writing tests is tedious and often neglected even in the Real World, so it is unsurprising that a bunch of amateurs didn't bother. Another is poor documentation of some parts; and again, meh, so it goes.
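For what it's worth, tests aren't impossible just because a model is stochastic: fix the seed and pin the output. A minimal sketch of the idea (my own toy, nothing to do with Imperial's code):

```python
# Toy regression tests for a seeded stochastic "model" (illustrative only).
import random

def model(seed):
    """Stand-in for a real model run: returns some seeded output."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(5)]

def test_same_seed_same_answer():
    # Determinism: identical seeds must give bit-identical output.
    assert model(seed=42) == model(seed=42)

def test_pinned_value():
    # Pinned against a known-good run (CPython's Mersenne Twister);
    # fails if behaviour silently drifts between code changes.
    assert abs(model(seed=42)[0] - 0.6394267984578837) < 1e-15
```

Anyway: back to the interesting part, non-determinism.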
There are a number of meanings for this that need to be disentangled. In running such a model, you probably want to run a pile of runs with similar but perturbed initial conditions and do some averaging of the results. In this sense the model is intended to be "stochastic", and that's fair enough. However, with a given random seed, you would rather like the model to be repeatable. It appears to be rather shaky at this. The first problem looks to be parallelism. This comes up in GCMs too, and indeed way back when I said:
> There are two sorts of repeatability: you run the model again, and you get *exactly* the same results down to the last bit. This is called bit-reproducibility. Or, you run the model again, and you get *scientifically* the same result (the same climate; probably the same response to forcing within statistical error) but the exact details of the weather differ. Because the climate is chaotic (in the sense that small initial perturbations rapidly amplify) and GCMs reproduce this well, if your model diverges even slightly from bit-reproducibility it will diverge strongly from it, because the details of the individual weather will be totally different. But the climate (statistics of the weather) will be the same.

See Repeatability of GCMs for more. The problem with the parallelism stuff is that the easy way to implement it leads to numbers coming back from other processors in a different order, and so inevitably the exact last few bits of the floating point calculations don't quite match. If you put effort in you can avoid that, at the cost of slowing the thing down a little[3]. Imperial's solution appears to be running the model single-threaded instead (but because they didn't really care about repeatability, and had no tests to pick up problems, they still had repeatability bugs even in single-threaded mode).
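If you've never met it, the order-dependence is easy to demonstrate: floating point addition isn't associative, so summing the same numbers in a different order (which is what a naive parallel reduction does) changes the last few bits. A minimal sketch, nothing to do with the Imperial code:

```python
# Why parallel reductions break bit-reproducibility: float addition is
# not associative, so the order in which partial sums arrive changes
# the last few bits of the result.
import random

random.seed(42)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)             # one summation order
backward = sum(reversed(values))  # the same numbers, another order

print(f"forward : {forward!r}")
print(f"backward: {backward!r}")
print(f"bitwise identical? {forward == backward}")
# Typically False: the sums agree to ~15 significant figures but differ
# in the final bits -- harmless numerically, fatal for bit-reproducibility.
```

The fix, if you want it, is to impose a fixed reduction order regardless of which processor answers first; that's the slight slowdown mentioned above.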
If you look at the report, the non-repeatability looks to produce a pretty big difference. However, I rather suspect this is like GCMs: if you look at the output one month after a trivial perturbation, it will look very different even though the long-term climate is the same. What that report / pic appears to show is sensitivity to when the initial perturbation starts to grow. This might be a problem; but it might not (I think you'd need to see a whole ensemble of runs to know what it is supposed to look like).
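To make the same-seed / perturbed-seed distinction concrete, here's a toy stochastic epidemic (again mine, not Ferguson's): re-run with the same seed it is bit-identical; run as an ensemble of perturbed seeds, the individual trajectories differ while the broad statistics agree.

```python
# Toy stochastic SIR-ish model (purely illustrative, not Imperial's).
import random

def toy_epidemic(seed, days=120, beta=0.25, gamma=0.1, n=10_000):
    rng = random.Random(seed)
    s, i = n - 10, 10          # susceptible, infected
    infected = []
    for _ in range(days):
        # Each infected person infects / recovers with some probability.
        new_inf = sum(1 for _ in range(i) if rng.random() < beta * s / n)
        new_rec = sum(1 for _ in range(i) if rng.random() < gamma)
        s -= new_inf
        i += new_inf - new_rec
        infected.append(i)
    return infected

# Repeatability: same seed, bit-identical output.
assert toy_epidemic(seed=1) == toy_epidemic(seed=1)

# Stochasticity: an ensemble of seeds gives different trajectories
# with (one hopes) similar statistics, e.g. the epidemic peak.
peaks = [max(toy_epidemic(seed=s)) for s in range(10)]
print(sorted(peaks))   # scatter in the detail, agreement in the broad shape
```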
So overall I think the criticism in scientific terms isn't exciting. As a lesson-for-our-times I think "when you look behind the curtain of just following the science you'll see some messy stuff" will do. Or even Laws are Like Sausages. Better Not to See Them Being Made.
The East is Red
Interestingly, as seen from one bug report, people are shamelessly using the concept of "Red Team" in this context. It's almost as though the concept is sensible and helpful, when used in good faith.
The fight for the soul of the Imperial model continues. Nurture has "Critiqued coronavirus simulation gets thumbs up from code-checking efforts", which (once you remove the goo and dribble) amounts to pointing at the "CODECHECK certificate for paper: Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. March 16, 2020". That looks to be part of a commendable effort by Stephen J. Eglen to re-run the code leading to publication; and they have, and get the same answers (in the stochastic sense).
Meanwhile Bryan Lawrence has a somewhat unconvincing defence in On scientific software - a beginning.
* The Imperial College code - ATTP
[1] I'm a big-picture man. Don't bother me with details.
[2] I've been re-reading Proust.
[3] The Met Office / Hadley Centre are not a bunch of amateurs, and did put the effort in. Incidentally, one reason for wanting the reproducibility is rare bugs where your model blows up. If you don't have reproducibility, you can't re-run with extra debug to get to the same point and see the failure more clearly, so you're pretty well stuffed for debugging.
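The pattern, in sketch form (generic practice, not anything Imperial did): log the seed up front, and when a run dies you can replay exactly that run with the diagnostics switched on.

```python
# Sketch of seed-logging for replayable failures (illustrative only).
import random
import sys
import time

def run_model(seed, debug=False):
    rng = random.Random(seed)
    state = 1.0
    for step in range(10_000):
        state *= 1.0 + rng.gauss(0.0, 0.01)
        if debug:
            print(f"step {step}: state={state}", file=sys.stderr)
        if state > 10.0:                     # a rare "blow-up"
            raise RuntimeError(f"blew up at step {step} (seed={seed})")
    return state

seed = int(time.time())
print(f"seed = {seed}")                      # logged before anything else
try:
    run_model(seed)
except RuntimeError as err:
    print(err)
    try:
        run_model(seed, debug=True)          # same seed: same failure, now visible
    except RuntimeError:
        pass
```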