The Long and Short of It: Statistical Arguments

February 1st, 2009

So there’s an inflammatory post on the physics preprint server blog with the headline “Massive Miscalculation Makes LHC Safety Assurances Invalid.” It’s based on a paper by Toby Ord and others titled “Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes.” Here is the abstract:

Some risks have extremely high stakes. For example, a worldwide pandemic or asteroid impact could potentially kill more than a billion people. Comfortingly, scientific calculations often put very low probabilities on the occurrence of such catastrophes. In this paper, we argue that there are important new methodological problems which arise when assessing global catastrophic risks and we focus on a problem regarding probability estimation. When an expert provides a calculation of the probability of an outcome, they are really providing the probability of the outcome occurring, given that their argument is watertight. However, their argument may fail for a number of reasons such as a flaw in the underlying theory, a flaw in the modeling of the problem, or a mistake in the calculations. If the probability estimate given by an argument is dwarfed by the chance that the argument itself is flawed, then the estimate is suspect. We develop this idea formally, explaining how it differs from the related distinctions of model and parameter uncertainty. Using the risk estimates from the Large Hadron Collider as a test case, we show how serious the problem can be when it comes to catastrophic risks and how best to address it.

In addition to the discussion at the arXiv blog, there’s a long Slashdot discussion as well, if such things can be called discussions. And we already had the Fox News story spinning this as “Scientists Not So Sure ‘Doomsday Machine’ Won’t Destroy the World.” Ugh.

First, before I attack some stupid ideas and presentation issues, let me say what I like and agree with.   From a post of mine last year about “stupid smart people”:

I’m equating wisdom here with a type of intelligence, one that “smart” people should have or be capable of achieving. When smart people do/say/believe stupid things, it’s akin to them lacking wisdom, and the stupid things could be avoided if only they applied some of their smarts in a different or more global way. It’s often a failure to see the forest for the trees. Sometimes it’s forgetting that forests are made of trees.

One example I see in astronomy all the time has to do with uncertainties. It’s pounded into our heads as graduate students that a data point doesn’t mean much if you don’t know its error bars, and we often spend more time generating the uncertainties than we do determining the data values. That’s fine as far as it goes. But here’s where the stupid comes in sometimes. There are two kinds of uncertainties: formal and systematic. It’s often possible to calculate and show formal uncertainties, which are usually based on well-understood statistics of shot noise or error propagation. A lot of the time these are worthless, because they’re much smaller than the systematic uncertainties, which depend on the validity of the technique. A simple example of the difference is calibrating how bright a star is in absolute terms. We do this regularly by comparing the photons received in a time period to those from some standard reference stars, and use statistics of photons and detector noise to determine formal uncertainties. The systematic error comes up in the choice of reference stars (or the change in seeing without changing extraction apertures, etc.) — if the standard star turns out to be a variable for some reason, then the formal uncertainty means nothing.

In astronomy, adding those formal error bars to a plot, even when the systematic uncertainties are known to be much larger and more important, will make many an audience member smile happily even when they don’t mean anything. That’s being a stupid smart person.
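To put a rough number on that photometry example, here’s a minimal sketch in Python, with photon counts I’ve made up purely for illustration:

```python
import math

# Hypothetical photon counts, made up for illustration.
N_target = 50_000      # photons detected from the target star
N_standard = 200_000   # photons detected from the standard reference star

# Formal (shot-noise) uncertainty: Poisson statistics give sigma = sqrt(N),
# so the fractional errors add in quadrature when we take the flux ratio.
ratio = N_target / N_standard
formal_frac_err = math.sqrt(1.0 / N_target + 1.0 / N_standard)

# Systematic uncertainty: suppose the "standard" star is secretly variable
# at the 3% level.  The calibration is then off by about 3% no matter how
# many photons we collect; the formal error bar never sees it.
systematic_frac_err = 0.03

print(f"flux ratio: {ratio:.3f}")
print(f"formal fractional error:     {formal_frac_err:.2%}")      # about 0.5%
print(f"systematic fractional error: {systematic_frac_err:.2%}")  # 3%
```

With these numbers the formal error bar is half a percent while the real uncertainty is several times larger, which is exactly the kind of plot that makes an audience smile for no good reason.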

The paper is in part about seeing the forest: putting a calculation in context, and recognizing that errors in the method, or some other unforeseen aspect of the calculation, could mean that the actual result doesn’t mean anything. I remember a case back in the early 1990s in my field. It was a calculation of the probability that a certain category of quasar, a broad absorption line quasar, could be what we call “radio loud.” The probability thrown out was one chance in a trillion based on the survey work to date. I published a paper in 1998 reporting my discovery of five of them, and now hundreds are known. They’re not common, but they sure exist. The problem was that parameter space had not been uniformly explored to that point, and there is a relationship between radio properties and the presence of broad absorption lines, but it is far from absolute or clearly understood.

Now, here’s the thing. Careful scientists usually couch their statements in qualifiers. I’m sure that calculation of one in a trillion started with a bunch of assumptions, and that those assumptions were clearly stated. One of those assumptions was wrong, is all. There was interesting science in why it was wrong. It wasn’t a “mistake” to make the assumption and do the calculation.

The kind of statistical argument made here in the Ord et al. paper has no physics in it. It’s more of a mathematical or philosophical argument. It’s full of its own assumptions, too, which it does state, but the authors seem somewhat unaware of their limitations. That is, there is the notion that mistakes are made in physics calculations so often that one shouldn’t believe any extremely small probabilities. Well, yes and no. Sometimes the probabilities can be verified by experiment. Events may be rare, but they can still be studied by virtue of having a lot of chances for them to be observed.

There are physics theories that predict a proton decay timescale of something like 10^31 years. Well, that’s a damn small probability of any given proton decaying this year. But by watching a lot of protons, we now know that the decay timescale is at least a few orders of magnitude larger, and there is no evidence of proton decay of any sort.
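The “lots of chances” point is easy to make concrete. Here’s a back-of-the-envelope sketch with round numbers I’m supplying for illustration, roughly the scale of a 50-kiloton water detector:

```python
# Back-of-the-envelope: how many decays per year would a big water detector
# see if the proton lifetime really were 1e31 years?  (Round numbers chosen
# for illustration, roughly the scale of a 50-kiloton water detector.)
AVOGADRO = 6.022e23
water_mass_g = 50_000 * 1e6           # 50 kilotons of water, in grams
protons_per_molecule = 10             # 2 from the hydrogens, 8 in the oxygen
n_protons = water_mass_g / 18.0 * AVOGADRO * protons_per_molecule

lifetime_years = 1e31
expected_decays_per_year = n_protons / lifetime_years

print(f"protons being watched:    {n_protons:.1e}")                 # ~1.7e34
print(f"expected decays per year: {expected_decays_per_year:.0f}")  # ~1700
# Seeing none is what pushes the lifetime limit orders of magnitude higher.
```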

Another post I made related to this topic was after reading The Black Swan, by Nassim Nicholas Taleb. Taleb’s big complaint is how the economics industry adopts Gaussian errors, which may or may not be a valid description of the uncertainty for unlikely events. We can verify that Gaussians are good descriptions for many applications in physics. The problem is that they are assumed in economics, often without compelling evidence, and often with catastrophic consequences.
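For a sense of how much the tail assumption matters, here’s a minimal sketch (my own illustration, not anything from the book) comparing the chance of a far-out event under a Gaussian and under a heavy-tailed Student-t distribution:

```python
from scipy.stats import norm, t

# How likely is a far-out, "5 sigma" style event?  Under a Gaussian, almost
# never; under a heavy-tailed Student-t (3 degrees of freedom, unit scale,
# chosen purely for illustration), it's a different story.
threshold = 5.0
p_gauss = 2 * norm.sf(threshold)        # two-sided tail probability
p_heavy = 2 * t.sf(threshold, df=3)

print(f"Gaussian     P(|x| > 5): {p_gauss:.1e}")   # roughly 6e-7
print(f"Student-t(3) P(|x| > 5): {p_heavy:.1e}")   # roughly 2e-2
```

Same “five units out” event, odds that differ by nearly five orders of magnitude, all from the choice of distribution.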

But I digress a bit. I’m interested in all the tangents on this topic, I’m afraid, in pointing out the promise and danger of our explorations. Anyway, when it comes to communicating results to the public, all the qualifiers tend to be dropped. The assumptions fall away and it sounds like an absolute result, with lots of significance, and such things are rare in science. Then Ord et al.’s argument comes in: people doing calculations are wrong one time out of a thousand, or ten thousand. On average. Their claim is that this is serious when assessing risk for high-cost disasters.
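Their argument is easy to sketch with numbers. The toy values below are mine, purely illustrative, but they show the structure: the published number is really P(disaster | argument sound), and once the argument has any appreciable chance of being flawed, that branch can dominate the total.

```python
# Toy version of the Ord et al. point; all three numbers are made up here
# just to show the structure of the argument.
p_disaster_given_sound  = 1e-12   # the tiny number a safety analysis computes
p_argument_flawed       = 1e-4    # rough rate at which careful analyses go wrong
p_disaster_given_flawed = 1e-3    # unknown in reality; picked for illustration

p_disaster = (p_disaster_given_sound * (1 - p_argument_flawed)
              + p_disaster_given_flawed * p_argument_flawed)

print(f"total P(disaster) ~ {p_disaster:.1e}")
# ~1e-7: dominated entirely by the chance the argument itself is flawed.
```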

Which says to me we might as well stop doing anything with high stakes, no matter the benefits, and start doing everything we can to avert risks, even if the dangers are not proven.

Hear this, Fox News? The chances of catastrophic climate change ending civilization as we know it are far from zero according to the experts. But they might be wrong. The risk might be much higher than they think!

I don’t quite buy this. This is a brand of numerology, of philosophy. There are right and wrong answers in the hard sciences. We can be careful. We can check, double check, and cross check. Some authors are better and more careful than others, and so are their referees. Don’t just say there’s a tiny chance the calculation is wrong and that this chance is larger than the even tinier chance it produced. Be more careful. Do more checks. Keep checking.

And why pick on the Large Hadron Collider? For the PR? And why do it when the actual conclusion is that there’s still little risk, and when fearmongers like Fox News are likely to pick up on it and spread fear?

I mean, can anyone say how often the very best people do a calculation wrong, have it checked and found to be okay by everyone else in the field, sit around thinking about it for years facing criticism and questions from people about whether it is wrong, and still fail to see any errors? I submit that this scenario, which is consistent with the situation with the LHC, hasn’t happened often. I don’t believe anyone can assess the likelihood of error here, except to say it is smaller than usual.

Again, I am really tempted to rewrite the blog entry with the same arguments, only replacing the Large Hadron Collider with Anthropogenic Climate Change, to argue that we’re mistaken about how likely total destruction of the planet is via a runaway greenhouse effect. I haven’t seen anyone putting high odds on that. If it’s lower than 1 in 1,000, this paper by Ord et al. suggests it might as well be considered 1 in 1,000. How about a little Kyoto, Fox News, in the face of rolling the thousand-sided die on the extinction of mankind?

