2021ix9, Thursday: the truth behind the lie.

9th September 2021, JSJ

Anyone can lie with statistics. But buried in the numbers backing up the BS, the truth that rebuts it can often be found. And fashioning that into a compelling story can be shockingly effective.

One of the great things about numbers is not that they can be used to lie, although they can.

It’s that even when they’re (mis-)used that way, sometimes the truth still lurks within.

I’m no great mathematician, although my daughter tells me that I light up when I’m working through problems with her to help her study. And it’s a sadness that when I studied maths in school, we focused on mechanics at the expense of statistics and probability.

I’ve picked up a bit of each since, although I’m still very rule-of-thumb. And every so often something comes up that simply delights me.

Benford’s Law was one such. I encountered it as a counter-fraud tool many years ago. For large number sets spanning several orders of magnitude, it observes, the leading digit – that is, the lefthand-most one, denoting (say) the thousands in a four-digit number or the millions in a seven-digit one – is rarely evenly distributed. No: a leading “1” is by far the commonest, with a sharp drop to “2” and then a logarithmic curve flattening thereafter all the way to “9”.

Why is this useful in counter-fraud? Well, to make a fraud work, you often need to cook the books – to alter financial records. What are financial records but numbers? And when you make up numbers, or generate them randomly, you may well fail to make the statistical distribution of those numbers look right.

So if you’re looking at a data-set whose leading digits are evenly distributed – instead of, as Benford’s Law predicts, having about 30% of them start with a “1” – you ought to start getting suspicious.
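For anyone who wants to see the curve for themselves, here is a minimal sketch. It uses the standard statement of Benford’s Law (the expected share of leading digit d is log10(1 + 1/d)) and compares it with two illustrative data sets: the first thousand powers of 2, which are known to track Benford closely, and every three-digit number once, which is perfectly even. The function names are my own, and neither data set is real financial data; a genuine fraud check would need far more care than this.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d (1-9): log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Leftmost digit of a positive integer (assumption: n >= 1)."""
    return int(str(n)[0])


def leading_digit_shares(values):
    """Observed share of each leading digit across positive integers."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


expected = benford_expected()

# Illustrative data only, not real financial records.
# Powers of 2 span many orders of magnitude and track Benford closely.
benford_like = leading_digit_shares(2 ** k for k in range(1, 1001))
# Every three-digit number once: leading digits come out perfectly even.
too_even = leading_digit_shares(range(100, 1000))

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}  "
          f"powers of 2 {benford_like[d]:.3f}  even set {too_even[d]:.3f}")
```

The even set gives each digit one ninth of the total, which is exactly the pattern that ought to raise an eyebrow in a set of accounts.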

I mention this having been pointed (by the ever-wonderful Charles Arthur) to a recent takedown of a seminal piece of counter-fraud research. The research, from 2012, posited that a measurable decrease in dishonesty could result from a simple change in how people sign declarations of honesty in documents. You know how at the bottom of a tax return, or form providing details for (say) insurance, you sign to say you’ve given accurate information? The research suggested that simply by putting the declaration at the top – that is, before you provide the information instead of afterwards – people would be significantly more likely to tell the truth.

Classic “nudge” theory at work, you might think.

Unfortunately, the authors themselves tried and failed to replicate their findings in 2020. They found anomalies in one of their key data sets, which they attributed to a “randomisation failure”.

No: as the new (and really smart and thoughtful) analysis says – conclusively, to my mind – the data in question was simply faked.

I won’t provide too much detail. The analysis is short, clear, and absolutely worth reading in full. To give just one example, it noted that the data (from a motor insurer) included two sets of mileage figures, both supposedly provided by drivers. But while the first set showed notable spikes in frequency for numbers ending either in “000” or “500” (that is: people roughly rounding their mileage to the nearest half-thousand, as you might well expect them to do), the second set was absolutely flat – as the graph reproduced below shows.

In other words: the same people were rough-guessing their mileage first time round, but giving it accurate to a single mile thereafter. Consistently. Everyone. Every time.

You’ve met humans. You tell me how plausible that sounds.
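The rounding check is simple enough to sketch. What follows is my own rough illustration, not the analysts’ method: it counts what share of a list of readings are exact multiples of 500. The mileage figures are invented for the example, and the function name is hypothetical.

```python
def round_number_share(readings, step=500):
    """Fraction of readings that are exact multiples of `step`."""
    readings = list(readings)
    return sum(1 for r in readings if r % step == 0) / len(readings)


# Hypothetical mileage figures, invented purely for illustration.
human_looking = [12000, 34500, 27813, 45000, 8500, 61000, 15250, 30000]
# If the last three digits were spread evenly, only 2 endings in every 1,000
# ("000" and "500") would count as round.
flat = range(1, 100_001)

print(f"human-looking set: {round_number_share(human_looking):.1%} round")
print(f"flat set:          {round_number_share(flat):.1%} round")
```

A flat set should show about 0.2% round numbers. Anything self-reported by real people would be expected to come in far higher.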

If anything, the analysis gets still more fascinating thereafter.

To their credit, all four of the 2012 authors recognise the problem, and have now retracted the 2012 paper. There’s no reason to think any of them were party to what now appears to have been an essentially made-up data set.

More importantly, they also agree with a core emergent finding of the writers of the new analysis. Research which doesn’t expose its underlying data (unless it’s absolutely impossible, say for personal privacy or safety purposes, to share it) isn’t to be trusted. Because it can’t be checked.

And given the reproducibility crisis, that just isn’t good enough.

I recognise that I seem to be straying a long way from the law, here – my usual stamping grounds.

But this is, to me, objectively interesting. There’s a beauty in the idea that those who lie with statistics may ultimately be found out by them too.

And I think there’s at least a small legal application – or at least a litigation one.

Numbers can be made to lie, sure. But equally, underneath the lying explanation there may be a true story begging to come out.

And – as we’ve discussed ad nauseam – advocacy is about story-telling. Don’t ignore the opportunity you have to use numbers to tell stories. If you can take a wall of impenetrable numbers, and – as the writers here have so lucidly done – use them to fashion a compelling, even shocking, narrative, which grabs the attention and answers the key questions, don’t waste it.

Not all of us advocates are numerate. Not all of us “get” statistics and probability. Some of us even misuse them – by accident or by design. But more of us should get it, and get it right. I know I’ve mentioned it before, but the Inns of Court College of Advocacy guide, created with the help of the Royal Statistical Society, is a pretty good way to start.

(If you’d like to read more like this, and would prefer it simply to land in your inbox, please go ahead and subscribe at https://remoteaccessbar.substack.com/.)

2021iii24, Wednesday: Does it add up?

24th March 2021, JSJ

The slipperiness of statistics, and why we advocates need to learn to love numbers. Plus: wise words from the US on design.

Short thought: I have a problem with numbers: I like them.

Don’t get me wrong. I’m not a mathematician. My formal maths education stopped at A-level, decades ago, and has only restarted recently as I’ve sought to help my daughter with her GCSE maths studies through lockdown.

But numbers don’t scare me, and there’s an ethereal beauty to maths which always appeals. Which, I think, is generally a good attitude in an advocate.

Still, that’s the problem. I sometimes find it hard to understand just how daunting maths – and particularly statistics, perhaps – can be to many people. To be clear: that’s a failure of empathy on my part, not any failing on theirs.

Why “particularly statistics”? Because, I think, they can often defeat common sense. And while Darrell Huff’s seminal book “How to lie with statistics” overdoes it (Huff later became a key smokescreen for Big Tobacco, unfortunately), the fact remains that using stats to obfuscate instead of illuminate is an old and well-used trick because it works.

(Chart crime is a subset of lying with statistics, or perhaps an overlapping circle on a Venn diagram. Because often chart crime arises from negligence, not malice. FT Alphaville’s Axes of Evil series, from which the above illustration is drawn, is an excellent set of examples.)

A great illustration of the “common sense is wrong” problem is highlighted in a piece by a Conservative MP, Anthony Browne. (I don’t usually link to pieces by Tory MPs on ConservativeHome. But this, despite the clickbait headline about government policy, is really good.) Anthony says his constituents are up in arms because their kids are getting sent home from school on positive LFD Covid tests, and kept away even when they have a negative PCR test thereafter. Surely the PCR tests are gold standard? This can’t be right.

Well, yes it can, says Anthony. And he’s spot on. The issue arises because of the counter-intuitive way that false positives (getting a yes when it should be a no) and false negatives (the other way round) interact with large populations with a relatively low incidence of what you’re testing for.

Put simply:

Imagine a million kids, and 0.5% of them – 1 in 200, or 5,000 – have the Bug.
An LFD test rarely gives a false alarm (only 0.03% false positives – only a tiny fraction of people who don’t have the Bug will be told they do), but a negative result is much more unreliable (49.9% false negatives – in other words, if you’ve got the Bug there’s a 50/50 chance the test will say you haven’t).
A positive PCR test is basically always right. But 5.2% of people with the Bug will get a negative result nonetheless.
Of the million kids (remember: about 995,000 are fine, about 5,000 have the Bug), the LFD will flag 2,500 of the kids with the Bug. (Yes, the other 2,500 won’t get flagged. But that’s a different problem…) It’ll also flag about 300 kids who are clean. Oops.
So 2,800 kids get sent home, along with their close contacts. Assume all 2,800 then have a PCR test.
The zero-false-positive thing means all 300 of the mistakes will get picked up. Yay! Back to school for them and their classmates?
Er… no. Here’s the problem. That 1-in-20 false negative rate means that about 125 or so of the 2,500 kids who DO have the bug will get a negative result as well.
So of the 425-odd kids whose PCR looks like they should be allowed back into school, nearly a third are actually Bugged.
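The steps above can be run as straight arithmetic. This is a sketch under the stated assumptions only: the rates are the ones quoted in the list, the PCR is treated as having no false positives at all, and every child flagged by the LFD goes on to take a PCR. The function and key names are mine.

```python
def two_stage_screen(population=1_000_000, incidence=0.005,
                     lfd_false_pos=0.0003, lfd_false_neg=0.499,
                     pcr_false_neg=0.052):
    """Expected head-counts for an LFD screen followed by a confirmatory PCR.

    Assumes the PCR never gives a false positive.
    """
    infected = population * incidence
    healthy = population - infected

    lfd_true_pos = infected * (1 - lfd_false_neg)   # Bugged and flagged
    lfd_missed = infected * lfd_false_neg           # Bugged but told they're fine
    lfd_false_alarms = healthy * lfd_false_pos      # clean but flagged

    # Every flagged child takes a PCR. The false alarms are all cleared,
    # but so are some children who really do have the Bug.
    pcr_wrong_all_clear = lfd_true_pos * pcr_false_neg
    pcr_all_clear = lfd_false_alarms + pcr_wrong_all_clear

    return {
        "lfd_true_pos": lfd_true_pos,
        "lfd_missed": lfd_missed,
        "lfd_false_alarms": lfd_false_alarms,
        "sent_home": lfd_true_pos + lfd_false_alarms,
        "pcr_wrong_all_clear": pcr_wrong_all_clear,
        "pcr_all_clear": pcr_all_clear,
        "wrong_share": pcr_wrong_all_clear / pcr_all_clear,
    }


result = two_stage_screen()
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With the unrounded rates the figures come out at roughly 2,505 flagged correctly, 299 false alarms and 130 wrong all-clears, close to the rounded 2,500, 300 and 125 used above. Either way, about three in ten of the PCR all-clears are wrong.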

This, says Anthony, is why the government is right to disallow immediate return after a negative PCR. And I see his point. The stats are right, if utterly counter-intuitive.

What’s this got to do with advocacy? Well, so much of our work involves numbers. In crime, it’s DNA tests. In personal injury, it’s causation for some kinds of illness and injury. In commercial matters, we spend our lives poring over company accounts and arguing over experts who tell us what’s likely and what’s not. And an awareness of Bayesian reasoning can be a huge help when assessing whose story stacks up.

And if we don’t speak numbers, how can we possibly ensure our clients’ cases are properly put?

This point isn’t new, and the profession knows it. Working with the Royal Statistical Society, a couple of years ago it put together a guide for advocates on statistics and probability. It’s brilliant. Download it, and keep it as a ready reference. And – as I’m trying to do – find ways of illustrating probability that are transparent to people for whom this just isn’t straightforward, or that take into account the times when statistics boggle the common-sense mind.

One final word on Anthony’s piece, though. He rightly points out that these numbers change as the incidence drops. The share of negative PCR results that are wrong in the above example, for instance, falls from roughly 30% to less than 10% once the incidence of the Bug is down to 1 in 1,000.
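To see how sensitive that figure is to incidence, here is a small sketch that reads it as the share of PCR all-clears (among children already flagged by an LFD) that are wrong. The test rates are the ones quoted earlier; the zero-false-positive PCR and the function name are my assumptions.

```python
def wrong_all_clear_share(incidence, lfd_false_pos=0.0003,
                          lfd_false_neg=0.499, pcr_false_neg=0.052):
    """Share of post-LFD negative PCR results that belong to infected children.

    Works in proportions of the population; assumes the PCR has no false positives.
    """
    infected_flagged = incidence * (1 - lfd_false_neg)
    healthy_flagged = (1 - incidence) * lfd_false_pos
    infected_cleared = infected_flagged * pcr_false_neg
    return infected_cleared / (infected_cleared + healthy_flagged)


# Incidence values chosen for illustration: 1 in 200, 1 in 500, 1 in 1,000.
for incidence in (0.005, 0.002, 0.001):
    share = wrong_all_clear_share(incidence)
    print(f"incidence 1 in {round(1 / incidence):,}: "
          f"{share:.1%} of all-clears are wrong")
```

The fewer children actually have the Bug, the more the all-clears are dominated by genuine LFD false alarms, and the safer a negative PCR becomes.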

But his overall point – that government policy is backed up by the numbers – has one big hole, it seems to me. As we noted, the false negative rate for LFDs is 50%. So even on our example, that’s 2,500 kids WITH the Bug who are in school, in the honest but mistaken belief that they’re no risk to anyone.

In other words, the reliance on LFDs for school testing is a false comfort – a form of pandemic theatre (akin to the security theatre that made air travel such a pain before it was wiped out by the Bug). And compared to that, quibbling over the 125 kids to whom the PCR has wrongly given the all-clear seems a bit pointless.

(An invitation: I like numbers, but I’m not a statistician. If I’ve got any of the above wrong – particularly the final bit about the 2.5k kids innocently swanning around leaking Bug everywhere – let me know and I’ll correct myself.)

Someone is right on the internet: As a follow-up on the font conversation on Monday, I’ve always been a fan of style guides. Not the ghastly prescriptive grammatical guides (Strunk and White, I’m looking at you); I mean the guides some publications craft to help their writers keep things consistent. Good examples come from the Guardian and the Economist.

These, of course, deal with words themselves, not the typography in which they appear. But a good friend (thanks, Ian) points to a guide published by the Securities and Exchange Commission in the US. It’s aimed at people creating investor notifications, for instance about listings, and spends a lot of time suggesting clear language (and is really good on that). But there’s also a chapter (chapter 7) dealing with design, which says wise and interesting things about fonts. Worth a look.

It also makes some worthwhile and entirely true points about layout: for instance, that a ragged right-hand margin is far more readable than a justified one. I’d love to adopt that one in my legal drafting. However, I suspect that if I hand in a Particulars of Claim, or a skeleton argument, with a ragged margin, I’m likely to get into even more trouble than I will by continuing to use Garamond. Baby steps…


2021ii17, Wednesday: the other shoe.

17th February 2021, JSJ

So it’s four days since I took a test, three since the result. Not much in the way of symptoms. How long, o Lord, how long…

Short thought: This is weird. Day four (at least) of having The Bug (again, I think). And aside from a mild headache and some fatigue: nothing to speak of. Heart rate? Normal. Blood O2? Normal. Temperature? Normal*. I’m in limbo.

I confess I hadn’t really thought through what it would be like to get a positive test while asymptomatic. I realise that’s a failure of imagination on my part. But it’s odd. Here I am, self-isolating as best I can, knowing that in theory the clock runs out on that next Tuesday night – but also knowing, as far as I’ve been able to find out, that while symptoms mostly emerge within 5-6 days of infection, it could be a couple of weeks.

So if I was tested on Saturday, in theory I could be sitting here happily for another 10 days or so and still get the whammy at the end of it, even if that’s at the far end of the probability curve.

In the meantime: limbo. Bayesian reasoning doesn’t help, because I haven’t got any more useful info than I had on Sunday. The lack of major symptoms to date isn’t a helpful data point because of the lengthy incubation period. The fact that I’ve no idea when, before Saturday, I picked it up means that period in itself is unknowable. (Which leads me to rack my brain unhelpfully. Where was it? The 20 minutes in Waitrose last Thursday? The half-hour in Tesco the day before? The three minutes in the pizza takeaway on Friday night? The five minutes picking up coffee on Saturday morning? When was Day One, really? There’s been nowhere, and no-one, else. And the rest of the family have been even fewer places than me.) Perhaps if we get to the end of the week I can adjust my priors – but the potential risk to others is so high, I might discount even that.

Even then, is it a false positive? Am I an asymptomatic (mostly) carrier? Or are the T-cells from last time doing a good job this time round? No idea. No way of having one. Sigh.

So here we go. Sit. Wait. Wonder. Fret. But also thank God, the stars and whatever any of us believes in that – thus far at least – I’m getting off far, far more lightly than most. Amen.

(And yes, I know that given the above this feeling is a bit premature. Give me this one. Ok?)

*Thank you, Apple Watch. A gift to hypochondriacs everywhere, although in this instance pretty useful.

