Just another statistic? A 3-part series
“There are three kinds of lies: lies, damned lies, and statistics.” This phrase was popularized in the United States by Mark Twain and others. It was (possibly mistakenly) attributed to the British prime minister Benjamin Disraeli. [according to Wikipedia]
Part one – What have storks got to do with it? All explained in the blog.
Nowadays, we are being showered with numbers which we loosely call data.
Values, sums and statistics appear every hour of every day and can be incredibly confusing, misleading or even appear scary. It is not a new problem.
From the 1950s to the 1970s, numbers, trends and averages (collectively, ‘statistics’) were used so loosely in advertising that people were often actively misled.
In 1955, commercial TV started broadcasting, and for the first time claims made in advertisements, including statistical ones, were regulated in some way.
In the UK, this regulation evolved into the Advertising Standards Authority and informal codes of practice. This protects us and stops advertisers making false or misleading claims. For obvious reasons, it is a key role of the Financial Conduct Authority (FCA) as well.
So, what are statistics?
Statistics is the old name for data science! It is the collection, processing, and presentation of numbers (or data).
In use, a ‘sample’ of things is studied, or a ‘model’ (idea) is designed and, hopefully, tested with observed facts before being used to predict things.
Important note: if something is new and there are no facts, or too few, then a model based on something that may be similar is used – for example, a flu model for the coronavirus pandemic.
‘Road deaths leap by 100% this year!’ – how statistics can be employed for impact.
This is a made-up headline about a rural area in the east of England. It is based on a road death rate that was 0.01% of the population in 2019 and rose to 0.02% in 2020: a change from 1 death per year to 2 deaths per year.
While the risk of dying on the road is still very low, it would still be a doubling. Hence, the 100% headline is accurate if out of context.
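The gap between the headline and the underlying risk can be checked with a little arithmetic. The sketch below is only illustrative: the population of 10,000 is an assumption, chosen so that 1 and 2 deaths correspond to the 0.01% and 0.02% rates in the made-up example, and the function name is hypothetical.

```python
# Assumed population (not from any real data): with 10,000 people,
# 1 death is a 0.01% rate and 2 deaths is a 0.02% rate.
POPULATION = 10_000


def risk_summary(deaths_before, deaths_after, population):
    """Compare the relative change (the headline) with the absolute change."""
    rate_before = deaths_before / population
    rate_after = deaths_after / population
    return {
        "rate_before_pct": rate_before * 100,
        "rate_after_pct": rate_after * 100,
        # Relative change: what the headline reports.
        "relative_change_pct": (rate_after - rate_before) / rate_before * 100,
        # Absolute change: how much the risk actually moved, in percentage points.
        "absolute_change_pct_points": (rate_after - rate_before) * 100,
    }


summary = risk_summary(1, 2, POPULATION)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the relative change is 100%, while the absolute change is only 0.01 percentage points. Both numbers are correct; they simply answer different questions.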
However, are the figures correct and what defines a road death? A pedestrian killed on a pavement or a driver in a car? Were the causes of death the same in both counts?
This distinction is important because the risk you face may be very low compared to the perception of risk you get from that headline.
How to see through the ‘lie’ or false impression?
Darrell Huff wrote a book called ‘How to Lie with Statistics’, first published in 1954 and later issued by Penguin. Many decades later, it was still compulsory reading for scientists.
A key point of the book was that the derivation or source of the facts needs to be understood so that the statistical answer is relevant and in context.
Are you more likely to be killed on the road or in a train? In science, the ONE lesson to remember is that if B follows A, it doesn’t mean that A caused B!
The book taught that the only thing to do was to dig deeper into the data to find the truth.
Example – Do storks bring babies?
The ‘context’ of a data sample can influence the result. Huff illustrates this with a study about storks and babies in Holland!
You have probably heard the “old wives’ tale”, or perhaps better the superstition, about storks delivering babies.
Statistics can indeed show that the seasonal nesting of storks is linked to the arrival of human babies.
But in context, we see that people with families, or about to have babies, tend to move to or live in bigger houses. Bigger houses tend to have more chimneys, and storks nest on chimneys.
So, storks may just be preferring bigger houses with more, bigger chimneys.
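To see how a hidden factor like house size can create a link between storks and babies, here is a minimal sketch using invented numbers. The eight "streets", their nest counts and their birth counts are all hypothetical, picked only to make the effect visible; they are not Huff's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: (house size, stork nests, births) for eight hypothetical streets.
streets = [
    ("small", 0, 0), ("small", 1, 0), ("small", 0, 1), ("small", 1, 1),
    ("big", 2, 2), ("big", 3, 2), ("big", 2, 3), ("big", 3, 3),
]


def correlation_for(rows):
    """Correlation between nests and births for the given rows."""
    nests = [row[1] for row in rows]
    births = [row[2] for row in rows]
    return pearson(nests, births)


# All streets together: nests and births appear strongly linked.
overall = correlation_for(streets)

# Streets of the same house size only: the link disappears.
small_only = correlation_for([row for row in streets if row[0] == "small"])
big_only = correlation_for([row for row in streets if row[0] == "big"])

print(f"all streets:   {overall:.2f}")
print(f"small houses:  {small_only:.2f}")
print(f"big houses:    {big_only:.2f}")
```

With these made-up figures, the correlation across all streets is 0.80, but it drops to 0.00 once streets with the same house size are compared with each other. House size drives both numbers, so storks and babies only look connected.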
Today, a theme brought to life by Huff is that of selection bias – which is relevant to our lives and to insurance.
Learning the wrong lessons from what ‘the numbers tell you’ can significantly affect our day-to-day lives.
In next week’s blog, we cover some thoughts on selection bias and how statistics impact us all.
HAPPY HALLOWEEN from Services Family Ltd