Measure and Normality

You shall do no wrong in judgment, in measures of length or weight or quantity.

You shall have just balances, just weights, a just ephah, and a just hin.

I am the LORD your God, who brought you out of the land of Egypt.

Leviticus XIX, 35-36
To measure means, essentially, to translate a property of an object (real or abstract) into a symbol, generally a number.
We can consider measurement a bridge among the ethical, the judicial, and the mechanical metaphors. The balance, archetype of all measuring instruments, is universally recognized as a symbol of Justice. Of a wise person, we say he is a “measured” person.
Both words, “measure” and “mind”, share the same Indo-European root “MA”, meaning “to ponder”, “to think”, “to measure”. The Indo-European word “MANUS” means “man”, “he who ponders”, “he who thinks”.
It is therefore understandable that measures and quantitative data are considered “objective”: by their very nature, data obtained from measurements are perceived as fair, equitable, unbiased, and impartial, because they are “pondered”, “thought”. This deep, inborn faith in the fairness of measurement is present in all societies, and it reached its acme during the French Revolution with the introduction of the decimal metric system, ancestor of the modern International System of Units.
The concept of measurement thus carries an inborn notion of objectivity, justice, and impartiality. Indeed, the word “measure” is related to the Greek “mesotes”, meaning “the middle path”, “the doctrine of the mean”.
Notwithstanding that measurement has always been considered scientific and impartial, we must note that, in the past more than ever, it was burdened by a pronounced subjectivity and by the impossibility of comparison.
The universal system was intended to eliminate, or at least to mitigate, the subjective vision of measurement and to bestow upon it an impartial and scientific value.
If on the one hand the abstraction bestowed by the universal system made it possible to overcome the subjective and “particular” character of measurement, on the other it raised the problem of its dehumanization: the international units are completely devoid of any meaning or relation to daily activities and problems.
Before the advent of the universal system, the same physical magnitude, e.g. length, could be measured with different units depending on the circumstances. Thus, small lengths could be measured in inches or in steps. The terrestrial mile, used by the Roman infantry to measure roads, amounted to a thousand double steps of a Roman soldier. The Roman cavalry instead used “leagues” to measure distances, a unit better suited to the speed of horses. A nautical mile differed from a terrestrial mile because it better suited the path and speed of a ship. A tailor could measure fabrics in cubits; a cubit was the length of the forearm from the elbow to the tip of the middle finger (in Latin, “cubitum” is the “elbow”).
Some archaic units of measurement may appear weird to the present-day reader, yet they are comprehensible in context. For example, some Saharan tribes used to calculate walking distance in “days of march”. The natives of Greenland, the Inuit, in the absence of the alternation of day and night, used to calculate distances in “sleeps” (“sinik” in their language), i.e. the number of nights a journey requires. It is not a fixed distance: depending on the weather and the time of year, the number of sinik can vary. Nor is it a measurement of time. Sinik is neither a distance nor a number of days or hours; it is both a spatial and a temporal phenomenon, a concept of space-time. It describes the union of space, motion, and time that is taken for granted by the Inuit but cannot be captured by any Western everyday language.
An even more curious unit of measurement was the “number of brays of a donkey”, typical of some desert regions of the Middle East. The bray of a donkey typically lasts about twenty seconds and can be heard more than three kilometres away.
Analogous considerations hold for units of volume: a glass of water, a pint of beer, a barrel of wine are not universal units, yet they are “natural” because they match precise and useful objects of daily life. Nobody thinks of drinking 200 cc of water: we drink a glass of water!
Obviously, as we said before, the main drawback of natural units of measurement is that they cannot be compared, for two reasons:
–    They have different meanings. It is not useful to propose a “day of march” as a unit of measurement to the Inuit. It is pointless, though possible, to measure roads in cubits: fabrics have nothing in common with roads.
–    Even when we use the same unit, we are not able to compare the numbers obtained. The cubit varies from tailor to tailor, each forearm being different from the others. The Roman mile changes in relation to the stride of each Roman soldier. And so on.
The universal system represents a Copernican revolution: man and his activities cease to be the measure of all things. Measures become aseptic and impersonal. If natural units offered man some qualitative comprehension of the measured object, universal units bestow no qualitative information. For example, the surface of a plot of land expressed in square meters gives no indication of its morphology or its productivity. Indeed, a unit of measurement based on “days of sowing” (the days necessary to sow the entire plot) or on “the return in wheat” may offer a more adequate idea of the intrinsic value of that land.
In other terms, “natural” units of measurement keep some qualities of the measured object, qualities that are not preserved by the aseptic universal units. Their “naturalness” is linked to the relations they have with daily activities. They put man and his activities at the centre.
On the contrary, universal units, though satisfying the need for generalisation and abstraction, are completely devoid of any reference to daily life. In 1960 the meter was defined in the following way: equal to 1,650,763.73 wavelengths in a vacuum of the radiation corresponding to the transition between the 2p10 and 5d5 quantum levels of the krypton-86 atom.
But such international units have no practical interest for humans. The cubit, the league, the mile: yes, they do.

The fundamental difference is that “natural units” keep a strong relation to the quality of the measured object, whereas “universal units” abstract a single property from an object, e.g. its length, without accounting for the object's utility or its qualities.
The following verses, by a Roman poet, describe extremely well the reaction to the introduction of abstract universal units of measurement (in this case the sundial):
Be damned who invented the clock,
and be damned who placed the sundial here!
Poor me, he has torn my day to pieces!
In my youth the stomach was the sole sundial,
the best and the most exact of all, beyond compare.
When it said the word, I went to eat, if there was food:
but now, even when there is food, I may not eat until the sun decides.
The first instrument for measuring time was Man himself, or rather the shadow his erect body projected on the ground during the day. Indeed, instead of asking “what time is it?”, people used to ask “How is your shadow?”
Today the international unit of time is the second, defined as the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom.
Along with the introduction of the universal system, and with the complete trust placed in the objectivity of such measurements, there appeared the so-called “numerical doctors”, who promoted the application of measurements to clinical practice and to the evaluation of the effects of therapies.
Today, with the introduction of computers, software, and sophisticated instruments, the clinician is daily bombarded by a series of abstract numbers to which he is not always able to attribute any clinical significance. We are facing a veritable “measuring virtuosity”, capable of dulling our capacity for understanding.

Does knowing that the average BIC (Bone-Implant Contact) obtained in monkeys around an implant X is 65%, while the average BIC around an implant Y is 62%, have any clinical significance?

Does knowing that a dental adhesive X reached 35 megapascals in a traction test in an in vitro study on sound dentin, while another adhesive Y reached 31 megapascals under the same conditions, have any clinical importance?

Does knowing the bi-dimensional and arbitrary cephalometric parameters bestow any useful information on the clinician?
Does knowing that ANB is 2 degrees and SN-PM is 20 degrees give us any useful information?

Or are they just numerical virtuosity with no relation to clinical efficacy and utility?

The hazard we are risking is the complete dehumanization of decisions and procedures, based increasingly upon a tyranny of numbers which are supposed to be “neutral” and “unbiased”, whereas they often deceitfully contain subjective assumptions that are not immediately evident.

A consequence of the introduction of the universal unit system, and of the objectification of measurements, is the introduction of the concept of “normality”. The universal system is indeed characterized by a basic assumption: there exists, or can be defined, a “universal standard” to which all measurements refer. It was unavoidable that such a standard (or norm) would also be sought for, and extended to, human health and disease.
Quételet (1796-1874), a Flemish astronomer and statistician, described the concept of normality using statistics. Starting from the theory of errors and from the bell-shaped distribution of errors, he applied these assumptions universally to all variables.
Analogously to the theory of errors, Quételet attributed to individual variations the character of accidents, due to several chance factors whose effects tend to cancel each other out by compensation.
This interpretation of biological fluctuations by means of the calculus of probability led to the concepts of the “average man” and the “mean value”. Indeed, in his opinion, the existence of a mean and of a Gaussian curve were unequivocal signs of an ontological regularity expressing itself. Thus, disease came to be considered a deviation from the “norm”.
The “numerical doctor” often resembles a shoemaker who, after measuring a thousand different feet, calculates the mean and then decides to make and fit the same average shoe for all his clients.
Classic frequentist statistics cares for an average patient who, like the “average man”, is a pure mathematical fiction.
At this point it is legitimate to wonder: what does “normal” mean? How do we measure normality? Where do we draw the border between normality and abnormality? Can we define a universal standard in health and disease?
Physician and philosopher of medicine Edmond Murphy famously distinguished seven meanings of the word “normal” important in medicine:

1 Having a Gaussian distribution
2 Most representative of its class (average, median, modal)
3 Commonly encountered (habitual)
4 Most suited to survival and reproduction (optimal or “fittest”)
5 Carrying no penalty (innocuous or harmless)
6 Commonly aspired to (conventional)
7 Most perfect of its class (the moral ideal, the aesthetic ideal, etc.)



So, what is normal? The Gaussian average? The most representative? The most habitual and common? The fittest? The most innocuous? The ideal?
For the countless continuous variables used in clinical diagnosis, there is a standard methodology for determining the reference range reported in handbooks, textbooks, and lab reports. This procedure has nothing to do with risk, or even directly with disease.

First, a sufficiently large sample of apparently healthy people is chosen (either healthy in general, or free of a particular kind of disease), and variable V is measured in each by some uniform method. Second, one sees whether the resulting data are close to a normal (= Gaussian) distribution. If so, the reference range is chosen as [μ − 2σ, μ + 2σ]. That is, in the old terminology, the “normal range” for this variable includes everything within two standard deviations of the mean. This range constitutes a central ~95% of the sample, with two “abnormal” tails of ~2.3% each.

If the data do not look bell-shaped, as Feinstein says is usually the case, there are two alternatives. First, one can test whether they do so after some transformation. For example, some variables turn out to have a log-normal empirical distribution, which means that the curve becomes Gaussian when each value is replaced by its logarithm. Or one might find that cube roots or arcsines do the trick. In such cases, one can find the interval μ ± 2σ in the transformed data and then undo the transformation to get a reference range. If the data cannot easily be transformed into a normal distribution, then simple percentiles can still be used to come up with the desired central 95%.
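The procedure just described can be sketched in a few lines of code. The sketch below is purely illustrative, not a laboratory protocol: the “healthy” sample is simulated from an assumed Gaussian distribution, and the analyte and its numbers are hypothetical.

```python
import math
import random

def reference_range(values):
    """Conventional 'normal range': mean +/- 2 standard deviations,
    computed from a sample of apparently healthy subjects."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    return mu - 2 * sigma, mu + 2 * sigma

# Hypothetical analyte: simulated "healthy" values drawn from a
# Gaussian with mean 90 and SD 8 (purely for illustration).
random.seed(0)
sample = [random.gauss(90, 8) for _ in range(1000)]
low, high = reference_range(sample)
print(f"reference range: {low:.1f} to {high:.1f}")

# The method itself guarantees that roughly 5% of the healthy
# sample ends up labelled "abnormal".
outside = sum(1 for v in sample if v < low or v > high) / len(sample)
print(f"fraction of healthy subjects outside the range: {outside:.3f}")
```

Note that the ~5% exclusion is built into the arithmetic, not discovered in the data: it would appear for any healthy sample whatsoever.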
The main objections to calling a reference range “normal” are two features inherent in the methodology used to derive it. First, it was derived from an apparently healthy population, with sick individuals excluded. But the method guarantees that 5% of that population will emerge as “abnormal” at the end, seemingly contradicting the original assumption. Feinstein writes:
"Having been selected for their normality, the people might be expected to remain “normal” after the numerical analyses, but the statistical procedure is remorseless. No matter how medically “normal” the people may have been, 5% of them must emerge as “abnormal” after the statistical partitions"
Of course, one can reasonably assume one's initial judgments of health to be imperfect, since some seemingly healthy persons harbor hidden disease. As Ravel says,
“a small but definite group of clinically normal persons may have subclinical or undetected disease and may be inadvertently included in the supposedly normal group used to establish normal values”
But there is no general reason to assume that these hidden sick constitute 5% of the population rather than, say, 0.1%, 1%, 10%, or 25%. So the second objection seems conclusive: choosing a reference range to exclude a symmetrical 5% is wholly arbitrary. Feinstein writes:
"The usual statistical partition of the zone called the “range of normal” depends on three arbitrary judgments about proportions, location, and symmetry. With the first judgment, we decide that 1 in every 20 values ([5%] of the total array) are sufficiently uncommon to be regarded as “abnormal,” i.e., beyond the “normal” zone. In the second judgment, we decide that this 95% zone of normality will be located in the central portion of the ranked array of numbers, rather than at one of the extremes. In the third judgment, we decide to place the 95% in the exact centre of the distribution, so that the remaining 5% of the values are divided symmetrically, with 2.5% at one end and 2.5% at the other.
These judgments are completely arbitrary. As Murphy points out, the strategy “contrary to popular opinion … is not a recommendation of statisticians, and … has no support from statistical theory.”
Feinstein goes on to claim that 95% limits in laboratory medicine arose from their common use in hypothesis testing and estimation. Now even if we waive all objections to classical statistics, in significance testing the demand for “p-values” < .05, as opposed to, say, < .25 or < .001, is a primitive effort to weigh the relative significance of type-I and type-II errors. Even so, no similar pseudo-decision-theoretic analysis carries over to the clinical use of reference ranges as bounds of “normality.” Ravel is right to say that 5% is just
“a deliberate compromise. A wider normal range (e.g., ± 3 SD) would ensure that almost all normal persons would be included within normal range limits and thus would increase the specificity of abnormal results. However, this would place additional diseased persons with relatively small test abnormality into the expanded normal range and thereby decrease test sensitivity for detection of disease.”
But there is no principled reason for choosing 5%, or any other number, as best in general, so the compromise, however deliberate, is baseless.
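Ravel's trade-off can be illustrated with a small simulation, under wholly hypothetical assumptions: “healthy” values drawn from a Gaussian with mean 100 and SD 10, “diseased” values from a Gaussian with mean 130 and SD 10. Real analytes need not look like this.

```python
import random

# Wholly hypothetical populations (illustration only).
random.seed(1)
healthy = [random.gauss(100, 10) for _ in range(10_000)]
diseased = [random.gauss(130, 10) for _ in range(10_000)]

def evaluate(k):
    """Specificity and sensitivity when 'abnormal' means falling
    more than k standard deviations from the healthy mean."""
    mu, sd = 100, 10
    lo, hi = mu - k * sd, mu + k * sd
    specificity = sum(lo <= v <= hi for v in healthy) / len(healthy)
    sensitivity = sum(v < lo or v > hi for v in diseased) / len(diseased)
    return specificity, sensitivity

for k in (2, 3):
    sp, se = evaluate(k)
    print(f"cutoff +/-{k} SD: specificity {sp:.3f}, sensitivity {se:.3f}")
```

Widening the cutoff from ±2 to ±3 SD raises specificity (fewer healthy people mislabelled) but lowers sensitivity (more diseased people hidden inside the “normal” range); nothing in statistical theory dictates where to stop.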
In sum, the moral of this section is that there is no reason to believe that a clinical measurement outside a standard reference range must be pathological in itself, or even strong evidence for any disease or pathological condition. On the contrary, not only can values outside the range be fully consistent with complete health, but reference-range methodology is almost certain to place some percent of the population in this category. I will let Galen have the last word:
"It is impossible … to evaluate a data set of reference values and select a suitable reference interval that will be meaningful for the practice of medicine. The reference interval, no matter how derived statistically, tells us nothing about disease".
Thus, if we call the 5% on each test abnormal, it follows that a normal person is anyone who has not been sufficiently investigated!
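This quip has a simple arithmetic basis. If, purely for illustration, we assume a battery of statistically independent tests, each calibrated so that 5% of healthy people fall outside its reference range, the probability that a healthy person emerges with no “abnormal” result shrinks as 0.95 raised to the number of tests:

```python
# Assuming (for illustration) independent tests, each labelling
# 5% of healthy people "abnormal" by construction.
for n in (1, 5, 12, 20):
    p_all_normal = 0.95 ** n
    print(f"{n:2d} tests: P(healthy person passes all) = {p_all_normal:.2f}")
```

With a twenty-test panel, a perfectly healthy person has only about a one-in-three chance of emerging with every result “normal”.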