The scientific approach that characterises Western civilisation is based on reproducible empirical observations. The idea is that relationships or differences that are observed repeatedly are more likely to reflect reality. Put another way, a proposition cannot be accepted unless it is supported by repeated observations, no matter how elegant and convincing it seems.
In medicine, this culminated in Koch's postulates, which define the cause of a disease. They imply that the causative agent is always present when the disease is present, and that in the absence of the agent the disease is also absent.
Our understanding of disease has now superseded Koch's postulates in many ways. Health issues cover a wider scope than very specific diseases such as tuberculosis. In addition, many adverse health conditions have multiple causes and multiple manifestations. We also understand that individuals vary in their susceptibility and resistance to illness, and that these are often modified by broad and ill-defined factors such as social class and lifestyle.
We have come to understand that even in something as specific as cancer, occurrence and progression depend on a combination of genetic susceptibility, aging, exposure to carcinogens, and the changing biochemical and cellular environment within the body. Exposure to a carcinogen will cause cancer in some people but not in all, and any particular treatment will benefit some patients but not all.
Increasingly, the complexity of advancing knowledge compels us to abandon the binary idea that something is either true or false. Rather, we see true and false merely as the extremes of a continuum, with most of reality lying somewhere in between.
The uncertainty of reality needs to be approached in a consistent and logical way. Probability, a measure of how likely something is to occur, is one way to represent uncertainty. Statistics is the set of tools for handling probability.
The ancient Phoenicians were great traders and seafarers, but they tended to overload their boats. In stormy weather, goods had to be thrown overboard in order to save the ship. Owners of lost goods were then compensated by those who did not lose their goods, and the amounts involved depended on the total value of the goods. This arrangement was called havara, a term that evolved over the centuries to become average.
There are three common ways to estimate the average. The mean is the sum of all observations divided by the number of observations. The median is the value at the 50th percentile, the middle value when the observations are ranked, while the mode is the most frequently observed value. Although the median and mode are sometimes used, the mean is the most common measure of central tendency (the average) in statistical calculations. For the four measurements 1, 1, 2 and 3, the mean is (1 + 1 + 2 + 3) / 4 = 7/4 = 1.75.
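The three measures can be checked on the same four measurements with Python's standard `statistics` module. This is a minimal sketch that uses only that module.

```python
import statistics

# The four measurements from the example above
data = [1, 1, 2, 3]

mean = statistics.mean(data)      # (1 + 1 + 2 + 3) / 4
median = statistics.median(data)  # even count, so the average of the two middle values: (1 + 2) / 2
mode = statistics.mode(data)      # the most frequently observed value

print(mean, median, mode)  # 1.75 1.5 1
```

For this small data set the three measures differ (1.75, 1.5 and 1). This shows why it matters which "average" is being reported.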
Statistical calculation and interpretation are possible if one accepts the assumption that measurements have reproducible characteristics. Experience indicates that this is so. The most important features of a set of measurements are its central tendency (mean) and the range over which the values are spread (distribution). The most common and useful type of distribution observed in nature is the Normal Distribution.
History
The astronomer Gauss measured distances between stars. He noticed that it was difficult to reproduce his measurements exactly. However, the measurements clustered around a central value. Values near the mean were the most common, and values became less common the further they were from the mean. He concluded that sets of measurements generally follow this pattern, which is now known as the Normal (or Gaussian) Distribution.
De Moivre derived the formula for the Normal Distribution curve in mathematical terms. Once this was done, the features of the Normal Distribution could be handled mathematically, and integral calculus could be used to calculate the area under the curve (or under any part of it). Fisher developed the concept further and used the area under the curve as a measure of probability. He argued that if the area under the whole curve is taken as the totality (a probability of 1), then any portion of it represents the probability of the events that the portion describes. From this, he was able to calculate the probability of obtaining a measurement that deviates from the mean by more than a given amount. He standardised this deviation by expressing it in units of the Standard Deviation, called the result the Standard Normal Deviate (z), and derived the relationships between z and probability.
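In conventional notation, the relationships described above can be written as follows. Here μ is the population mean and σ is the Standard Deviation; these symbols are standard choices and are not taken from the text. The first line is the Normal curve. The second states that the area under the whole curve is 1. The third defines the Standard Normal Deviate. The last gives the probability of exceeding a given z.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\int_{-\infty}^{\infty} f(x)\,dx = 1

z = \frac{x - \mu}{\sigma}

P(Z > z) = \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt
```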
The area under the curve (the probability) for a particular deviate value is complex to calculate by hand, so it is usually obtained from a computer program or read from a table.
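As an alternative to a printed table, the area can be computed from the error function in Python's `math` module. The sketch uses the standard identity Φ(z) = ½[1 + erf(z/√2)], and `statistics.NormalDist` (Python 3.8+) gives the same value. The function name `normal_cdf` is illustrative.

```python
import math
from statistics import NormalDist

def normal_cdf(z: float) -> float:
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.96
lower_area = normal_cdf(z)      # area to the left of z
upper_tail = 1.0 - lower_area   # probability of exceeding z
two_tailed = 2.0 * upper_tail   # probability of a deviation beyond -z or +z

# Cross-check against the standard library's Normal distribution object
assert abs(lower_area - NormalDist().cdf(z)) < 1e-12

print(round(lower_area, 4), round(upper_tail, 4), round(two_tailed, 4))  # 0.975 0.025 0.05
```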
Another way to describe the distribution of measurements is to define a confidence interval: the range within which we can confidently expect our measurements to lie. In theory, the 100% confidence interval is infinite, because the Normal Distribution extends, with ever-decreasing probability, to infinity in both directions. The confidence interval is therefore qualified by the percentage of measurements we expect to lie within it.
The most commonly used interval is the 95% confidence interval: the range within which 95% of our data points can be expected to lie. If the distribution is Normal, 2.5% of the data lie beyond each end of this range, and the z value that leaves 2.5% (a probability of 0.025) in one tail is 1.96. The formula for the 95% confidence interval is therefore: mean ± 1.96 x SD.
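The following is a minimal sketch of this calculation in Python, assuming the data are roughly Normal. The measurement values are hypothetical and were chosen only for illustration. `statistics.stdev` gives the sample SD (dividing by n − 1), and `NormalDist().inv_cdf(0.975)` recovers the 1.96 multiplier.

```python
import statistics
from statistics import NormalDist

# Hypothetical measurements, for illustration only
data = [4.8, 5.1, 5.0, 4.6, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]

mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample standard deviation
z = NormalDist().inv_cdf(0.975)    # leaves 2.5% in the upper tail, approximately 1.96

# 95% range for individual measurements: mean ± 1.96 x SD
lower = mean - z * sd
upper = mean + z * sd
print(f"mean = {mean:.3f}, SD = {sd:.3f}, 95% range = {lower:.3f} to {upper:.3f}")
```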
In most situations, measurements are taken only on a sample drawn from a population, and inferences are then made from these measurements to describe that population. For example, the sample mean is only an estimate of the population mean and therefore carries a certain amount of error. This error is called the Standard Error of the mean, or SE for short. Conceptually, the SE is the Standard Deviation (SD) of the mean values that would be obtained if repeated samples were taken. In other words, the mean is calculated for each repeated sample from the population, and the SD of these means equals the SE of the mean.
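The repeated-sampling idea can be simulated directly. This sketch assumes a hypothetical Normal population with mean 100 and SD 10. It draws many samples of size 25 and takes the SD of the resulting means. The parameter values and the function name `sd_of_sample_means` are illustrative, and the code requires Python 3.8+ for `statistics.fmean`.

```python
import random
import statistics

def sd_of_sample_means(mu: float, sigma: float, n: int, repeats: int, seed: int = 1) -> float:
    """Draw `repeats` samples of size n from Normal(mu, sigma) and return the SD of their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

empirical_se = sd_of_sample_means(mu=100.0, sigma=10.0, n=25, repeats=5000)
print(f"SD of the sample means: {empirical_se:.3f}")  # expected to be close to 10 / sqrt(25) = 2
```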
If the mean is derived from the whole population, the error is zero, because the mean is known exactly. If the mean is derived from a single observation, the error of the mean is the same as the SD (note, however, that the SD cannot be calculated from one observation). Consequently, the SE lies between 0 and the SD. The SE can be estimated by repeatedly resampling the observed data with replacement (called bootstrapping), or from the formula SE = SD / sqrt(n), where n is the number of cases in the sample.
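Both routes to the SE can be compared on a single sample. The sketch below uses hypothetical data, the formula SE = SD / sqrt(n), and a simple bootstrap that resamples the data with replacement via `random.choices`. The number of resamples (2000) and the seed are arbitrary choices.

```python
import math
import random
import statistics

# Hypothetical sample, for illustration only
data = [4.8, 5.1, 5.0, 4.6, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]
n = len(data)

# Route 1: the formula SE = SD / sqrt(n)
se_formula = statistics.stdev(data) / math.sqrt(n)

# Route 2: bootstrap. Resample with replacement, recompute the mean each time,
# then take the SD of those resampled means.
rng = random.Random(42)
boot_means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(2000)]
se_bootstrap = statistics.stdev(boot_means)

print(f"SE (formula) = {se_formula:.4f}, SE (bootstrap) = {se_bootstrap:.4f}")
```

The bootstrap value tends to come out slightly smaller than the formula value. This is because resampling treats the sample as if it were the whole population, which effectively divides by n rather than n − 1 when estimating the spread.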
The confidence interval of the mean is the range within which we can confidently expect the mean value to lie. The calculation is similar to the confidence interval of measurements, except the Standard Error (SE) rather than the Standard Deviation (SD) is used. The formula to calculate the 95% confidence interval for the mean is: mean ± 1.96 x SE.
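This minimal sketch contrasts the two intervals on the same hypothetical data. The range for single measurements uses the SD, while the interval for the mean uses the SE, so the latter is narrower by a factor of sqrt(n). It uses the 1.96 multiplier from the text. For small samples a t-based multiplier is often preferred, but this sketch does not implement one.

```python
import math
import statistics

# Hypothetical measurements, for illustration only
data = [4.8, 5.1, 5.0, 4.6, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)
se = sd / math.sqrt(n)

# 95% range for individual measurements: mean ± 1.96 x SD
measurement_range = (mean - 1.96 * sd, mean + 1.96 * sd)
# 95% confidence interval for the mean: mean ± 1.96 x SE
mean_ci = (mean - 1.96 * se, mean + 1.96 * se)

print("measurements:", [round(v, 3) for v in measurement_range])
print("mean:        ", [round(v, 3) for v in mean_ci])
```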
A reminder: the SD measures the spread of the observed data points in the sample. The confidence interval of the measurements is the range within which we can confidently expect any single measurement to lie. The Standard Error of the mean (Standard Error or SE) is an estimate of the spread of the mean value across a theoretical repeated sampling that yields many means. The confidence interval of the mean is the range within which we can confidently expect the mean value to lie.