Prediction is widely used in clinical practice and is possible whenever a history is taken, an examination made, a test performed, or a diagnosis made. The prediction process determines what is likely to happen (or has happened) based on the observation made. Prediction statistics comprise a number of procedures that evaluate the quality of a particular prediction. They determine, for example, how well glucose in the urine predicts abnormally high blood sugar, how well calcification seen on a mammogram predicts the presence of a carcinoma in the breast, or how well tenderness over the right iliac fossa predicts an inflamed appendix.
Predictive tests that involve a measured predictor and a measured outcome (e.g., how well urinary glucose concentration predicts blood glucose concentration) are best evaluated by correlation or regression analysis.
Where binary (yes/no) observations are used to predict measured outcomes (e.g., how well the sex of an individual predicts the maximum weight that person can lift), the confidence interval of the difference between two means is probably the best way to evaluate the relationship.
The term “prediction statistics” is usually applied to situations where the outcome to be predicted is binary (e.g., yes or no, life or death, presence or absence of a particular illness). The observations used to predict it can be either binary or continuous measurements. Binary predictors will be covered in detail, as they form the basis of prediction statistics, and all the following sections except the last deal with them. The evaluation of continuous predictors is based on the Receiver Operator Characteristics (ROC), which is mathematically more complex and is only briefly introduced in the last section of this page.
The basic evaluation of prediction involves how well a binary observation can predict a binary outcome, such as how well right iliac fossa tenderness can predict appendicitis.
An important thing to remember is that two separate but related processes are involved in the evaluation of a particular prediction or test. The first is the relationship between a positive test and a positive outcome: how well the presence of tenderness predicts appendicitis. Equally important is the relationship between a negative test and a negative outcome: how well the absence of tenderness predicts that appendicitis is not present. The two must be considered together, because a test that predicts a positive outcome well may be poor at predicting a negative outcome, and vice versa. A truly good test must be accurate in predicting both positive and negative outcomes.
Nomenclature
A test is an observation that is used to predict an outcome. The test can be positive or negative.
An outcome is an observation or a final conclusion that the test predicts. It can be positive or negative.
Schematically, the relationships between test and outcome can be represented in a 2 x 2 table. The first and second columns contain the numbers of cases that are outcome positive and outcome negative respectively. The first and second rows contain the numbers of cases that are test positive and test negative respectively.
Cases that test positive and outcome positive are designated True Positives (TP). Those that are test negative and outcome negative are designated True Negatives (TN). Those that are test positive where the outcome is negative are designated False Positives (FP). Those that test negative where the outcome is positive are designated False Negatives (FN).
| | Outcome Positive | Outcome Negative | Total |
|---|---|---|---|
| Test Positive | True Positive (TP) | False Positive (FP) | TP + FP |
| Test Negative | False Negative (FN) | True Negative (TN) | FN + TN |
| Total | TP + FN | FP + TN | TP + FP + FN + TN |
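As a minimal sketch of how individual cases fall into the four cells of the table, the Python function below counts TP, FP, FN and TN from paired yes/no test results and outcomes. The function name `tabulate` and the six cases are hypothetical, chosen only for illustration.

```python
def tabulate(tests: list[bool], outcomes: list[bool]) -> dict[str, int]:
    """Count TP, FP, FN and TN from paired test results and outcomes."""
    if len(tests) != len(outcomes):
        raise ValueError("tests and outcomes must be the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for test, outcome in zip(tests, outcomes):
        if test and outcome:
            counts["TP"] += 1      # test positive, outcome positive
        elif test:
            counts["FP"] += 1      # test positive, outcome negative
        elif outcome:
            counts["FN"] += 1      # test negative, outcome positive
        else:
            counts["TN"] += 1      # test negative, outcome negative
    return counts


# Hypothetical cases: tenderness present (test) vs appendicitis confirmed (outcome)
tests = [True, True, False, True, False, False]
outcomes = [True, False, True, True, False, False]
print(tabulate(tests, outcomes))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```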
Sensitivity and Specificity define the quality of a test and are not affected by the prevalence of the outcome. Sensitivity is the proportion of those with a positive outcome that have a positive test, TP / (TP + FN). Specificity is the proportion of those with a negative outcome that have a negative test, TN / (TN + FP).
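The short Python sketch below computes both values (the function names are illustrative) and shows why neither depends on prevalence: multiplying every count in the outcome-negative column by ten lowers the prevalence but leaves both values unchanged.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of outcome-positive cases that test positive: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of outcome-negative cases that test negative: TN / (TN + FP)."""
    return tn / (tn + fp)


# Illustrative counts: TP = 90, FN = 10, FP = 5, TN = 95
print(sensitivity(90, 10), specificity(95, 5))    # 0.9 0.95
# Ten times as many outcome-negative cases (lower prevalence): same values
print(sensitivity(90, 10), specificity(950, 50))  # 0.9 0.95
```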
A program is available at the end of the page to calculate Sensitivity and Specificity.
Although Sensitivity and Specificity are good indicators of the quality of a test, they say very little about how useful the test is. This is because the usefulness of a test depends only partly on its quality and partly on the prevalence of the outcome. Even a highly sensitive and specific test will not be very useful in a population where the outcome it predicts has a very low prevalence. To determine how useful a test is in a particular population, we need the Positive Predictive Value, the proportion of test positives that are outcome positive, TP / (TP + FP), and the Negative Predictive Value, the proportion of test negatives that are outcome negative, TN / (TN + FN).
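To illustrate the effect of prevalence, the sketch below assumes a test with Sensitivity 0.9 and Specificity 0.95 (values chosen for illustration) and works out the expected Positive and Negative Predictive Values in hypothetical populations of decreasing prevalence. It uses the expected proportions of TP, FN, TN and FP in each population rather than observed counts; the function name is illustrative.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Expected PPV and NPV when a test with the given Sensitivity and Specificity
    is applied to a population with the given prevalence of the outcome."""
    tp = sens * prevalence               # expected proportion of the population that is TP
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    fp = (1 - spec) * (1 - prevalence)
    ppv = tp / (tp + fp)                 # TP / (TP + FP)
    npv = tn / (tn + fn)                 # TN / (TN + FN)
    return ppv, npv


for prevalence in (0.5, 0.1, 0.01):
    ppv, npv = predictive_values(0.9, 0.95, prevalence)
    print(f"prevalence {prevalence:<4}  PPV {ppv:.3f}  NPV {npv:.3f}")
```

With these assumed values the PPV falls from about 0.947 at 50% prevalence to about 0.667 at 10% and about 0.154 at 1%, while the NPV rises towards 1, even though the test itself has not changed.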
A program is available at the end of the page to calculate both Positive and Negative Predictive Values.
Pre-test and post-test probabilities are alternative expressions of prevalence and predictive values. Some find them intuitively easier to understand.
Pre-test probability is the overall probability of a positive outcome, which is the same as the prevalence.
Pre-test probability = (TP + FN) / All cases
Post-test probability is the probability of a positive outcome after a positive test or after a negative test.
Post-test probability (positive test) = TP / (TP + FP) = Positive Predictive Value
Post-test probability (negative test) = FN / (FN + TN) = 1 - Negative Predictive Value
Using the data from example 1 in the Positive and Negative Predictive Values section, where holding a PhD is the test and being an academic is the outcome:
True Positive (TP) = 90, False Negative (FN) = 10, False Positive (FP) = 5, True Negative (TN) = 95
Pre-test probability = Prevalence = (90 + 10) / (90 + 10 + 5 + 95) = 0.5, i.e., 50% of those surveyed are academics
Post-test probability (positive test) = Positive Predictive Value = 90 / (90 + 5) = 0.947, i.e., 94.7% of those with PhDs are academics
Post-test probability (negative test) = 1 - Negative Predictive Value = 1 - 95 / (95 + 10) = 1 - 0.905 = 0.095, i.e., 9.5% of those without a PhD are academics
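The same arithmetic can be written as a short Python sketch (the function name `pre_post_test` is illustrative), using the example 1 counts:

```python
def pre_post_test(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float, float]:
    """Pre-test probability and post-test probabilities after a positive and a negative test."""
    total = tp + fn + fp + tn
    pre_test = (tp + fn) / total        # = prevalence
    post_positive = tp / (tp + fp)      # = Positive Predictive Value
    post_negative = fn / (fn + tn)      # = 1 - Negative Predictive Value
    return pre_test, post_positive, post_negative


# Example 1: PhD (test) vs academic (outcome)
pre, post_pos, post_neg = pre_post_test(tp=90, fn=10, fp=5, tn=95)
print(f"{pre:.3f} {post_pos:.3f} {post_neg:.3f}")  # 0.500 0.947 0.095
```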
Sensitivity, Specificity, predictive values, and pre-test and post-test probabilities are all estimates of proportions (probabilities). Two methods exist to calculate the Confidence Interval (CI) of a proportion. The first uses the normal approximation, with a mean equal to the proportion p and a Standard Error of √(p(1 - p) / n), where n is the number of cases. Although this suffices in most circumstances, precision is lost when the number of cases is small. The exact CI is based on the binomial distribution and is slightly asymmetrical. It is computationally more complex, but when computer programs are available to calculate it, it is the preferred choice.
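As a sketch of the two approaches, the Python code below computes a normal-approximation 95% CI using the standard error √(p(1 - p) / n), and an exact interval by the Clopper-Pearson method, which inverts binomial tail probabilities by bisection. Clopper-Pearson is one common exact method and is used here as an assumption; the formulae given by Altman et al. may differ. The helper names are hypothetical, and only the Python standard library is used.

```python
import math


def normal_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI: p +/- z * sqrt(p * (1 - p) / n), clipped to [0, 1]."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def _binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p) with 0 < p < 1; terms computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


def _solve(f, target: float, increasing: bool, iterations: int = 60) -> float:
    """Bisection on (0, 1) for f(p) == target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson ("exact") CI based on the binomial distribution."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) rises with p; the lower limit is where it equals alpha / 2.
        lower = _solve(lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) falls with p; the upper limit is where it equals alpha / 2.
        upper = _solve(lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


# Sensitivity of 90 / 100 (TP = 90, FN = 10)
print(normal_ci(90, 100))
print(exact_ci(90, 100))
```

For a Sensitivity of 90/100 the exact interval extends further below 0.9 than above it, showing the asymmetry, whereas the normal approximation is symmetrical. With x = 0 the normal approximation collapses to (0, 0), an extreme case of its loss of precision when numbers are small.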
The formulae are obtained from: Altman DG, Machin D, Bryant TN, Gardner MJ (2000) Statistics with Confidence (Second Edition). BMJ Books. ISBN 0 7279 1375 1, p. 46
A program is available at the end of the page to calculate confidence intervals for proportions.
The Likelihood Ratio is the ratio of the probability of obtaining a given prediction result amongst those who are outcome positive to the probability of obtaining that result amongst those who are outcome negative.
The Likelihood Ratio for a positive prediction or test is the probability of a positive test amongst those who are outcome positive (the Sensitivity) divided by the probability of a positive test amongst those who are outcome negative (1 - Specificity). The greater the ratio, the more useful the test is in identifying those who are outcome positive.
The Likelihood Ratio for a negative prediction or test is the probability of a negative test amongst those who are outcome positive (1 - Sensitivity) divided by the probability of a negative test amongst those who are outcome negative (the Specificity). The smaller the ratio, the more useful the test is in identifying those who are outcome negative.
A good test therefore has a high Likelihood Ratio for a positive test and a low Likelihood Ratio for a negative test. In other words, compared with those who are outcome negative, those who are outcome positive are more likely to get a positive test result and less likely to get a negative one.
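A minimal Python sketch of both ratios for the example 1 counts (TP = 90, FN = 10, FP = 5, TN = 95) follows. It also applies the odds form of Bayes' theorem, post-test odds = pre-test odds x Likelihood Ratio, a standard identity that is not stated in the text above; with a pre-test probability of 0.5 it reproduces the post-test probabilities of 0.947 and 0.095 from the worked example. The function names are illustrative.

```python
import math


def likelihood_ratios(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """LR+ = Sensitivity / (1 - Specificity); LR- = (1 - Sensitivity) / Specificity.
    Assumes at least one true negative (TN > 0)."""
    p_pos_if_outcome_pos = tp / (tp + fn)   # Sensitivity
    p_pos_if_outcome_neg = fp / (fp + tn)   # 1 - Specificity
    p_neg_if_outcome_pos = fn / (tp + fn)   # 1 - Sensitivity
    p_neg_if_outcome_neg = tn / (fp + tn)   # Specificity
    # LR+ is infinite when there are no false positives (Specificity = 1).
    lr_pos = math.inf if fp == 0 else p_pos_if_outcome_pos / p_pos_if_outcome_neg
    lr_neg = p_neg_if_outcome_pos / p_neg_if_outcome_neg
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Odds form of Bayes' theorem: post-test odds = pre-test odds x LR."""
    odds = pre_test / (1 - pre_test) * lr
    return odds / (1 + odds)


lr_pos, lr_neg = likelihood_ratios(90, 10, 5, 95)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")    # LR+ = 18.0, LR- = 0.105
print(f"{post_test_probability(0.5, lr_pos):.3f}")  # 0.947
print(f"{post_test_probability(0.5, lr_neg):.3f}")  # 0.095
```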
A program is available at the end of the page to calculate Likelihood Ratios for test positives and negatives.
History
Receiver Operator Characteristics (ROC) describe how well a continuous measurement predicts a binary outcome. The mathematics was first developed during WWII, when receiver operators had to interpret incoming radar signals and decide whether German bombers were approaching the English Channel. An over-enthusiastic response to weak signals or electronic noise created false alarms, wasted resources, and exhausted the fighter crews. An overly cautious response, on the other hand, delayed the reaction so that more bombers could penetrate the defence. Each receiver operator was therefore evaluated with a range of signal strengths, and their responses to them were described by the ROC.
Concept
Imagine the predictor as a continuous measurement, with the outcome positive and outcome negative groups having different means of this measurement. Data collected from these two groups form two overlapping Normal distribution curves. A cut off value is used to decide (predict) which group each measurement belongs to.
In the outcome positive group, measurements with values higher than the cut off will be correctly diagnosed (as outcome positives) and represent the true positives (TP). Cases below the cut off are erroneously diagnosed as outcome negative and represent the false negatives (FN).
In the outcome negative group, measurements with values below the cut off will be correctly diagnosed (as outcome negatives) and represent the true negatives (TN). Cases with predictor values higher than the cut off are erroneously diagnosed as outcome positives and these represent the false positives (FP).
From TP, FN, TN, FP, the Sensitivity and Specificity can be calculated. The concept can be represented by this diagram. From this, it can be seen that, if the cut off value changes, then Sensitivity and Specificity also change.
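The effect of moving the cut off can be sketched with simulated data. The Python code below assumes two hypothetical Normal groups (outcome positive with mean 12, outcome negative with mean 8, both with standard deviation 2, 500 cases each) and predicts outcome positive when a value is above the cut off. The function name and all numbers are illustrative.

```python
import random


def counts_at_cutoff(positives, negatives, cutoff):
    """Predict 'outcome positive' when a value is above the cutoff."""
    tp = sum(v > cutoff for v in positives)   # outcome positives above the cutoff
    fn = len(positives) - tp                  # outcome positives at or below it
    fp = sum(v > cutoff for v in negatives)   # outcome negatives above the cutoff
    tn = len(negatives) - fp                  # outcome negatives at or below it
    return tp, fn, fp, tn


# Hypothetical data: two overlapping Normal groups with different means
rng = random.Random(1)
positives = [rng.gauss(12.0, 2.0) for _ in range(500)]
negatives = [rng.gauss(8.0, 2.0) for _ in range(500)]

lowest = min(positives + negatives) - 1.0   # below every value
highest = max(positives + negatives)        # no value lies above this
for cutoff in (lowest, 8.0, 10.0, 12.0, highest):
    tp, fn, fp, tn = counts_at_cutoff(positives, negatives, cutoff)
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    print(f"cutoff {cutoff:6.2f}: Sensitivity {sens:.3f}, Specificity {spec:.3f}")
```

The first line (a cut off below every value) gives Sensitivity 1 and Specificity 0, and the last (a cut off at the largest value) gives Sensitivity 0 and Specificity 1; in between, Sensitivity falls as Specificity rises.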
If the cut off is set to the lowest value possible, then all cases will be predicted to be outcome positive, and none will be predicted as outcome negative. All those that are outcome positive will be correctly predicted, and all those that are outcome negative will be erroneously predicted. The Sensitivity will be 1 and Specificity will be 0.
If the cut off is set to the highest possible value, then all cases will be predicted to be outcome negative. All outcome positive cases will be wrongly predicted, while all outcome negative cases will be correctly predicted. The Sensitivity will be 0 and Specificity will be 1.
As the cut off moves from the lowest to the highest value, the Sensitivity decreases while the Specificity increases. The line traced by these changes is called the Receiver Operator Characteristics curve, conventionally plotted as Sensitivity against 1 - Specificity, and conceptually it is represented by this diagram. Because both axes of the ROC diagram run from 0 to 1, the total area of the diagram is 1. The area under the ROC curve therefore summarises the quality of prediction over the whole range of possible cut off values.
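The ROC can be traced numerically by recomputing Sensitivity and Specificity at every distinct cut off. The Python sketch below assumes the common convention of plotting Sensitivity against 1 - Specificity, predicts outcome positive when a value is above the cut off, and estimates the area with the trapezoidal rule. The function names and the two tiny data sets are hypothetical; the data sets reproduce the perfect (area 1) and useless (area 0.5) extremes.

```python
def roc_points(positives, negatives):
    """(1 - Specificity, Sensitivity) for every distinct cutoff, predicting
    'outcome positive' when a value is above the cutoff."""
    values = sorted(set(positives) | set(negatives))
    cutoffs = [values[0] - 1.0] + values  # the first cutoff classifies everyone as positive
    points = []
    for cutoff in cutoffs:
        sens = sum(v > cutoff for v in positives) / len(positives)
        spec = sum(v <= cutoff for v in negatives) / len(negatives)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points):
    """Area under the ROC points by the trapezoidal rule."""
    pts = sorted(points)  # order along the curve from (0, 0) to (1, 1)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area


# Hypothetical data sets
separated = roc_points(positives=[5, 6, 7], negatives=[1, 2, 3])
identical = roc_points(positives=[1, 2, 3, 4], negatives=[1, 2, 3, 4])
print(f"{auc_trapezoid(separated):.3f}")  # 1.000 (perfect prediction)
print(f"{auc_trapezoid(identical):.3f}")  # 0.500 (no better than guessing)
```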
In the case of perfect prediction, the outcome positives and outcome negatives do not overlap with regard to the predictor values. Here the ROC “hugs” the sides of the diagram, which means the area under the ROC equals one (1). An illustration is shown in this diagram.
In a completely useless predictive test (i.e., guessing), the outcome positives and outcome negatives overlap completely. Sensitivity decreases at the same rate as Specificity increases, or vice versa. Here the ROC is a diagonal line across the diagram, and the area under the ROC is 0.5, as shown in this diagram.
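Hanley and McNeil (1982), the paper recommended on this page, interpret the area under the ROC as the probability that a randomly chosen outcome-positive case has a higher predictor value than a randomly chosen outcome-negative case. The Python sketch below estimates the area that way, counting ties as half, on two hypothetical simulated data sets: one with no overlap between the groups and one where both groups come from the same Normal distribution. The function name and simulation settings are illustrative.

```python
import random


def auc_pairwise(positives, negatives):
    """Proportion of (positive, negative) pairs in which the outcome-positive case has
    the higher value, with ties counted as half."""
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


rng = random.Random(7)
# Perfect prediction: the two groups do not overlap
perfect = auc_pairwise([rng.uniform(10, 20) for _ in range(300)],
                       [rng.uniform(0, 5) for _ in range(300)])
# Useless predictor: both groups drawn from the same Normal distribution
useless = auc_pairwise([rng.gauss(10, 2) for _ in range(300)],
                       [rng.gauss(10, 2) for _ in range(300)])
print(perfect)            # 1.0
print(round(useless, 2))  # close to 0.5; the exact value depends on the sample
```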
Calculating ROC
The mathematics involved in the construction of the ROC is complex and will not be covered on this page. For those interested in the mathematics, an excellent paper to read is Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.
For those interested in using ROC, MRSC has developed resources on the Internet that can be used. These links can be found through the www.materresearch.org site on the Internet.