The t distribution and comparison of two means

Introduction

You should understand, or at least be familiar with, the material covered in the probability and confidence interval pages before proceeding.

A Randomised Controlled Trial (RCT) can be considered the most transparent and effective research model to compare the outcomes of two or more interventions. The design is randomised because suitable research subjects recruited into the trial are randomly allocated to interventions (usually according to a series of computer generated random numbers). Consequently, apart from chance variation, the groups should differ only in the intervention allocated. Therefore, any difference in outcome between groups should reflect the difference between interventions.

Other outcome measures and model variations exist, but this page will only consider differences in means (normally distributed outcome) between two groups.  Note that the assumption of a normally distributed outcome cannot always be assured.  An introduction to the t distribution is presented before the confidence interval of the difference between two means is discussed.

The t distribution

William Gosset was a brewer with an interest in statistics. He found the estimation of probability for the z deviate unreliable if the observations were few. He derived a correction of the probability estimate according to sample size, which became known as t. Gosset published his papers under the pseudonym Student, and so the distribution became known as Student’s t.

Student’s t allows the use of a small number of measurements to estimate what may be true of the whole population.  This forms the basis of modern inferential statistics, where a small number of observations are made, and the results are generalized to the wider population.

The t distribution curve is wider than the normal one, with heavier tails. Therefore, the area beyond a particular deviate (the probability of exceeding it) is larger than under the normal distribution. This difference depends on the degrees of freedom, which increase with sample size, such that the probability for t approaches that for z as the sample size increases. Conceptually, this is represented by this diagram. With infinite degrees of freedom (in practice, a very large sample size), t and z have the same value for a particular probability, but with fewer cases, t must be larger than z to obtain the same probability.
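The convergence of t towards z can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names t_pdf, t_cdf and t_quantile are illustrative and not part of the original page, and the t probabilities are obtained by simple numerical integration (Simpson's rule) and bisection rather than by any particular statistical package.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Deviate t with P(T <= t) = p, found by bisection."""
    lo, hi = -200.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Deviate that leaves 2.5% of the area in the upper tail.
z = NormalDist().inv_cdf(0.975)
print(f"z (normal): {z:.3f}")
for df in (2, 5, 10, 30, 100):
    print(f"t, df={df:>3}: {t_quantile(0.975, df):.3f}")
```

The printed t values shrink towards the z value of about 1.96 as the degrees of freedom increase, which is the behaviour described above.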

When calculating t, a one sided or two sided area needs to be specified. A one sided t is conceptually similar to the z, and assumes all the excluded values are on one side of the distribution, while a two sided t assumes the excluded area is split between both sides of the t distribution, so that each side contains only half of the excluded area. In calculations involving the confidence interval, the two sided t is usually used.
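In symbols, the distinction can be written as follows. The notation is assumed here for illustration and is not taken from the original page: T is a t distributed variable, \alpha is the total excluded area, and \nu is the degrees of freedom.

```latex
\begin{align*}
\text{one sided:}\quad & P\left(T > t_{\alpha,\nu}\right) = \alpha \\
\text{two sided:}\quad & P\left(|T| > t_{\alpha/2,\nu}\right) = \alpha,
  \qquad P\left(T > t_{\alpha/2,\nu}\right)
       = P\left(T < -t_{\alpha/2,\nu}\right) = \tfrac{\alpha}{2}
\end{align*}
```

For a 95% confidence interval, \alpha = 0.05, so the two sided t leaves 0.025 in each tail.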

Measurements provide far more information than counts or binary (yes/no) decisions, so statistical conclusions can often be reached with smaller sample sizes. As a result, the t distribution is used extensively when measurements can be considered normally distributed.

Comparing two means

Determining the difference between two means is commonly carried out in both controlled trials and epidemiological studies. Provided the measurements are normally distributed, this is one of the most powerful statistical tests available, in that a clear decision can be made with very few cases.
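As an illustration, the difference between two means and its confidence interval might be calculated along the following lines. This is a sketch under stated assumptions, not a method taken from this page: it uses the pooled-variance form of the calculation, which assumes both groups share a common variance; the function name compare_two_means is hypothetical; the measurements are invented; and the two sided t value is supplied by the user from a t table.

```python
import math
from statistics import mean, variance


def compare_two_means(group_a, group_b, t_crit):
    """Difference between two means with a pooled-variance confidence interval.

    t_crit is the two sided t value for the chosen confidence level and
    len(group_a) + len(group_b) - 2 degrees of freedom (from a t table).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: assumes both groups share a common variance.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean(group_a) - mean(group_b)
    return {
        "difference": diff,
        "standard_error": se,
        "df": df,
        "t": diff / se,
        "ci": (diff - t_crit * se, diff + t_crit * se),
    }


# Hypothetical measurements, invented purely for illustration.
treatment = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.1, 4.5, 4.0, 4.8, 4.6]

# 2.306 is the tabulated two sided 95% t value for 8 degrees of freedom.
result = compare_two_means(treatment, control, t_crit=2.306)
low, high = result["ci"]
print(f"difference = {result['difference']:.2f}, "
      f"t = {result['t']:.2f}, df = {result['df']}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

If the resulting interval does not include zero, the difference between the two means is statistically significant at the corresponding level (5% for a 95% interval).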

Version 1.0  Last change 6th July 2006