# Hypothesis Testing III – The statistics


To continue, we need to define a couple of terms. The first is a probability density function and the second is a sampling distribution.

A probability density function describes a continuous distribution in such a way that probabilities are obtained as integrals, that is, as areas under its curve. Thus, for a frequency distribution smoothed (over repeated sampling) to form a curve, the area under the curve can be calculated, and the probability of observing a value below (or above) a given point can be assessed as the proportion of the total area that lies to the left (or right) of that point. In a normal distribution, z values are used to do this.
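As a rough illustration of reading probabilities off as areas, the sketch below converts a measurement to a z value and looks up the area to its left and right using Python's standard-library `statistics.NormalDist`. The measurement, mean and standard deviation are made-up numbers chosen only so the z value comes out to 1, and the helper names are hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_score(value, mean, sd):
    """Express a measurement as a number of standard deviations from the mean."""
    return (value - mean) / sd


def area_left_of(z):
    """Proportion of the curve lying to the left of z."""
    return standard_normal.cdf(z)


def area_between(z_low, z_high):
    """Proportion of the curve lying between two z values."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


# Hypothetical numbers: a value of 12 from a distribution with mean 10, SD 2.
z = z_score(value=12.0, mean=10.0, sd=2.0)
left = area_left_of(z)
right = 1.0 - left
within_one_sd = area_between(-1.0, 1.0)

print(f"z = {z:.2f}")
print(f"area to the left:  {left:.4f}")
print(f"area to the right: {right:.4f}")
print(f"area within one standard deviation of the mean: {within_one_sd:.4f}")
```

The two tail areas always sum to 1, which is why either side of the value can be used to express how unusual it is.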

Using that model we can see that in a Normal Distribution approximately 68.27% of the values fall between plus and minus one standard deviation from the mean (average), leaving about 31.73% outside that range. So, for example, if the average height of a group of school boys is 155cm and the standard deviation is 5cm, we would expect a little over 68% of the boys to fall between 150cm and 160cm if height is normally distributed. We can go a bit further and say, if height is Normally distributed, the chances of being …

On the horizontal axis is the measure of interest (if we were studying the height of kids, it might be height in centimeters). In the usual depiction of this curve the average of the measurements is set to zero and the spread is measured as plus or minus one, two and three standard deviations. The vertical axis represents the frequency of occurrence of a particular point on the curve. The highest point, the average, is the one that occurs most often, and as one moves away from the average the frequency of occurrence decreases. Because this is a theoretical model, the tails (the extreme left and right ends) are asymptotic; that is, they approach the horizontal axis but never touch it. This is an important distinction between theoretical curves (such as the Gaussian or Normal Distribution) and the real world.

Each statistic used to test hypotheses has a probability distribution associated with it.

It is from that probability distribution that we make our decision as to the sufficiency of evidence of an experimental effect.

Recall that we originally formed two hypotheses: the research hypothesis (what we think the effect of our action will be) and the null hypothesis, which states that there is no effect at all. We likened the hypothesis testing logic to that of the courtroom, where the burden of proof rests with the person (or organization) asserting an effect. That is, enough evidence of an effect (guilt) must be presented to cause us to reject the “no effect” starting assumption.

The hypothesis is tested in a study or experiment and a numerical estimate of an effect is produced. For example, if two suppliers are to be compared as to the relative quality of their products, a sample from each supplier might be obtained. The samples are measured and an average value is calculated for the sample from each supplier. The null hypothesis might be that there is no difference between the two suppliers. The means are compared, a difference is calculated, and a test statistic (in this case “t”) is generated.
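To make the supplier comparison concrete, here is a minimal sketch of how a t statistic could be computed from two samples. The measurements are invented for illustration, and the pooled-variance (equal variance) form of the two-sample t is an assumption; the post does not say which version of the t test is intended.

```python
import statistics
from math import sqrt


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic using a pooled variance estimate.

    Assumes both samples come from populations with the same variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    return difference / std_err


# Hypothetical quality measurements from each supplier (made-up data).
supplier_a = [10.1, 9.9, 10.0, 10.2, 9.8]
supplier_b = [10.4, 10.6, 10.5, 10.3, 10.7]

t_value = pooled_t_statistic(supplier_a, supplier_b)
degrees_of_freedom = len(supplier_a) + len(supplier_b) - 2

print(f"difference in means: {statistics.mean(supplier_a) - statistics.mean(supplier_b):.2f}")
print(f"t = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
```

The t value is simply the observed difference in means expressed in units of its own standard error, which is what allows it to be compared against a reference distribution.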

There is, as we said, a sampling distribution for the statistic t. It is a probability distribution showing the frequency of occurrence of the different values of t that would be obtained if repeated samples were drawn from the same population, that is, when the null hypothesis is true. This is important. Our value for t is compared with the values of t obtained when the samples are drawn from the same pool. For those values we know the samples do not come from different populations; the null hypothesis is true by definition.

There are, as we said, two mistakes that can be made. We can say that the two suppliers are different when they really aren’t, or we can fail to say they are different when they are.

We can set the probability of making the first type of mistake as the cutoff point at which we will decide there is sufficient evidence to assume there is some experimental effect. Thus, if we have a t value that has less than a 1% chance of occurring when the null hypothesis is true, we will say that this occurrence is unlikely and reject the null hypothesis. How unlikely? We’d expect a difference this large (or larger) less than 1 time in 100 when the null hypothesis is true.
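One way to see the sampling distribution and the 1% cutoff working together is to simulate it. The sketch below repeatedly draws two samples from the same normal population (so the null hypothesis is true by construction), records the t value each time, and then asks how often a t at least as extreme as an observed one turns up. The population parameters, sample size, number of repeats, seed and the "observed" t of 3.2 are all assumptions made up for the illustration, and extremeness is judged in both directions.

```python
import random
from math import sqrt


def t_equal_n(sample_a, sample_b):
    """Pooled two-sample t statistic for samples of equal size."""
    n = len(sample_a)
    mean_a = sum(sample_a) / n
    mean_b = sum(sample_b) / n
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n - 1)
    return (mean_a - mean_b) / sqrt((var_a + var_b) / n)


def simulate_null_t(n_per_sample=10, n_repeats=5000, seed=1):
    """t values obtained when both samples come from the same population."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(n_repeats):
        a = [rng.gauss(100.0, 5.0) for _ in range(n_per_sample)]
        b = [rng.gauss(100.0, 5.0) for _ in range(n_per_sample)]
        t_values.append(t_equal_n(a, b))
    return t_values


def fraction_at_least_as_extreme(null_t_values, observed_t):
    """Share of null t values as far from zero as the observed one."""
    extreme = sum(1 for t in null_t_values if abs(t) >= abs(observed_t))
    return extreme / len(null_t_values)


ALPHA = 0.01       # cutoff: accepted chance of the first type of mistake
observed_t = 3.2   # hypothetical t from an experiment with 10 items per sample

null_t = simulate_null_t()
p_estimate = fraction_at_least_as_extreme(null_t, observed_t)
reject_null = p_estimate < ALPHA

print(f"estimated chance of |t| >= {observed_t} under the null: {p_estimate:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

In practice the reference distribution for t is known analytically, so published tables or statistical software replace the simulation; the simulated version is shown only because it mirrors the "repeated samples from the same pool" description directly.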

Does that mean that the alternative hypothesis is correct? Not at all. It means that, according to the criteria we set for evidence, we have rejected the idea that the difference between the two means is due to chance. We could be wrong; in fact, if we do enough of these types of comparisons, we will be wrong at some point.

Are we wrong in this case? If we had enough information to answer that question, we wouldn’t need to do the test.
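The remark that enough comparisons will eventually produce a wrong call can be put into numbers. The sketch below assumes every comparison is independent and that the null hypothesis is true in all of them, which is a simplification; under those assumptions the chance of at least one false rejection in n tests is 1 − (1 − α)ⁿ. The function name is hypothetical.

```python
def chance_of_at_least_one_false_rejection(alpha, n_tests):
    """Probability of one or more false rejections across independent tests.

    Assumes the null hypothesis is true in every test and the tests
    do not influence one another.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


ALPHA = 0.01  # the 1% cutoff used in the text

for n_tests in (1, 10, 100, 500):
    chance = chance_of_at_least_one_false_rejection(ALPHA, n_tests)
    print(f"{n_tests:>3} comparisons: {chance:.3f}")
```

With a 1% cutoff, a single comparison carries a 1% risk of a false rejection, but across 100 such comparisons the risk of at least one is already well over half.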