Briefly, there are many ways to estimate the variation of a sample of values. One is to pull out your calculator and follow the instructions for calculating the standard deviation. Typically that is what they call σ. (The instructions usually have that wrong: the n-1 divisor is used for s, not σ.) That estimate is of all the variation and includes both common cause variation and special cause variation, if any is present. It is not the estimate we want to use to set control limits for a control chart used to detect special causes.
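To make the s versus σ distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import pstdev, stdev

# Hypothetical sample of eight measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Divides the sum of squared deviations by n: the population formula (sigma).
sigma_n = pstdev(values)

# Divides by n - 1: the sample standard deviation (s).
s_n_minus_1 = stdev(values)

print(f"divide by n:     {sigma_n:.4f}")
print(f"divide by n - 1: {s_n_minus_1:.4f}")
```

Either way, the number describes all of the variation in the sample, whatever its cause.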
The foundation of the control chart: The estimate that gives us the best chance of detecting special causes (if they are present) is an estimate of common cause variation alone. We don’t want to overestimate it (that will make the control limits wider than they need to be and increase the chances of failing to see special causes when they are present). Nor do we want to underestimate it (that will make the control limits narrower than they need to be and increase the chances of saying there was a special cause when there was not).
Remember, these mistakes are unavoidable except in the extreme (if you don’t ever want to mistakenly call something a special cause when it’s not, then don’t call anything a special cause or, conversely, if you don’t ever want to miss a special cause, call every point a special cause). But in eliminating one mistake entirely, you make the other as often as possible.
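The trade-off between the two mistakes can be sketched numerically. The snippet below assumes, purely for illustration, that points follow a normal distribution and that a special cause shifts a point by two standard deviations. Both the normality and the size of the shift are assumptions of this sketch, not part of the argument above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: an illustrative assumption only

def false_alarm_rate(k):
    """Chance that a common-cause point falls outside +/- k sigma limits."""
    return 2 * (1 - Z.cdf(k))

def miss_rate(k, shift):
    """Chance that a point shifted by `shift` sigma still falls inside the limits."""
    return Z.cdf(k - shift) - Z.cdf(-k - shift)

# Wider limits: fewer false alarms, more missed special causes.
for k in (1, 2, 3, 4):
    print(k, round(false_alarm_rate(k), 4), round(miss_rate(k, shift=2.0), 4))
```

As the limits widen, the first rate falls and the second rises. Neither reaches zero until the other reaches one.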
The idea, as Shewhart pointed out, is to find a balance point: commit one error once in a while, commit the other error once in a while. He found that balance point to be at ±3σ. So for our chart we want to estimate the common cause variation to set the limits. That is the purpose of subgrouping.
Generally, the idea behind constructing a subgroup is to do so in a way that minimizes the occurrence of special causes within subgroups. We want special causes, if they are present, to occur between subgroups.
Thus, variation is estimated by using the within-group variation, and the limits based on that estimate give us the best opportunity to balance those mistakes.
So we use R-bar to estimate σ. The factor for doing this is d2: R-bar divided by d2 is an estimate of σ. Then we multiply that estimate by three and add it to, and subtract it from, the grand mean (X-double-bar) to set the limits.
If the limits are to be for averages (as in an X-bar R chart, where the subgroups are independent, i.e. not based on a moving range), we use the A2 factor, which does the math of multiplying R-bar by 3 and dividing by √n × d2. (See the central limit theorem to see why we need √n; it has to do with the relationship between the variation of individual values and the variation of averages taken therefrom.)
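The X-bar chart arithmetic can be sketched as follows. The d2 values are the commonly published control chart constants for subgroup sizes 2 through 5. The function name `xbar_r_limits` and the data are hypothetical, made up for this illustration.

```python
from statistics import mean

# Commonly published d2 constants, keyed by subgroup size n.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def xbar_r_limits(subgroups):
    """Return (lcl, center, ucl, sigma_hat) for a chart of subgroup averages."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = mean(xbars)           # X-double-bar
    r_bar = mean(ranges)               # R-bar
    sigma_hat = r_bar / D2[n]          # within-subgroup estimate of sigma
    a2 = 3 / (D2[n] * n ** 0.5)        # A2 = 3 / (d2 * sqrt(n))
    half_width = a2 * r_bar
    return grand_mean - half_width, grand_mean, grand_mean + half_width, sigma_hat

# Hypothetical data: five subgroups of four measurements each.
data = [
    [10.2, 9.9, 10.1, 10.0],
    [10.0, 10.3, 9.8, 10.1],
    [9.9, 10.0, 10.2, 10.1],
    [10.1, 10.1, 9.9, 10.0],
    [10.0, 9.8, 10.2, 10.0],
]
lcl, center, ucl, sigma_hat = xbar_r_limits(data)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}  sigma-hat={sigma_hat:.3f}")
```

Only the ranges within each subgroup enter the estimate of σ, so a shift between subgroups does not inflate the limits.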
In an individuals chart we do not have a natural subgroup because we are only getting one value at a time. So we make up a subgroup by using a moving range of 2. Since the subgroup size is always 2, no table lookup is needed: the factor is a constant (3 divided by d2, which is 1.128 at n = 2, gives 2.66). We simply multiply the average moving range by 2.66 and add that to, and subtract it from, X-bar (remember these are individual values) to set the limits.
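A minimal sketch of the individuals chart calculation, using a moving range of 2. The function name `xmr_limits` and the sample values are hypothetical.

```python
from statistics import mean

def xmr_limits(values):
    """Return (lcl, center, ucl) for an individuals chart, moving range of 2."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Absolute difference between each value and the one before it.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    center = mean(values)
    half_width = 2.66 * mr_bar  # 2.66 is 3 / d2, with d2 = 1.128 for n = 2
    return center - half_width, center, center + half_width

# Hypothetical sequence of individual measurements.
values = [10, 12, 11, 13, 12, 11]
lcl, center, ucl = xmr_limits(values)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
```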
Some statisticians argue against putting limits on these kinds of charts at all.
Our 2.66 × R-bar estimate is of 3 sigma for individual values, so we can use it to estimate the standard deviation for other purposes. But it is based on subgroup size two. How can we estimate the standard deviation of individual values for larger subgroups? That is what E2 is for.
E2 is used to estimate the variation of the individual values (E2 × R-bar is an estimate of 3 sigma) from data that has been subgrouped in subgroup sizes larger than 2. It is not particularly meant to be used to set limits for control charts, as far as I know, because using an individuals chart with a moving range larger than two is a bad idea. It exacerbates the problems of the chart. First, that would increase the chance of having a special cause within a subgroup (which is unavoidable anyway...), and second, the assumption of independence of subgroups starts going out the window. But E2 could be used to do so for an average moving range situation where the moving range is based on a group size larger than 2. At subgroup size 2, E2 is, of course, the same 2.66 constant used above (A2 at that size is smaller, 1.88, because it also divides by √2).
But charts for individuals with subgroup size larger than 2 are rare. I don’t believe I’ve ever seen one in practice, and if I did, I would advise against it.
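As a sketch of how E2 relates to the other factors, the snippet below computes E2 = 3 / d2 and uses it to estimate the 3 sigma spread of individual values from subgrouped data. The d2 values are the commonly published constants. The function names and the data are hypothetical.

```python
from statistics import mean

# Commonly published d2 constants, keyed by subgroup size n.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def e2(n):
    """Factor turning R-bar into 3 sigma of individual values."""
    return 3 / D2[n]

def a2(n):
    """Factor turning R-bar into 3 sigma of subgroup averages."""
    return 3 / (D2[n] * n ** 0.5)

def individual_value_limits(subgroups):
    """Return (low, high): grand mean +/- E2 * R-bar, for individual values."""
    n = len(subgroups[0])
    r_bar = mean([max(g) - min(g) for g in subgroups])
    center = mean([mean(g) for g in subgroups])
    half_width = e2(n) * r_bar
    return center - half_width, center + half_width

# Hypothetical data: three subgroups of five measurements each.
data = [
    [10.2, 9.9, 10.1, 10.0, 10.3],
    [10.0, 10.3, 9.8, 10.1, 9.8],
    [9.9, 10.0, 10.2, 10.1, 9.8],
]
low, high = individual_value_limits(data)
print(f"E2(2)={e2(2):.3f}  A2(2)={a2(2):.3f}")
print(f"individual values: {low:.3f} to {high:.3f}")
```

The only difference between the two factors is the √n in A2, which is why the interval for individual values is always wider than the one for averages.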
Finally, this all begs the question: why do we want to know the standard deviation of the individual values if it isn’t used to set limits?
We do use the value for what is called tolerancing. Tolerancing is an effort to define nominal values for manufactured parts and acceptable levels of variation for (among other things) mating parts. For example, when manufacturing a head/disk assembly (HDA), each part in the head, arm, disk stack-up has variation associated with it as the HDA is assembled. These variations are additive, and the total has to meet certain requirements lest the assembly be subject to too much wobble, out of round, read/write errors, etc. The drawings of the HDA will have all the parts and the acceptable limits of variation for each of the parts. These variations are those associated with individual values.
Thus the E2 factor is used to estimate that variation and is said to give an estimate of the Natural Tolerance Interval.
The source of d2 is covered in Quality Control and Industrial Statistics by Acheson Duncan, and the whole subject is covered in Wheeler’s Advanced Topics in Statistical Process Control.