Full disclosure: I’m currently taking a high-speed graduate class in Experimental Design from the Colorado State University statistics department. Right now, we’re working on methods for controlling Type I errors (“false alarms”) in cases where you might want to do multiple hypothesis tests.
For example, suppose you are comparing sample means representing patient responses to three different disease treatments plus a fourth control treatment. You might want to test each individual treatment mean against the control mean to see whether patient responses under treatment are significantly different, which gives three separate hypothesis tests. If you've designed each test so that its Type I error rate (the rate at which you erroneously decide that a treatment represents a genuine improvement when it really doesn't) is capped at some value, usually 5%, then the overall Type I error rate across all three tests will be higher: about 1 − (1 − 0.05)³ ≈ 14.3% if the tests are independent, and never more than 3 × 5% = 15% in general. So some adjustment has to be made to your process to keep the overall Type I error rate controlled.
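As a quick sanity check on that arithmetic, here is a small simulation sketch (in Python, purely illustrative and not part of the notes themselves) that estimates the familywise Type I error rate for three independent tests each run at α = 0.05, and shows how a Bonferroni adjustment (testing each at α/3) brings the overall rate back under 5%.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05   # per-test Type I error rate
m = 3          # number of hypothesis tests
n_sims = 100_000

# Under the null hypothesis, each (continuous) p-value is Uniform(0, 1).
# Each row of this array is one simulated experiment with m p-values.
pvals = rng.uniform(size=(n_sims, m))

# Familywise error: at least one of the m tests rejects its null.
fwer_unadjusted = np.mean((pvals < alpha).any(axis=1))
fwer_bonferroni = np.mean((pvals < alpha / m).any(axis=1))

print(f"Unadjusted FWER ~ {fwer_unadjusted:.3f} (theory: {1 - (1 - alpha)**m:.3f})")
print(f"Bonferroni FWER ~ {fwer_bonferroni:.3f} (bounded by {alpha:.3f})")
```

With independent tests, the unadjusted simulation comes out near 14.3%, while the Bonferroni-adjusted version stays at or below 5%, which is exactly the kind of control the methods in the notes are after.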
This writeup is my attempt to collect and synthesize the various methods for measuring and controlling Type I error into one digestible package. In short, I wrote this for my own understanding because, for some reason, I think much better about things when I write about them (because how can I know what I'm thinking until I see what I write?). I hope you find it useful.
Download Notes on multiple comparisons in statistical hypothesis testing