AN INTRODUCTION TO GENERALIZED LINEAR MODELS

1. Why Generalized Linear Models are "Useful"

Advances in computing technology, combined with the development of statistical theory into applications, offer many more choices for data analysis today than existed even a few years ago. Researchers are now in a position to make informed decisions about which particular technique to apply. However, the fundamental steps of data analysis remain unchanged:

1. State a hypothesis based on your research objectives
2. Develop a statistical design to collect data
3. Conceptualize a model based on your design and hypothesis
4. Estimate parameters for this model from sample data
5. Test your hypothesis for significance

The industrial statistician George Box, perhaps best known for his books on time series and experimental design, once wrote, "All models are wrong, but some models are useful" (Box, 1979). Reflecting on item #3 above, I would add to that statement "some models are more useful than others" and, even more to the point, "some models are more appropriate than others." One of the potential weaknesses in using any statistical model is accepting the output at face value without reviewing the assumptions that form its foundation. Just because SAS (or any other program) produced results does not mean that they are necessarily correct or that the technique is the best one you could use. The premise is that one should always be able to understand and explain the technique applied; the tradeoff is that ease of explanation should not lead you to ignore the "appropriateness" of the model compared with other possibilities.
Analysis of Data from non-normal Distributions as if they were Normal

This is one of the seven "problematic" areas in statistical analysis summarized at http://www.uoregon.edu/~robinh/seven_prblms.txt. It is closely connected with another problematic area from that list, "Indiscriminant Application of Regression and Analysis of Variance", that is, the use of techniques whose inference relies on normally distributed residuals. The real concern is the failure to select a model that matches the distributional assumptions of the data. That is, data are not necessarily "normally" distributed over an "infinite" range with "constant" variance. Response data, whether they are dichotomous, ordinal, counts, proportions, or certain types of continuous measurements, and whether they follow the normal or some other distribution, are more appropriately analyzed within the framework of generalized linear models.

A simple example, to be explained in Section 6, concerns the computation of an odds ratio, which is a convenient way to summarize dichotomous, ordinal, and multinomial data along with a mixture of categorical or continuous covariates. If you have collected only one observation for each subject, these techniques have been incorporated into PROC LOGISTIC. However, many types of generalized linear models can be analyzed with PROC GENMOD, including discrete data collected as repeated measures.
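
To make the odds ratio mentioned above concrete before Section 6, here is a minimal sketch in Python (not SAS) for a single 2x2 table. The cell counts and the function names odds_ratio and wald_ci are hypothetical, chosen only for illustration; the interval is the usual large-sample Wald interval on the log scale, not output from PROC LOGISTIC or PROC GENMOD.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                   outcome yes   outcome no
        group 1         a            b
        group 2         c            d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds in group 1 (a/b) divided by odds in group 2 (c/d).
    return (a * d) / (b * c)

def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% Wald interval, computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, for illustration only.
a, b, c, d = 20, 80, 10, 90
print(odds_ratio(a, b, c, d))  # 2.25
print(wald_ci(a, b, c, d))
```

With a single dichotomous covariate, exponentiating the slope from a logistic regression fit to the same data gives this same odds ratio, which is why the logistic model is a natural setting for such summaries.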