11. Negative Binomial Regression Models

Much like the Poisson distribution, the negative binomial distribution works with counts, that is, the non-negative integers 0, 1, 2, 3, 4, .... One way that counts arise from a negative binomial distribution is through a mixture of random variables, each of which is distributed as Poisson but with its own expectation lambda_i. If the expectations follow a gamma distribution, then you have a gamma mixture of Poisson random variables, and the resulting counts follow the negative binomial distribution.

Like the Poisson distribution, the negative binomial distribution has expectation mu. However, the variance of the negative binomial distribution is mu + k*mu^2, where k is called the dispersion parameter. This parameter is estimated when you fit a model in which the response is assumed to be distributed as negative binomial. If k=0, then both the mean and the variance are mu, recognizable as the mean and variance of a Poisson distribution. The Poisson is thus the special case of the negative binomial distribution with k=0.

Overdispersion, as discussed in Section 9, exists when each observation is drawn from a Poisson distribution and has its own expectation. Imposing the condition that the expectations are drawn from a gamma distribution adds structure to such a model: the resulting data follow the negative binomial distribution.

PROC GENMOD will fit a negative binomial model with the following MODEL statement.

    PROC GENMOD DATA=mydata;
      CLASS ;
      MODEL counts = / DIST=negbin LINK=log TYPE3;
    run;

You may decide to fit independent count data with both the Poisson and negative binomial models. How should you evaluate which one is better? It depends on the validity of your assumptions. What would make for a 'better' model? Is it reasonable to model your data as either one of these distributions? What is the underlying theory behind your choice?
How are you selecting regressors?

Here is a simple example that fits independent count data to a negative binomial distribution and estimates the mean and the dispersion parameter.

    DATA this;
      INPUT y @@;
      cards;
    1 2 2 2 3 3 3 4 4 4 5 6 6 6 7 8 9 11
    ;
    PROC GENMOD data=this;
      MODEL y = / DIST=negbin LINK=log;
      ESTIMATE 'Mean' intercept 1 / exp;
    run;

    Criteria For Assessing Goodness Of Fit

    Criterion              Value
    Log Likelihood       49.1923

NOTE: -2 * Log Likelihood = -98.4 (see the NLMIXED output below and the likelihood function in the NLMIXED program)

    Analysis Of Parameter Estimates

                                  Standard    Likelihood Ratio 95%       Chi-
    Parameter   DF    Estimate       Error     Confidence Limits       Square    Pr > ChiSq
    Intercept    1      1.5640      0.1295      1.2939     1.8312      145.83        <.0001
    Dispersion   1      0.0926      0.1012     -0.0398     0.4147

    NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.

    Contrast Estimate Results

                             Standard                                      Chi-
    Label        Estimate       Error    Alpha    Confidence Limits      Square    Pr > ChiSq
    Mean           1.5640      0.1295     0.05     1.3101     1.8178     145.83        <.0001
    Exp(Mean)      4.7778      0.6188     0.05     3.7067     6.1584

In addition to estimating the mean with PROC GENMOD, you can estimate the mean and dispersion parameters of the negative binomial with PROC NLMIXED:

    PROC NLMIXED DATA=this;
      PARMS b0 = -1;   /* k is not listed, so it starts at the NLMIXED default of 1 */
      eta = b0;
      mean = EXP(eta);
      /* log likelihood of the response given the mean */
      loglike = y*LOG(k*mean) - (y+(1/k))*LOG(1+k*mean)
                + LGAMMA(y+(1/k)) - LGAMMA(1/k);
      /* The last term of the full log likelihood, - LGAMMA(y+1), is omitted.
         Since it does not depend on the parameters, the estimated coefficients
         are the same with or without it; however, the likelihood statistic
         changes. */
      MODEL y ~ general(loglike);
      ESTIMATE 'mean' exp(b0);
      TITLE1 'Negative Binomial estimation with NLMIXED';
    run;

    Fit Statistics

    -2 Log Likelihood = -98.4

NOTE: -2 * Log Likelihood matches the GENMOD result when the last term of the likelihood is omitted.

    Parameter Estimates

                            Standard
    Parameter   Estimate       Error    DF    t Value    Pr > |t|      Lower      Upper    Gradient
    b0            1.5640      0.1295    18      12.08    <.000001     1.2919     1.8361    -1.89E-6
    k            0.09261      0.1012    18       0.91    0.372339    -0.1201     0.3053    -4.17E-6

    Additional Estimates

                        Standard
    Label   Estimate       Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper
    mean      4.7778      0.6188    18       7.72    <.000001     0.05     3.4778     6.0778

Note that the negative binomial distribution requires k > 0, which implies that the variance cannot be less than the mean. If the estimate of k is negative, which suggests that the counts are less variable than a Poisson model would predict, then a negative binomial model may not be appropriate.
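The claim that GENMOD and NLMIXED report the same likelihood only when the LGAMMA(y+1) term is dropped can be checked by hand. The sketch below is written in Python rather than SAS, purely as an independent cross-check; the function names nb_kernel and nb_full are hypothetical, and the parameter values are the rounded estimates printed in the output above. It evaluates the log likelihood exactly as coded in the NLMIXED program, and then the full version with the omitted term restored.

```python
import math

# Counts from the DATA step in the example.
y = [1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 11]

# Rounded point estimates copied from the printed output (not re-estimated here).
mean = 4.7778   # Exp(Mean) in GENMOD, 'mean' in NLMIXED
k = 0.09261     # dispersion parameter

def nb_kernel(yi, mean, k):
    """One observation's log likelihood as coded in NLMIXED (no -LGAMMA(y+1))."""
    return (yi * math.log(k * mean)
            - (yi + 1 / k) * math.log(1 + k * mean)
            + math.lgamma(yi + 1 / k)
            - math.lgamma(1 / k))

def nb_full(yi, mean, k):
    """Full negative binomial log probability, with the omitted term restored."""
    return nb_kernel(yi, mean, k) - math.lgamma(yi + 1)

kernel_ll = sum(nb_kernel(yi, mean, k) for yi in y)
full_ll = sum(nb_full(yi, mean, k) for yi in y)
sample_mean = sum(y) / len(y)

# Sanity check: the full form should be a proper probability function,
# so its probabilities over 0, 1, 2, ... should sum to (nearly) 1.
total_prob = sum(math.exp(nb_full(j, mean, k)) for j in range(200))

print(f"sample mean             = {sample_mean:.4f}")
print(f"log likelihood (kernel) = {kernel_ll:.4f}")
print(f"-2 log L (kernel)       = {-2 * kernel_ll:.1f}")
print(f"log likelihood (full)   = {full_ll:.4f}")
print(f"sum of probabilities    = {total_prob:.6f}")
```

The kernel value should come out close to the 49.19 reported by GENMOD, which is why -2 Log Likelihood is negative in both listings: the reported quantity is not a true log probability. With the LGAMMA(y+1) term restored, the log likelihood is lower by the sum of LGAMMA(y+1) over the observations, while the parameter estimates are unaffected.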
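The mean-variance relationship stated at the start of this section, variance = mu + k*mu^2 for a gamma mixture of Poissons, can also be illustrated by simulation. The following is a minimal sketch in Python (standard library only), not part of the original SAS programs. It assumes the gamma distribution is parameterized with shape 1/k and scale k*mu, which gives the Poisson expectations a mean of mu and a variance of k*mu^2; the values mu = 4 and k = 0.5, the seed, and the helper names are arbitrary illustrative choices.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small expectations used here; not suited to very large lam.
    """
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def gamma_poisson_draw(mu, k, rng):
    """Draw an expectation from a gamma, then a Poisson count with that expectation."""
    lam = rng.gammavariate(1.0 / k, k * mu)   # shape 1/k, scale k*mu (assumed form)
    return poisson_draw(lam, rng)

rng = random.Random(11)          # fixed seed so the run is repeatable
mu, k, n = 4.0, 0.5, 200_000     # illustrative values, not taken from the example data

draws = [gamma_poisson_draw(mu, k, rng) for _ in range(n)]
sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n - 1)

print(f"sample mean     = {sample_mean:.3f}   (theory: {mu:.3f})")
print(f"sample variance = {sample_var:.3f}   (theory: {mu + k * mu ** 2:.3f})")
```

With these settings the theoretical variance is 4 + 0.5*16 = 12, three times the mean, so the simulated counts are clearly overdispersed relative to a Poisson with the same expectation. Shrinking k toward 0 pulls the variance back toward mu.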