[pystatsmodels] glm.nb() in python statsmodels

j***@gmail.com

2018-11-25 15:00:51 UTC

Hi!
I am interested in the dispersion parameter of a negative binomial distribution after fitting the model. I know that R has the glm.nb() function, which reports an init.theta parameter in its results. I have been digging through statsmodels trying to reproduce R's theta, but I couldn't manage it. In the end I had to use rpy2 to run the same R function from my code, but it is really slow. Does anyone know how to obtain this parameter in pure Python/statsmodels? Thank you very much for your time!

GLM only implements the one-parameter families. Except for `scale`,
the extra parameters are not estimated.

NB has variance = mu + a * mu**2
The `a` parameter is set in the family and treated as fixed during
estimation. It has to be estimated outside of the model, e.g. by
maximizing loglike/llf (or minimizing deviance) over models fitted
with different `a`. Kerby has an example in a notebook that I can't
find right now.
statsmodels.discrete NegativeBinomial and NegativeBinomialP estimate
this coefficient in the MLE simultaneously with the mean parameters.

dispersion/scale
All families can have an additional `scale` parameter. For continuous
models like OLS, this is part of the likelihood model. GLM families
with a discrete response have scale equal to 1 by default. This can be
changed to a quasi-likelihood assumption using the scale option in fit,
using either Pearson chi2 'x2' or deviance 'dev' to compute the
scale/dispersion coefficient.
http://www.statsmodels.org/devel/generated/statsmodels.genmod.generalized_linear_model.GLM.fit.html

pearson_chi2 and deviance are also available as results attributes
http://www.statsmodels.org/devel/generated/statsmodels.genmod.generalized_linear_model.GLMResults.html

Note: The default scale of GLM NB changed in the last version. Before
that, it did not set scale=1 and had the same behavior as R.
E.g. https://github.com/statsmodels/statsmodels/issues/2888
https://github.com/statsmodels/statsmodels/pull/5277
There was a lot of work in the last two years on fixing scale handling
in GLM without a correctly specified likelihood, especially for the
cases with excess dispersion.

To replicate R's default behavior for excess dispersion, which we use
in the unit tests:
`GLM(cls.data.endog, cls.data.exog,
family=sm.families.NegativeBinomial()).fit(scale='x2')`
https://github.com/statsmodels/statsmodels/pull/3856/files#diff-bacf4e7dec296273aba74b04be3c42f8R649

Josef

Pablo