[pystatsmodels] design notes: backwards compatibility and consistency across models
j***@gmail.com
2018-09-26 13:33:05 UTC
or to inherit or not to inherit
or what is the common denominator of all models
or why don't we keep everything working in the same way so I don't have to
RTFM all the time

While trying to put fittedvalues and resid in the top level
base.model.Results class, I ran into a few problems where models don't all
behave in the same way.

fittedvalues has model-specific or model-group-specific implementations, and
I found some bugs: in some cases offset and exposure were ignored in its
computation, and MNLogit fittedvalues is missing one column.
We also have a longstanding issue that in discrete models fittedvalues uses
linear=True, i.e. it returns the linear predictor instead of the
mean/expected value.
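
To make the difference concrete, here is a minimal sketch in plain Python
(not statsmodels code). It assumes a count model with a log link, where the
linear predictor is x'b + offset + log(exposure) and the mean is its
exponential. The function names and the numbers are made up for
illustration.

```python
import math


def linear_predictor(x, params, offset=0.0, exposure=1.0):
    """Return x'b + offset + log(exposure) for a single observation."""
    xb = sum(xi * bi for xi, bi in zip(x, params))
    return xb + offset + math.log(exposure)


def poisson_mean(x, params, offset=0.0, exposure=1.0):
    """Return the expected count, assuming a log link."""
    return math.exp(linear_predictor(x, params, offset, exposure))


# Hypothetical observation and coefficients, chosen so that x'b = 1.0.
x = [1.0, 2.0]
params = [0.5, 0.25]

lin = linear_predictor(x, params, exposure=10.0)   # what linear=True gives
mean = poisson_mean(x, params, exposure=10.0)      # the expected value
mean_no_exposure = poisson_mean(x, params)         # exposure dropped by mistake
```

In this sketch, dropping exposure scales the mean by a factor of 10, and the
linear predictor is on a different scale from the mean altogether, so
residuals computed from either one would not be comparable with endog.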

Warning: I'm thinking of fixing fittedvalues in discrete models now, without
a deprecation period, because we need to fix some bugs/limitations anyway.
This means that the numbers returned after upgrading will differ.
(In statsmodels I have mostly avoided using discrete model fittedvalues
because it has the wrong, inconsistent definition.)

The general definition that I would like to enforce is that
results.fittedvalues = results.predict(), where predict without arguments
defaults to the prediction of the mean, and that resid = endog -
fittedvalues (possibly with endog replaced by wendog in some cases such as
MNLogit).
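
As a rough illustration of that definition, the sketch below uses a
hypothetical stand-in class in plain Python. It is not the actual
base.model.Results class; the constructor arguments and the data are
assumptions made only for this example.

```python
class Results:
    """Hypothetical stand-in for a generic results base class."""

    def __init__(self, endog, mean_prediction):
        self.endog = list(endog)
        self._mean = list(mean_prediction)

    def predict(self):
        # Without arguments: predicted mean for the estimation sample.
        return list(self._mean)

    @property
    def fittedvalues(self):
        # Defined through predict(), so the two cannot drift apart.
        return self.predict()

    @property
    def resid(self):
        # Raw residual: observed minus fitted mean.
        return [y - m for y, m in zip(self.endog, self.fittedvalues)]


res = Results(endog=[3, 0, 5], mean_prediction=[2.5, 0.5, 4.0])
```

A model that needs something different, for example wendog instead of endog,
would override resid in its own results class and leave the default alone.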

In general:
We need a large amount of flexibility in our base classes because we have
many different types of models and are still adding new types of models.
Therefore, we cannot enforce much of the behavior on the generic base class
level. Enforcement is largely done through generic unit tests but those are
still incomplete.
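
A generic check of this kind could look roughly like the sketch below. It is
plain Python with toy result objects, not the statsmodels test suite; the
names satisfies_contract and make_results are hypothetical, and the check
only handles one-dimensional output (a model like MNLogit would need a 2-D
version).

```python
import math
from types import SimpleNamespace


def satisfies_contract(results, tol=1e-12):
    """Check fittedvalues == predict() and resid == endog - fittedvalues."""
    fitted = list(results.fittedvalues)
    predicted = list(results.predict())
    endog = list(results.endog)
    resid = list(results.resid)
    if not (len(fitted) == len(predicted) == len(endog) == len(resid)):
        return False
    same_fit = all(abs(f - p) <= tol for f, p in zip(fitted, predicted))
    same_resid = all(
        abs(r - (y - f)) <= tol for r, y, f in zip(resid, endog, fitted)
    )
    return same_fit and same_resid


def make_results(endog, linpred, use_linear_fitted):
    """Build a toy results object for a log-link model (illustration only)."""
    mean = [math.exp(v) for v in linpred]
    fitted = list(linpred) if use_linear_fitted else mean
    return SimpleNamespace(
        endog=list(endog),
        predict=lambda: list(mean),
        fittedvalues=fitted,
        resid=[y - f for y, f in zip(endog, fitted)],
    )


# One object follows the definition, the other returns the linear predictor.
good = make_results([1, 3], [0.0, 1.0], use_linear_fitted=False)
bad = make_results([1, 3], [0.0, 1.0], use_linear_fitted=True)
```

Running the same function over the results of every model is what lets such
a test catch a model that quietly returns the linear predictor as
fittedvalues.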

Consequently, whenever we implement a new model or refactor an existing one,
we face a trade-off: either implement it in a model-specific way, which is
usually easier when following a specific description of the model, or use
inheritance and follow, or force it into, a consistent pattern across
models.

references
https://github.com/statsmodels/statsmodels/pull/5255
https://github.com/statsmodels/statsmodels/issues/1181
https://github.com/statsmodels/statsmodels/issues/1970
and many more

Josef
