I needed a break from watching home repairs and penalized splines.
Post by Andrey Portnoy
Thank you for the thorough reply!
In terms of design, do you think it would make sense for results objects
to have their own references to endog, instead of going through self.model?
It would be possible, but I think it is not worth the extra code
complexity. There are not many use cases outside of the current
special case. I cannot come up with any other case.
Initially I thought we would have to make a copy of endog, but
references would work well, i.e. hold on to model.endog. If that is
later replaced by assignment, we would still have a reference to the
original, which then wouldn't be garbage collected.
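
A minimal sketch of that reference behavior, using plain Python objects. The Model and Results classes here are hypothetical stand-ins, not the statsmodels classes:

```python
class Model:
    """Hypothetical stand-in for a model that stores its data."""

    def __init__(self, endog):
        self.endog = endog


class Results:
    """Hypothetical results object that keeps its own reference to endog."""

    def __init__(self, model):
        self.model = model
        self.endog = model.endog  # a reference to the same object, not a copy


original = [1.0, 2.0, 3.0]
model = Model(original)
results = Results(model)

# Rebinding the attribute on the model does not touch the old object;
# it stays alive as long as results still refers to it.
model.endog = [10.0, 20.0, 30.0]

same_object = results.endog is original      # results still sees the old data
diverged = results.endog is not model.endog  # model now points elsewhere
```

Note that this only covers replacement by assignment. An in-place modification of the array would be visible through both references.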
For the new endog use case in OLS, I think the cached residual should
be enough, but whether that holds in the current implementation would
have to be verified.
The main design problem is that I want the core models to become more
consistent in behavior with each other, so that it becomes easier to
write generic/general extensions for them. There are inherent
statistical differences across models that we still have to work
around, but I would like to avoid "unnecessary" complications. For
example, OLS should be a standard model with all the extras, whereas a
VectorizedOLS can be optimized for computational efficiency but
doesn't get all the additional goodies.
https://github.com/statsmodels/statsmodels/issues/2203 was partially
opened as a counterpoint to making the models more consistent and
writing meta classes or mixins to add generic extensions to the
models.
Development of those special models is slow with a few exceptions like
a simplified WLS used in GLM and RLM or some special case helper
functions, because the main interest of contributors and maintainers
was in other areas.
Josef
Hi all,
Is there an official way of creating new results objects when exog is
fixed and only endog changes? I'm using unweighted OLS.
The obvious approach is to create a new OLS object for each case and call fit().
fit(), however, checks for presence of exog-related objects (like the
pseudoinverse), skips fitting if they are found, and jumps straight to
computing the betas. So clearly, refitting is not necessary if the goal is
to produce a results object with only the endog swapped out.
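import statsmodels.formula.api as smf

model = smf.ols(formula, data)
old_results = model.fit()
# swap in the new response; for unweighted OLS wendog is just endog
model.wendog = new_endog
new_results = model.fit()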
But is there an official way of reusing the existing OLS object, rather
than setting model.wendog directly?
No, there is still no official way to do this.
I had added the reuse of the attached pinv for use cases like this.
However, something like fit_new_endog never was implemented.
https://github.com/statsmodels/statsmodels/issues/718
A fit_new_endog would essentially do what you have. The problem is that it is
dangerous for general use because the results instance might still need to
access the endog of the model until enough results attributes, mainly
resid, I guess, are cached.
For other models it would be even more fragile, and it wouldn't help in
nonlinear models like GLM anyway.
Even GLS/WLS classes that need to whiten endog and exog in a data
dependent way need to recompute wexog and pinv.
Replacing endog/wendog is pretty safe in a loop where we take the required
results attributes and then let the results instance go out of scope.
One main application would be residual bootstrap which, however, we still don't have.
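
As a rough sketch of what such a residual bootstrap loop could look like, here is a plain NumPy version rather than statsmodels code. The data are simulated and all names are made up for illustration. The point is only that the pseudoinverse of exog is computed once and reused for every new endog:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Simulated data; exog stays fixed throughout.
exog = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
endog = exog @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

pinv = np.linalg.pinv(exog)  # computed once because exog does not change
params = pinv @ endog
fitted = exog @ params
resid = endog - fitted

n_boot = 200
boot_params = np.empty((n_boot, k))
for i in range(n_boot):
    # New endog: fitted values plus residuals resampled with replacement.
    new_endog = fitted + rng.choice(resid, size=n, replace=True)
    # Only the cheap matrix-vector product is repeated per replication.
    boot_params[i] = pinv @ new_endog

# Bootstrap standard errors of the coefficients.
boot_se = boot_params.std(axis=0, ddof=1)
```

Each iteration keeps only the parameter estimates, which matches the pattern of taking the required attributes and letting everything else go out of scope.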
One alternative would be a proper VectorizedOLS model class, which is less
general than the MultivariateOLS that we use for MANOVA (MultivariateOLS is
currently not a full model class).
https://github.com/statsmodels/statsmodels/issues/4771
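
To illustrate the vectorized idea, here is a NumPy sketch under the assumption that all endog variables share the same exog. This is not an existing statsmodels class, and the names are hypothetical. With one column per endog, a single matrix product gives all the coefficient vectors at once:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 40, 2, 5  # observations, regressors, number of endog columns

exog = np.column_stack([np.ones(n), rng.normal(size=n)])
endog_cols = rng.normal(size=(n, m))  # each column is a separate endog

pinv = np.linalg.pinv(exog)
params = pinv @ endog_cols          # shape (k, m): one column per regression
resid = endog_cols - exog @ params  # shape (n, m)

# Per-column residual variance estimate with the usual df correction.
scale = (resid ** 2).sum(axis=0) / (n - k)
```

This gives up per-model extras in exchange for speed, which is the trade-off described above.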
Josef
Thank you,
Andrey Portnoy.