[pystatsmodels] GAM

j***@gmail.com

2018-10-04 15:45:32 UTC

Post by j***@gmail.com
theme: revival of 2015 PRs, episode 6
NUMFocus small development grant is providing financial support to
finish up and merge the GAM project from GSOC 2015.
an appetizer
https://gist.github.com/josef-pkt/5e164ab0b25a21f317fa5d540d850243
a notebook from 2015 to see whether the code still works.
action is supposed to happen here
https://github.com/statsmodels/statsmodels/pull/5296
ETA: whenever I get two quiet weeks to work on it.
Josef

Warning: Too much inheritance

GLMGAM inherits from GLM, and its results class inherits from GLMResults.

As GLM becomes more feature-rich, there are too many combinations of options
and subclasses to verify the results for all the possibilities.

E.g. I added get_influence to GLM a few months ago; it works, but I doubt it
is all correct for the penalized case.
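To illustrate why the unpenalized influence formulas can be off, here is a minimal numpy sketch, not statsmodels code, for the simplest case only: Gaussian family, identity link, unit weights, and a hypothetical ridge-type penalty matrix standing in for the spline penalty. With a penalty matrix P the leverages are the diagonal of X (X'X + P)^{-1} X' instead of X (X'X)^{-1} X', so they can only shrink. The helper name `hat_diag` is made up for this example.

```python
import numpy as np

def hat_diag(X, penalty):
    """Diagonal of H = X (X'X + P)^{-1} X' for penalized least squares.

    `penalty` is the (p, p) penalty matrix P; pass zeros for the unpenalized fit.
    """
    A = X.T @ X + penalty
    # Solve A B = X' instead of inverting A explicitly, so B = A^{-1} X'
    B = np.linalg.solve(A, X.T)
    # diag(X @ B) without forming the full n x n hat matrix
    return np.einsum("ij,ji->i", X, B)

rng = np.random.default_rng(0)
n, p = 50, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# hypothetical ridge-type penalty; the intercept is left unpenalized
P = 10.0 * np.eye(p)
P[0, 0] = 0.0

h_unpen = hat_diag(X, np.zeros((p, p)))
h_pen = hat_diag(X, P)

print(h_unpen.sum())  # equals p up to rounding
print(h_pen.sum())    # effective degrees of freedom, below p
```

In an actual GAM fit the penalty comes from the smooth terms and IRLS weights enter, so X'X would become X'WX, but the point is the same: leverage and everything derived from it (Cook's distance, studentized residuals) have to use the penalized matrix.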
Wald inference is generic, but if we use the t and F distributions, we need the
residual degrees of freedom, which currently do not take penalization into account.
The same holds for aic and bic, which depend on the effective degrees of freedom.
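A minimal numpy sketch of what "effective degrees of freedom" means here, again assuming penalized least squares with a hypothetical ridge-type penalty rather than the actual GLMGam code: edf is the trace of the penalized hat matrix, the residual df for t and F tests becomes n - edf, and AIC charges 2 * edf instead of 2 * p.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 8
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X[:, 1] + rng.normal(size=n)

# hypothetical penalty; the intercept is not penalized
P = 5.0 * np.eye(p)
P[0, 0] = 0.0

A = X.T @ X + P
params = np.linalg.solve(A, X.T @ y)
H = X @ np.linalg.solve(A, X.T)   # penalized hat matrix
edf = np.trace(H)                 # effective number of parameters
df_resid = n - edf                # residual df for t and F based inference

rss = np.sum((y - X @ params) ** 2)
scale_ml = rss / n
# Gaussian log-likelihood evaluated at the ML scale estimate
llf = -0.5 * n * (np.log(2 * np.pi * scale_ml) + 1)

aic_nominal = -2 * llf + 2 * p    # counts every coefficient as a full parameter
aic_edf = -2 * llf + 2 * edf      # counts only the effective degrees of freedom
print(edf, df_resid, aic_nominal, aic_edf)
```

Using the nominal p both overstates the model complexity in aic/bic and makes the t and F reference distributions too conservative.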

GLM doesn't have get_margins yet, but that should also be coming soonish.
score_test is also waiting in a PR for merging.
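As a rough idea of what margins would compute, here is a hand-written numpy sketch of average marginal effects for a GLM with log link (e.g. Poisson). It mirrors what the discrete models' get_margeff reports by default (at='overall', method='dydx'), but it is not statsmodels API, and the helper name is hypothetical. With mu_i = exp(x_i'b), d mu_i / d x_ij = b_j * mu_i, so the average over observations is b_j * mean(mu).

```python
import numpy as np

def average_marginal_effects_log_link(X, params):
    """Average marginal effects dE[y]/dx_j for a GLM with log link.

    d exp(x_i'b) / d x_ij = b_j * exp(x_i'b), averaged over observations.
    """
    mu = np.exp(X @ params)
    return params * mu.mean()

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
params = np.array([0.1, 0.5])
ame = average_marginal_effects_log_link(X, params)
print(ame[1])  # effect of the second column; the constant's entry is not meaningful
```

For a penalized GLM the point estimate of the margins is computed the same way; the hard part is the standard errors, which need the delta method with the correct (penalized, possibly cluster-robust) covariance of params.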

Getting margins, score_test and influence measures to work correctly for
GLMGAM, or other penalized GLMs, with var_weights and cluster-robust
standard errors might still take a few more years.
But when it does, it will be great:
# wish list: penalization= and get_margins are not available yet
res = glm(formula, data, family=..., penalization=...,
          var_weights=...).fit(cov_type='cluster', cov_kwds=dict(groups=mygroups))
marg = res.get_margins(...)
st = res.score_test(...)
print(st.summary())
...
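For the simplest case, penalized least squares, the cluster-robust covariance that cov_type='cluster' would have to produce can be sketched in numpy as a sandwich with the penalized Hessian as the bread. This is an illustration under that assumption, not statsmodels' implementation; in particular no small-sample correction is applied, and the penalty matrix and helper name are hypothetical.

```python
import numpy as np

def cluster_sandwich(X, resid, groups, penalty):
    """Cluster-robust sandwich covariance for penalized least squares.

    bread = (X'X + P)^{-1}
    meat  = sum over clusters g of (X_g' e_g)(X_g' e_g)'
    No small-sample correction is applied.
    """
    bread = np.linalg.inv(X.T @ X + penalty)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score_g = X[idx].T @ resid[idx]  # summed score contribution of cluster g
        meat += np.outer(score_g, score_g)
    return bread @ meat @ bread

rng = np.random.default_rng(2)
n, k = 60, 3
groups = np.repeat(np.arange(12), 5)  # 12 clusters of 5 observations each
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

P = np.diag([0.0, 2.0, 2.0])  # hypothetical penalty, intercept unpenalized
params = np.linalg.solve(X.T @ X + P, X.T @ y)
cov = cluster_sandwich(X, y - X @ params, groups, P)
bse = np.sqrt(np.diag(cov))
print(bse)
```

With var_weights and a non-Gaussian family, the bread becomes the penalized Hessian at the IRLS weights and the cluster scores are weighted accordingly, which is exactly the kind of combination that needs separate verification.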

dir(gam_res_cs_h)
[...
'_freq_weights',
'_get_robustcov_results',
'_iweights',
'_n_trials',
'_var_weights',
'aic',
'bic',
'bse',
'conf_int',
'converged',
'cov_kwds',
'cov_params',
'cov_params_default',
'cov_type',
'data_in_cache',
'deviance',
'df_model',
'df_resid',
'f_test',
'family',
'fit_history',
'fittedvalues',
'get_hat_matrix_diag',
'get_influence',
'get_prediction',
'initialize',
'k_constant',
'llf',
'llnull',
'load',
'method',
'model',
'mu',
'nobs',
'normalized_cov_params',
'null',
'null_deviance',
'params',
'partial_values',
'pearson_chi2',
'plot_added_variable',
'plot_ceres_residuals',
'plot_partial',
'plot_partial_residuals',
'predict',
'pvalues',
'remove_data',
'resid_anscombe',
'resid_anscombe_scaled',
'resid_anscombe_unscaled',
'resid_deviance',
'resid_pearson',
'resid_response',
'resid_working',
'save',
'scale',
'significance_test',
'summary',
'summary2',
't_test',
't_test_pairwise',
'tvalues',
'use_t',
'wald_test',
'wald_test_terms']