[pystatsmodels] Implementation of linear mixed models
Andrey Portnoy
2018-11-05 21:38:52 UTC
Hi all,

statsmodels’ implementation of mixed models differs in structure and interface from that of lme4 in R. Where should I look in order to understand the reasons behind that difference? I would be really grateful if Dr. Shedden could comment.

If I were to implement mixed models in Python, my strategy would be to directly port lme4. Why wasn’t that the strategy of choice?

My superficial understanding is that it’s partly because of the lack of support for mixed effects formulas in Patsy and partly due to unavailability of matrix factorization routines under permissive open source licenses.

Is that correct?

Thank you,
Andrey Portnoy.
j***@gmail.com
2018-11-05 21:55:58 UTC
A general answer, without the specifics, which Kerby would need to provide.

Both are license issues. We cannot directly port lme4 because GPL is
incompatible with our BSD license (although the reverse would be allowed).
This applies to almost all R packages. There are a few exceptions for
smaller packages where authors give permission to translate their code into
Python.

The second is, as you mention, that CHOLMOD does not have a BSD/MIT-compatible
license either, and there is no license-compatible open source alternative.

So most of statsmodels is coded from scratch and relies on license
compatible packages for the tools.

Aside about formulas in R: I never understood all the extras that can hide
in R's formulas. To a large extent it is intentional that additional
components need to be specified in extra keywords. So, there is no strong
incentive from our side to improve support in patsy for some of the extras.
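
To make the "extra keywords" point concrete, here is a minimal sketch of how an lme4-style formula maps onto the separate pieces that statsmodels' mixedlm takes. The helper split_lme4_formula is hypothetical (it is not part of statsmodels or patsy), it only handles a single "(terms | group)" term, and the column names y, x and g are made up for illustration.

```python
import re

# Hypothetical helper, not part of statsmodels or patsy.
# Matches one lme4-style random-effects term such as "(1 + x | g)".
_RE_TERM = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^|()]+?)\s*\)")


def split_lme4_formula(formula):
    """Split "y ~ x + (1 + x | g)" into fixed part, random part and group."""
    matches = _RE_TERM.findall(formula)
    if len(matches) != 1:
        raise ValueError("this sketch handles exactly one (terms | group) term")
    re_terms, group = matches[0]

    # Remove the random-effects term, then tidy up any "+" left dangling.
    fixed = _RE_TERM.sub("", formula).strip()
    fixed = re.sub(r"\+\s*(?=\+|$)", "", fixed)  # "+" before another "+" or at the end
    fixed = re.sub(r"~\s*\+", "~", fixed)        # "+" directly after "~"
    fixed = re.sub(r"\s+", " ", fixed).strip()
    if fixed.endswith("~"):
        fixed += " 1"                            # intercept-only fixed part

    return {"formula": fixed, "re_formula": "~" + re_terms, "groups": group}


parts = split_lme4_formula("y ~ x + (1 + x | g)")
print(parts)

# Assuming statsmodels is installed and `data` is a DataFrame with columns
# y, x and g, the pieces would be passed as separate arguments, roughly:
#   import statsmodels.formula.api as smf
#   model = smf.mixedlm(parts["formula"], data,
#                       re_formula=parts["re_formula"],
#                       groups=data[parts["groups"]])
```

The sketch only shows where each part of the R formula ends up; crossed or nested random effects have no such one-to-one translation here.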

Josef
j***@gmail.com
2018-11-06 03:10:03 UTC
Interface design is difficult, or: why we don't always follow other packages,
even those that are popular.

library mgcv: which line is correct?

    gam_a = gam(city.mpg ~ fuel + drive + s(weight,bs="bs", k=12,
        knots=knots) + s(hp,bs="bs",k=10), data = mpg)

    gam_a = gam(city.mpg ~ fuel + drive + s(weight,bs="bs", k=12) +
        s(hp,bs="bs",k=10), data = mpg, knots=knots)

Why does knots have to be a dataframe and not a list?

(It took me several tries and reading examples several times to get the
knots syntax to work. Fortunately mgcv has many examples.)

Josef