Discussion:
[glm] error with binomial family
Sol
2010-11-15 15:00:00 UTC
Hello,
I'm trying to fit a GLM with statsmodels.
Thanks for implementing this in Python; I hope I can help build more
detailed documentation/tutorials in the future ;o)

I tried this code:
----------------------------------------------------------------------------------------------------------------------------------------------------
import numpy
import scikits.statsmodels as sm

SUCCESS = numpy.array([81,71,74,87,22,26,31,7,61,97,78])
FAILURE = numpy.array([19,29,26,13,78,74,69,93,39,3,22])
TOTAL = numpy.array([100,100,100,100,100,100,100,100,100,100,100])

X = numpy.array([[1868., 0.709], [1633., 0.593], [1702., 0.394],
                 [2001., 0.941], [506., 0.528], [598., 0.859], [713., 0.294],
                 [161., 0.724], [1403., 0.182], [2231., 0.994], [1794., 0.057]])
Y = numpy.array([[81, 19], [71, 29], [74, 26], [87, 13], [22, 78], [26, 74],
                 [31, 69], [7, 93], [61, 39], [97, 3], [78, 22]])

glm = sm.GLM(Y, X, family=sm.family.Binomial())

res = glm.fit(data_weights = 100)
----------------------------------------------------------------------------------------------------------------------------------------------------
I get this error:
----------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "C:/Users/[...]/test scipy.py", line 29, in <module>
res = glm.fit() #data_weights = 100
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-
py2.6.egg\scikits\statsmodels\glm.py", line 378, in fit
returned a nan. This could be a boundary problem and should be
reported."
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.
----------------------------------------------------------------------------------------------------------------------------------------------------


Could you help me get this GLM working with statsmodels?

Another question:
1- can I fit the same GLM but with varying "data_weights" --> one
trial at 99, another at 102, ...?

Thanks in advance for any advice and answers.
Skipper Seabold
2010-11-15 15:21:17 UTC
Post by Sol
I'm trying to fit a GLM with statsmodels.
[...]
glm = sm.GLM(Y, X, family=sm.family.Binomial())
res = glm.fit(data_weights = 100)
[...]
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.
[...]
1- can I fit the same GLM but with varying "data_weights" --> one
trial at 99, another at 102, ...?
Without looking too much closer at the moment, I think it's doing
integer division. Try

Y = Y.astype(float)
TOTAL = TOTAL.astype(float)

glm = sm.GLM(Y,X, family=sm.families.Binomial())
res = glm.fit(data_weights=TOTAL)

Also note that sm.family is now sm.families in newer versions of statsmodels.
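As for question 1: data_weights accepts a per-observation array, so
varying trial counts work the same way. A minimal sketch (the counts
here are illustrative, not your data):

TOTAL = numpy.array([99., 102., 100., 98., 101.,
                     100., 100., 100., 100., 100., 100.])
res = glm.fit(data_weights=TOTAL)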

Skipper
j***@public.gmane.org
2010-11-15 15:55:26 UTC
Post by Skipper Seabold
Post by Sol
[...]
Without looking too much closer at the moment, I think it's doing
integer division.  Try
Y = Y.astype(float)
TOTAL = TOTAL.astype(float)
glm = sm.GLM(Y,X, family=sm.families.Binomial())
res = glm.fit(data_weights=TOTAL)
Also note that sm.family is now sm.families in newer versions of statsmodels.
Good guess about the integers; I checked by expanding the cases to
individual observations:

--------------------

mod = sm.GLM(Y.astype(float), X, family=sm.families.Binomial())
print mod.fit().params
print mod.fit(data_weights=TOTAL.astype(float)).params


import numpy as np

def convert_case2obs(y, X):
    '''expand binomial data given by distinct cases to observations

    Returns
    -------
    endog : ndarray, 1d
        data array with 1 for success and 0 for failure

    '''
    y = np.asarray(y)
    assert y.ndim == 2
    ncases = y.shape[0]
    nobs = y.sum(0)
    # column 0 of y holds success counts, column 1 failure counts
    ind0 = np.repeat(np.arange(ncases), y[:, 0].astype(int))  # rows for successes
    ind1 = np.repeat(np.arange(ncases), y[:, 1].astype(int))  # rows for failures
    successobs = np.column_stack((np.ones(nobs[0]), X[ind0, :]))  # endog = 1
    failobs = np.column_stack((np.zeros(nobs[1]), X[ind1, :]))    # endog = 0
    data = np.r_[failobs, successobs]
    return data[:, 0], data[:, 1:]

yobs, xobs = convert_case2obs(Y, X)

print sm.GLM(yobs, xobs, family=sm.families.Binomial()).fit().params
print sm.Logit(yobs, xobs).fit().params

print '\nwith constant'
mod = sm.GLM(Y.astype(float), sm.add_constant(X), family=sm.families.Binomial())
print mod.fit().params
print mod.fit(data_weights=TOTAL.astype(float)).params

print sm.GLM(yobs, sm.add_constant(xobs),
             family=sm.families.Binomial()).fit().params
print sm.Logit(yobs, sm.add_constant(xobs)).fit().params

--------------------

Results: the two binomial versions agree and also match discretemod.Logit:

[ 1.25930498e-03 -1.80115251e+00]
[ 1.25930498e-03 -1.80115251e+00]
[ 1.25930498e-03 -1.80115251e+00]
Optimization terminated successfully.
Current function value: 619.589828
Iterations 5
[ 1.25930498e-03 -1.80115251e+00]

with constant
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
Optimization terminated successfully.
Current function value: 539.318195
Iterations 5
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]


Josef
Post by Skipper Seabold
Skipper
j***@public.gmane.org
2010-11-15 16:58:07 UTC
Post by j***@public.gmane.org
[...]
Good guess about the integers; I checked by expanding the cases to
individual observations.
I fixed it in my branch and added a ticket:
https://bugs.launchpad.net/statsmodels/+bug/675631

Sol, if you want to fix it in your copy of statsmodels, just add a
`1.` in family/family.py, in Binomial initialize, so that it reads

return y*1./self.n

instead of

return y/self.n
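A quick sketch of why this matters: with integer count arrays, NumPy
division truncates (under Python 2 semantics), so the success
proportions collapse to 0 and the first deviance guess hits log(0):

import numpy as np
y = np.array([81, 19])        # integer counts
n = np.array([100, 100])
print y / n                   # integer division truncates: [0 0]
print y * 1. / n              # promoted to float: [ 0.81  0.19]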

Josef
Sol
2010-11-16 13:52:51 UTC
Thanks Skipper and Josef for your detailed answers.

Another question:

how do I get results like R's summary() function -->
coefficients:
estimates, standard errors, z values, Pr(>|z|) for the intercept and
the associated explanatory variables (I have both discrete and
continuous),
AIC or likelihood,
...

Thanks for your interest.
Sol
Skipper Seabold
2010-11-16 14:16:29 UTC
Post by Sol
how do I get results like R's summary() function -->
estimates, standard errors, z values, Pr(>|z|), AIC or likelihood
[...]
The results will either be attached to the res object that is returned
by fit, i.e., res.params are the coefficients, res.bse are the standard
errors, res.t() *should* actually return z values for the GLMs (I'll
have to check), etc.  Check the docstring in
sm.glm.GLMResults.__doc__
or see here: <http://statsmodels.sourceforge.net/generated.scikits.statsmodels.glm.GLMResults.html#scikits.statsmodels.glm.GLMResults>

Or there is a summary method attached to the results object (not sure
if this is done yet for GLM, actually).
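For example, a minimal sketch along those lines (whether res.aic and
res.llf exist in 0.2.0 is an assumption; check the docstring):

res = glm.fit(data_weights=TOTAL)
print res.params    # coefficient estimates
print res.bse       # standard errors
print res.t()       # should be z values for GLM, as noted above
# possibly also available, depending on the version (an assumption):
# print res.aic
# print res.llf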

Skipper
Sol
2010-11-16 15:51:37 UTC
Sorry again, Skipper or Josef:

so GLM fits by default *without* an intercept; how can I get an
intercept?

After our discussions, I have written a step-by-step "tutorial" on how
to use statsmodels (with the corresponding R code) to help future
students fit GLMs. Are you interested in publishing it in the
statsmodels documentation? Can I send it to you by email (code and
step-by-step explanations in an .odt file)?

Thanks again for your advice ;o)
Best regards

sol
Skipper Seabold
2010-11-16 15:57:52 UTC
Post by Sol
so GLM fits by default *without* an intercept; how can I get an
intercept?
You have to explicitly add the constant to the data.

x = sm.add_constant(x, prepend=True)
Post by Sol
After our discussions, I have written a step-by-step "tutorial" [...]
Having a statsmodels-for-R-users guide might be pretty helpful.  An
.rst file would be easier to integrate quickly.  You can see some
examples in our /scikits/statsmodels/docs (we still want to move /docs/
up to the main directory, similar to numpy/scipy/scikits-learn).  There
is also a numpy ReST tutorial, but I can't find it at the moment.

There is a similar effort on this blog:

http://ascratchpad.blogspot.com/2010/09/glm-and-python-scikitsstatsmodels.html
j***@public.gmane.org
2010-11-16 16:00:43 UTC
Post by Sol
so GLM fits by default *without* an intercept; how can I get an
intercept?
We have a utility function, sm.add_constant, as in the example I showed
before:

mod = sm.GLM(Y.astype(float), sm.add_constant(X), family=sm.families.Binomial())

It has a prepend option to add the constant as the first column
instead of at the end (the default).
Post by Sol
After our discussions, I have written a step-by-step "tutorial" [...]
Thank you, additional tutorials are always very helpful. If you want,
you can send it to the list, or to Skipper's or my private email.

We could either add it as a file to the documentation or convert it to
reStructuredText so it can be included directly in the documentation.
Post by Sol
Thanks again for your advice ;o)
Keep asking, and tell us if anything is unclear, buggy, or looks buggy.

Cheers,

Josef
Sol
2010-11-16 16:07:42 UTC
Wow, you are impressive! ^^
Thankssss!

I'll try to convert and finish my R/statsmodels tutorial in .rst for
tomorrow.

Best regards
sol
Sol
2010-11-22 10:29:48 UTC
Hello,
another question:
in some cases I get an error message with code that worked on my
previous example. I tried permuting observations between groups in my
GLM, and in some cases I get this error message about a NaN:
Traceback (most recent call last):
File "C:\Users\Solenn\Desktop\2010 octobre AUDE\ramdomcline 0.1.py",
line 153, in <module>
res=glm.fit(data_weights = nbtot)
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-
py2.6.egg\scikits\statsmodels\glm.py", line 378, in fit
returned a nan. This could be a boundary problem and should be
reported."
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.


My GLM is

glm = sm.GLM(y.astype(float), sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)

and my data are, for example:
y = [[1.0, 15.0], [1.0, 18.0], [1.0, 19.0], [2.0, 20.0], [3.0, 21.0],
[2.0, 22.0], [1.0, 25.0], [2.0, 27.0], [1.0, 30.0], [1.0, 34.0], [1.0,
34.0], [0.0, 36.0], [2.0, 34.0], [2.0, 34.0], [1.0, 37.0], [3.0,
36.0], [3.0, 36.0], [3.0, 39.0], [0.0, 45.0], [3.0, 43.0], [5.0,
43.0], [1.0, 48.0], [1.0, 48.0], [4.0, 48.0], [0.0, 52.0], [1.0,
51.0], [2.0, 50.0], [1.0, 52.0], [1.0, 54.0], [0.0, 55.0], [6.0,
63.0], [1.0, 75.0]]
--------
X = [[876748.973], [720567.16299999994], [320150.36310000002],
[583218.3175], [408575.89500000002], [472261.23489999998],
[824823.13659999997], [538344.76379999996], [842367.61600000004],
[338325.4326], [295414.4596], [367216.44319999998],
[383680.80109999998], [749160.96149999998], [832882.53350000002],
[414151.92540000001], [714448.80689999997], [379632.01740000001],
[833843.3578], [809404.05850000004], [655167.06129999994],
[525426.95799999998], [752852.99410000001], [307259.39569999999],
[695146.15410000004], [764498.94220000005], [870005.8885],
[287810.36540000001], [388498.35110000003], [819041.00199999998],
[334226.90710000001], [573148.73690000002]]
--------
Y = [[2403908.9360000002], [2438478.3020000001], [2331190.0350000001],
[2369150.0660000001], [2334425.2570000002], [2353680.696],
[2456066.3250000002], [2366416.9180000001], [2460245.0040000002],
[2354388.6329999999], [2375419.429], [2336460.1949999998],
[2105292.3700000001], [2476741.588], [2398922.7119999998],
[2311211.449], [2474548.6600000001], [2358930.898],
[2453682.1030000001], [2463081.9350000001], [2447596.0210000002],
[2362995.2039999999], [2503244.5490000001], [2284394.2790000001],
[2466206.2760000001], [2416882.2779999999], [2420118.1549999998],
[2351559.3870000001], [2136919.9670000002], [2414933.3399999999],
[2208447.912], [2379643.2480000001]]
--------
PERIODE = [[0], [0], [0], [0], [1], [1], [0], [0], [1], [1], [0], [0],
[0], [0], [0], [0], [0], [1], [0], [1], [1], [1], [1], [1], [1], [1],
[1], [1], [1], [1], [1], [1]]
--------
nbtot = [16, 19, 20, 22, 24, 24, 26, 29, 31, 35, 35, 36, 36, 36, 38,
39, 39, 42, 45, 46, 48, 49, 49, 52, 52, 52, 52, 53, 55, 55, 69, 76]

with
XY = numpy.concatenate((X, Y), axis=1)
XYP = numpy.concatenate((XY, PERIODE), axis=1)
XYP.astype(float)


After reading glm.py and family.py, I don't know where it comes from.

In R, it gives me:
Call:
glm(formula = (y/(tot - y)) ~ X + Y + P, family = binomial(),
    weights = (tot))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.11865  -0.93983   0.02340   0.67569   2.20055

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.239e+00  4.066e+00  -0.551    0.582
X           -1.039e-06  9.162e-07  -1.134    0.257
Y           -5.255e-08  1.862e-06  -0.028    0.977
P           -1.230e-01  2.771e-01  -0.444    0.657

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 44.098  on 31  degrees of freedom
Residual deviance: 41.346  on 28  degrees of freedom
AIC: 116.04

Number of Fisher Scoring iterations: 5

Any idea please?

sol
j***@public.gmane.org
2010-11-22 13:32:54 UTC
Post by Sol
in some cases I get an error message with code that worked on my
previous example. I tried permuting observations between groups in my
GLM, and in some cases I get this error message about a NaN
My guess is that this time it's the 0*log(0)-is-nan error. Adding a
small value to y, and comparing with the other versions
(families.family.Binomial.loglike):

y = np.asarray(y, float)

glm = sm.GLM(y.astype(float) + 1e-200,
             sm.add_constant(XYP.astype(float), prepend=True),
             family=sm.families.Binomial())
res = glm.fit(data_weights=nbtot)

print res.params

yobs, xobs = convert_case2obs(y, XYP)

res2 = sm.GLM(yobs, sm.add_constant(xobs, prepend=True),
              family=sm.families.Binomial()).fit()
print res2.params
resl = sm.Logit(yobs, sm.add_constant(xobs, prepend=True)).fit()
print resl.params

[ -2.24652828e+00 -9.87710451e-07 -9.08729819e-08 -1.31120473e-01]
[ -2.24652828e+00 -9.87710451e-07 -9.08729819e-08 -1.31120473e-01]
Optimization terminated successfully.
Current function value: 229.672697
Iterations 5
[ -2.24656656e+00 -9.87635018e-07 -9.08699857e-08 -1.31112889e-01]

The values differ a bit from R, but the differences are at the default
numerical precision of the optimizers, and none of the parameters is
significant. Adding tol=1e-10 in the glm fit didn't change the results.

Is there a difference in how R sets the convergence criteria that might
be relevant in this case? I tend to believe our numbers, since Logit (a
direct implementation of MLE with analytical gradient and Hessian)
returns the same results as glm, but it too most likely works at the
convergence tolerance levels.

Thank you for reporting these problems; 0*log(0) shows up in several
places, including scipy and stackoverflow, and we might have to find a
more general solution.
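A sketch of one general pattern (an illustration, not what statsmodels
shipped): evaluate x*log(y) with the limit convention 0*log(0) = 0 by
masking the zero entries:

import numpy as np

def xlogy(x, y):
    '''return x*log(y) elementwise, taking 0*log(0) as 0 (sketch)

    assumes x and y have the same shape
    '''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(x)
    mask = x != 0
    out[mask] = x[mask] * np.log(y[mask])
    return out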

I'm not sure about the statistically insignificant numerical
differences from R. I also tried the glm fit with demeaned XYP, but the
results don't change much and still differ from R.

I will open a ticket for the Binomial loglike.

Josef
j***@public.gmane.org
2010-11-22 14:46:01 UTC
I forgot to mention: if you pass prepend=True to add_constant, you get
the constant as the first column, as in R, e.g.

sm.add_constant(XYP, prepend=True)

Josef
Skipper Seabold
2010-11-22 14:50:46 UTC
Post by j***@public.gmane.org
[...]
I'm not sure about the statistically insignificant numerical
differences from R. I also tried the glm fit with demeaned XYP, but the
results don't change much and still differ from R.
I want to say that R does scaling by default during optimization, but
I'm not sure.
Post by j***@public.gmane.org
I will open a ticket for the Binomial loglike.
Josef
Yeah, that looks like it. Thanks, I will have a look.

Skipper
Sol
2010-11-24 15:28:04 UTC
hello,
me again :oS
Your proposal worked well for all cases until I hit an experiment with
complete success (see row 2 of the data below --> 19 successes, 0
failures). It returned the same error message about a NaN value.
[ 14. 2.]
[ 19. 0.]
[ 16. 4.]
[ 11. 11.]
[ 13. 11.]
[ 17. 7.]
[ 24. 2.]
[ 28. 1.]
[ 18. 13.]
[ 9. 26.]
[ 20. 15.]
[ 20. 16.]
[ 19. 17.]
[ 12. 24.]
[ 16. 22.]
[ 14. 25.]
[ 12. 27.]
[ 20. 22.]
[ 12. 33.]
[ 11. 35.]
[ 16. 32.]
[ 13. 36.]
[ 19. 30.]
[ 14. 38.]
[ 12. 40.]
[ 21. 31.]
[ 19. 33.]
[ 22. 31.]
[ 17. 38.]
[ 14. 41.]
[ 17. 52.]
[ 29. 47.]

The code was

glm = sm.GLM(y.astype(float) + 1e-200,
             sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)

So even with +1e-200 added, the error is still raised here:
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-py2.6.egg
\scikits\statsmodels\glm.py", line 378, in fit
if np.isnan(dev):
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.

Do I have to insert a condition checking for the presence/absence of a
NaN (i.e., checking for complete failure or complete success within an
experiment) and then add or subtract 1e-200? Is that correct?

Thanks again!
sol
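One classic workaround for rows with a zero cell (a sketch of a known
continuity correction, not an official statsmodels fix) is the
Haldane-Anscombe style adjustment: add 0.5 to every success/failure
count so that no proportion is exactly 0 or 1. Note the row totals then
differ from nbtot by 1, which may or may not matter for your analysis:

y_adj = y.astype(float) + 0.5
glm = sm.GLM(y_adj, sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)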
Post by Skipper Seabold
Post by j***@public.gmane.org
Post by Sol
Hello,
I have in some cases, a error message with the code that worked with
my previous example. In fact, I tried to permute observations between
groups in my glm, and in some cases, I get this error message about
NAN
my guess is that this time it's the 0*log(0) is nan error, adding a
(familiess.family.Binomial.loglike)
y = np.asarray(y,float)
glm = sm.GLM(y.astype(float)+1e-200,
            sm.add_constant(XYP.astype(float), prepend=True),
            family=sm.families.Binomial())
res=glm.fit(data_weights = nbtot)
print res.params
yobs, xobs = convert_case2obs(y, XYP)
res2 = sm.GLM(yobs, sm.add_constant(xobs, prepend=True),
family=sm.families.Binomial()).fit()
print res2.params
resl = sm.Logit(yobs, sm.add_constant(xobs, prepend=True)).fit()
print resl.params
[ -2.24652828e+00  -9.87710451e-07  -9.08729819e-08  -1.31120473e-01]
[ -2.24652828e+00  -9.87710451e-07  -9.08729819e-08  -1.31120473e-01]
Optimization terminated successfully.
        Current function value: 229.672697
        Iterations 5
[ -2.24656656e+00  -9.87635018e-07  -9.08699857e-08  -1.31112889e-01]
The values differ a bit from R but they are at the default numerical
precision for the optimizers, and
none of the parameters is significant.
adding tol=1e-10 in glm fit didn't change the results.
Is there a difference in how R sets the convergence criteria, which
might be relevant in this case?
I tend to belief our numbers since Logit (direct implementation of MLE
with analytical Gradient and Hessian) returns the same as glm, but
that one also works most likely at the convergence tolerance levels.
Thank you for reporting these problems, 0*log0 shows up at several
places including scipy and stackoverflow, and we might have to find a
more general solution.
I'm not sure, about the statistically insignificant numerical
differences to R. I tried also the glm fit with demeaned XYP, but the
results don't change much and still differ from R.
I want to say that R does scaling by default during optimization, but
I'm not sure.
Post by j***@public.gmane.org
I will open a ticket for the Binomial loglike.
Josef
Yeah, that looks like it.  Thanks, I will have a look.
Skipper
Sol
2010-11-24 15:28:13 UTC
Permalink
hello,
again me :oS
Your proposition worked well for all cases till I have a complete
successfull experiment (see above line 2 --> 19 successes for 0
failures). It returned me the same error message about Nan value.
[ 14. 2.]
[ 19. 0.]
[ 16. 4.]
[ 11. 11.]
[ 13. 11.]
[ 17. 7.]
[ 24. 2.]
[ 28. 1.]
[ 18. 13.]
[ 9. 26.]
[ 20. 15.]
[ 20. 16.]
[ 19. 17.]
[ 12. 24.]
[ 16. 22.]
[ 14. 25.]
[ 12. 27.]
[ 20. 22.]
[ 12. 33.]
[ 11. 35.]
[ 16. 32.]
[ 13. 36.]
[ 19. 30.]
[ 14. 38.]
[ 12. 40.]
[ 21. 31.]
[ 19. 33.]
[ 22. 31.]
[ 17. 38.]
[ 14. 41.]
[ 17. 52.]
[ 29. 47.]

code was
glm = sm.GLM(y.astype(float)+1e-200,
sm.add_constant(XYP.astype(float)), family=sm.family.Binomial())
res=glm.fit(data_weights = nbtot)

So, adding +1e-200 may cause the error here
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-py2.6.egg
\scikits\statsmodels\glm.py", line 378, in fit
if np.isnan(dev):
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.

Do I have to inser a condition about presence/absence of a Nan (so
cheking for complete failure ou complete success within an experiment)
and to add or soustract +1e-200? Is it correct?*

Thanks again!
sol
Sol
2010-11-24 15:39:15 UTC
Permalink
I just tried adding +2e-200 to data_weights... it failed again (NaN
error).

...maybe a silly try, but just mentioning it :oS
j***@public.gmane.org
2010-11-24 15:45:36 UTC
Permalink
Post by Sol
I just tried adding +2e-200 to data_weights... it failed again (NaN
error).
...maybe a silly try, but just mentioning it :oS
From my reading of the code this should have worked; because of the
0*log(0)-is-nan problem, only exact zeros create a problem.
You could also try replacing the zero failure with 1e-10, just
to see if it solves the nan problem.
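
A minimal sketch of that workaround, assuming y holds [successes, failures]
rows as in the examples above (the 1e-10 is just the suggested probe value,
not a tuned constant):

import numpy as np

# y rows are [successes, failures]; the second row has an exact zero
y = np.array([[14., 2.],
              [19., 0.]])

# replace exact zeros only, so 0*log(0) can no longer occur downstream
y_adj = np.where(y == 0, 1e-10, y)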

I have to look at the problem in more detail. Is the list you posted the Y
that creates the problem, and is XYP still the same as before?

Josef
j***@public.gmane.org
2010-11-24 15:48:43 UTC
Permalink
If you copy the output of

>>> y
array([....

then it would be easier to copy your data, and I don't have to parse the list.
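
For instance, something like the following in a script prints the same
paste-able form (a sketch with made-up data):

import numpy as np

y = np.array([[14., 2.], [19., 0.]])   # stand-in for the real data
print(repr(y))   # -> array([[ 14.,   2.],
                 #           [ 19.,   0.]])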

Josef
Skipper Seabold
2010-11-24 15:49:46 UTC
Permalink
Does it work in trunk? This looks like the original boundary problem
in glm where the predicted value is equal to y (I think it does
something like log(1 - y/mu)), but I thought I fixed it for Binomial.
Maybe only for Negative Binomial.

I will have a look when I get to a machine where I can, but the
solution is to just add a clip in the family (I think). You can see
the fixes for the others to get an idea.
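
A minimal sketch of that kind of clip, with a hypothetical helper name
(not the actual statsmodels family code):

import numpy as np

FLOAT_EPS = np.finfo(float).eps

def clip_mu(mu):
    # keep fitted probabilities strictly inside (0, 1) so that log(mu)
    # and log(1 - mu) in the deviance and loglike stay finite
    return np.clip(mu, FLOAT_EPS, 1 - FLOAT_EPS)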

Skipper
Sol
2010-11-25 08:24:42 UTC
Permalink
Hello,
I tried different cases yesterday evening, and it failed only when I
get 0 failures somewhere.

Example of y:
array([[ 16.,   0.],
       [ 11.,   8.],
       [ 18.,   2.],
       [ 13.,   9.],
       [ 19.,   5.],
       [ 22.,   2.],
       [ 10.,  16.],
       [ 10.,  19.],
       [ 21.,  10.],
       [ 21.,  14.],
       [ 16.,  19.],
       [ 20.,  16.],
       [ 14.,  22.],
       [ 21.,  15.],
       [  4.,  34.],
       [ 18.,  21.],
       [ 23.,  16.],
       [ 16.,  26.],
       [ 12.,  33.],
       [ 16.,  30.],
       [ 21.,  27.],
       [ 19.,  30.],
       [ 17.,  32.],
       [ 26.,  26.],
       [ 22.,  30.],
       [ 21.,  31.],
       [ 14.,  38.],
       [ 13.,  40.],
       [ 11.,  44.],
       [ 21.,  34.],
       [ 23.,  46.],
       [  9.,  67.]])
Sol
2010-11-25 08:27:46 UTC
Permalink
Sorry, yes, XYP is always the same; only the successes/failures (so y)
changed.

Thanks for your help.
Sol
2010-11-25 08:34:55 UTC
Permalink
About the already-fixed errors: I use the code from
scikits.statsmodels-0.2.0.zip.
Does it include those fixes?
j***@public.gmane.org
2010-11-25 13:58:23 UTC
Permalink
Post by Sol
About the already-fixed errors: I use the code from
scikits.statsmodels-0.2.0.zip.
Does it include those fixes?
No, there are some bugfixes that we made since we released 0.2.0.
Unfortunately, our current release is already two months behind
schedule.

The best would be if you could switch to 0.3-devel or the trunk
version. You would then have to make the few adjustments that are
not backwards compatible, e.g. renaming family to families.

The trunk version might also still be missing a few more recent changes.
The 0.3-devel version should be stable for the old parts that were also
present in 0.2.0, but the newer code is still moving quite a bit.

Update:

I tried your zero-failure example:

print '\n case with zero failures'
mod_glm = sm.GLM(yn.astype(float)+1e-14,
                 sm.add_constant(XYP.astype(float), prepend=True),
                 family=sm.families.Binomial())

res = mod_glm.fit(data_weights = nbtot)
print res.params

--->

 case with zero failures
[ -6.83285117e+00  -1.47701877e-06   3.19973011e-06  -3.81612790e-01]

----------
The zero-failure bug is still present in 0.3-devel. Because in this
case the difference is measured against 1, the added amount has to be
numerically significant at double precision. The smallest amount that
I had to add to get a working solution is 1e-14.

Essentially, the code for Binomial has a*log(b) in several different
places, and any of them could produce a nan in corner cases with
0*log(0).
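
A quick check of why 1e-200 gets absorbed while 1e-14 survives for a
zero-failure row (a sketch, not from the original thread):

import numpy as np

row = np.array([19., 0.])      # 19 successes, 0 failures

for eps in (1e-200, 1e-14):
    s, f = row + eps
    p = s / (s + f)            # proportion of successes
    print(1.0 - p)             # 0.0 for eps=1e-200, ~5e-16 for eps=1e-14

print(0.0 * np.log(0.0))       # nan: the 0*log(0) corner case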

For now, you have two options: stay with 0.2.0 and work around
these corner-case bugs by keeping away from zeros, or switch to
0.3-devel, where we fix the bugs as fast as possible (and we try to keep
0.3-devel failure free). If you have Bazaar installed, the latter
version would be the most useful for you and for making statsmodels
more robust.

(I'm not sure whether it applies to the GLM Binomial, but in some cases
complete separation, that is, perfect predictability of the outcome, is
a corner case that creates convergence problems and is not handled by
the models themselves.)

Josef
j***@public.gmane.org
2010-11-25 14:40:01 UTC
Permalink
I fixed it in my branch and will merge to 0.3-devel later today.

http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/revision/2194

Adding 1e-200 shouldn't change any numerical results, but I haven't
run the test suite yet.
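
The committed revision isn't reproduced here; the kind of guard described
would look roughly like the following sketch (hypothetical helper name;
y is the observed proportion, mu the fitted probability):

import numpy as np

def binom_loglike_term(y, mu):
    # a tiny constant inside the log turns 0*log(0) == nan into
    # 0*log(1e-200) == 0 at the boundary, while leaving interior
    # values numerically unchanged
    return y * np.log(mu + 1e-200) + (1 - y) * np.log(1 - mu + 1e-200)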

Sol, can I add your example to the statsmodels test suite? Then I
don't have to find my own numbers. Or maybe it will be easy to adapt
an existing test; I haven't looked yet.

Josef
Sol
2010-11-26 15:21:43 UTC
Permalink
I've only just had a look at your answers since I have a huge working
day... so I will explore your two possible solutions to weigh the pros
and cons ;o)

First of all, about my examples: you have the right to use and abuse
them, no problem; it will be a pleasure to help, even if it seems that
I give you more work than help ;o). The same goes for distributing the
beginner tutorial I sent you ^^

Have a lovely weekend and thanks for your answers