Discussion:
[glm] error with binomial family
Sol
2010-11-15 15:00:00 UTC
Hello,
I'm trying to fit a GLM with statsmodels.
Thanks for implementing this in Python; I hope I can help build more
detailed documentation/tutorials in the future ;o)

I tried this code:
----------------------------------------------------------------------------------------------------------------------------------------------------
import numpy
import scikits.statsmodels as sm

SUCCESS = numpy.array([81,71,74,87,22,26,31,7,61,97,78])
FAILURE = numpy.array([19,29,26,13,78,74,69,93,39,3,22])
TOTAL = numpy.array([100,100,100,100,100,100,100,100,100,100,100])

X = numpy.array([[1868., 0.709], [1633., 0.593], [1702., 0.394],
                 [2001., 0.941], [506., 0.528], [598., 0.859], [713., 0.294],
                 [161., 0.724], [1403., 0.182], [2231., 0.994], [1794., 0.057]])
Y = numpy.array([[81, 19], [71, 29], [74, 26], [87, 13], [22, 78], [26, 74],
                 [31, 69], [7, 93], [61, 39], [97, 3], [78, 22]])

glm = sm.GLM(Y, X, family=sm.family.Binomial())

res = glm.fit(data_weights = 100)
----------------------------------------------------------------------------------------------------------------------------------------------------
I get this error:
----------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "C:/Users/[...]/test scipy.py", line 29, in <module>
res = glm.fit() #data_weights = 100
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-
py2.6.egg\scikits\statsmodels\glm.py", line 378, in fit
returned a nan. This could be a boundary problem and should be
reported."
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.
----------------------------------------------------------------------------------------------------------------------------------------------------


Could you help me get this GLM working with statsmodels?

Another question:
1- can I fit the same GLM but with varying "data_weights" --> one
trial at 99, another at 102, ...?

Thanks in advance for any advice and answers.
Skipper Seabold
2010-11-15 15:21:17 UTC
Post by Sol
I'm trying to fit a GLM with statsmodels.
[...]
glm = sm.GLM(Y, X, family=sm.family.Binomial())
res = glm.fit(data_weights = 100)
[...]
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.
[...]
1- can I fit the same GLM but with varying "data_weights" --> one
trial at 99, another at 102, ...?
Without looking too much closer at the moment, I think it's doing
integer division. Try

Y = Y.astype(float)
TOTAL = TOTAL.astype(float)

glm = sm.GLM(Y,X, family=sm.families.Binomial())
res = glm.fit(data_weights=TOTAL)

Also note that sm.family is now sm.families in newer versions of statsmodels.
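As for question 1: data_weights accepts a per-observation array, so
varying trial counts work the same way. A minimal sketch (the counts
here are illustrative, not your data):

TOTAL = numpy.array([99., 102., 100., 98., 101.,
                     100., 100., 100., 100., 100., 100.])
res = glm.fit(data_weights=TOTAL)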

Skipper
j***@public.gmane.org
2010-11-15 15:55:26 UTC
Post by Skipper Seabold
Post by Sol
[...]
Without looking too much closer at the moment, I think it's doing
integer division.  Try
Y = Y.astype(float)
TOTAL = TOTAL.astype(float)
glm = sm.GLM(Y,X, family=sm.families.Binomial())
res = glm.fit(data_weights=TOTAL)
Also note that sm.family is now sm.families in newer versions of statsmodels.
Good guess about the integers; I checked by expanding the cases to
individual observations:

--------------------

mod = sm.GLM(Y.astype(float), X, family=sm.families.Binomial())
print mod.fit().params
print mod.fit(data_weights=TOTAL.astype(float)).params


import numpy as np

def convert_case2obs(y, X):
    '''expand binomial data given by distinct cases to observations

    Returns
    -------
    endog : ndarray, 1d
        data array with 1 for success and 0 for failure

    '''
    y = np.asarray(y)
    assert y.ndim == 2
    ncases = y.shape[0]
    nobs = y.sum(0)
    # column 0 of y holds success counts, column 1 failure counts
    ind0 = np.repeat(np.arange(ncases), y[:, 0].astype(int))  # rows for successes
    ind1 = np.repeat(np.arange(ncases), y[:, 1].astype(int))  # rows for failures
    successobs = np.column_stack((np.ones(nobs[0]), X[ind0, :]))  # endog = 1
    failobs = np.column_stack((np.zeros(nobs[1]), X[ind1, :]))    # endog = 0
    data = np.r_[failobs, successobs]
    return data[:, 0], data[:, 1:]

yobs, xobs = convert_case2obs(Y, X)

print sm.GLM(yobs, xobs, family=sm.families.Binomial()).fit().params
print sm.Logit(yobs, xobs).fit().params

print '\nwith constant'
mod = sm.GLM(Y.astype(float), sm.add_constant(X), family=sm.families.Binomial())
print mod.fit().params
print mod.fit(data_weights=TOTAL.astype(float)).params

print sm.GLM(yobs, sm.add_constant(xobs),
             family=sm.families.Binomial()).fit().params
print sm.Logit(yobs, sm.add_constant(xobs)).fit().params

--------------------

Results: the two binomial versions agree and also match discretemod.Logit:

[ 1.25930498e-03 -1.80115251e+00]
[ 1.25930498e-03 -1.80115251e+00]
[ 1.25930498e-03 -1.80115251e+00]
Optimization terminated successfully.
Current function value: 619.589828
Iterations 5
[ 1.25930498e-03 -1.80115251e+00]

with constant
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]
Optimization terminated successfully.
Current function value: 539.318195
Iterations 5
[ 2.17177355e-03 2.92779632e-01 -2.65329810e+00]


Josef
Post by Skipper Seabold
Skipper
j***@public.gmane.org
2010-11-15 16:58:07 UTC
Post by j***@public.gmane.org
[...]
Good guess about the integers; I checked by expanding the cases to
individual observations.
I fixed it in my branch and added a ticket:
https://bugs.launchpad.net/statsmodels/+bug/675631

Sol, if you want to fix it in your copy of statsmodels, just add a
`1.` in family/family.py, in Binomial initialize, so that it reads

return y*1./self.n

instead of

return y/self.n
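A quick sketch of why this matters: with integer count arrays, NumPy
division truncates (under Python 2 semantics), so the success
proportions collapse to 0 and the first deviance guess hits log(0):

import numpy as np
y = np.array([81, 19])        # integer counts
n = np.array([100, 100])
print y / n                   # integer division truncates: [0 0]
print y * 1. / n              # promoted to float: [ 0.81  0.19]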

Josef
Sol
2010-11-16 13:52:51 UTC
Thanks Skipper and Josef for your detailed answers.

Another question:

how do I get results like R's summary() function -->
coefficients:
estimates, standard errors, z values, Pr(>|z|) for the intercept and
the associated explanatory variables (I have both discrete and
continuous),
AIC or likelihood,
...

Thanks for your interest.
Sol
Skipper Seabold
2010-11-16 14:16:29 UTC
Post by Sol
how do I get results like R's summary() function -->
estimates, standard errors, z values, Pr(>|z|), AIC or likelihood
[...]
The results will either be attached to the res object that is returned
by fit, i.e., res.params are the coefficients, res.bse are the standard
errors, res.t() *should* actually return z values for the GLMs (I'll
have to check), etc.  Check the docstring in
sm.glm.GLMResults.__doc__
or see here: <http://statsmodels.sourceforge.net/generated.scikits.statsmodels.glm.GLMResults.html#scikits.statsmodels.glm.GLMResults>

Or there is a summary method attached to the results object (not sure
if this is done yet for GLM, actually).
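For example, a minimal sketch along those lines (whether res.aic and
res.llf exist in 0.2.0 is an assumption; check the docstring):

res = glm.fit(data_weights=TOTAL)
print res.params    # coefficient estimates
print res.bse       # standard errors
print res.t()       # should be z values for GLM, as noted above
# possibly also available, depending on the version (an assumption):
# print res.aic
# print res.llf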

Skipper
Sol
2010-11-16 15:51:37 UTC
Sorry again, Skipper or Josef:

so GLM fits by default *without* an intercept; how can I get an
intercept?

After our discussions, I have written a step-by-step "tutorial" on how
to use statsmodels (with the corresponding R code) to help future
students fit GLMs. Are you interested in publishing it in the
statsmodels documentation? Can I send it to you by email (code and
step-by-step explanations in an .odt file)?

Thanks again for your advice ;o)
Best regards

sol
Skipper Seabold
2010-11-16 15:57:52 UTC
Post by Sol
so GLM fits by default *without* an intercept; how can I get an
intercept?
You have to explicitly add the constant to the data.

x = sm.add_constant(x, prepend=True)
Post by Sol
After our discussions, I have written a step-by-step "tutorial" [...]
Having a statsmodels-for-R-users guide might be pretty helpful.  An
.rst file would be easier to integrate quickly.  You can see some
examples in our /scikits/statsmodels/docs (we still want to move /docs/
up to the main directory, similar to numpy/scipy/scikits-learn).  There
is also a numpy ReST tutorial, but I can't find it at the moment.

There is a similar effort on this blog:

http://ascratchpad.blogspot.com/2010/09/glm-and-python-scikitsstatsmodels.html
j***@public.gmane.org
2010-11-16 16:00:43 UTC
Post by Sol
so GLM fits by default *without* an intercept; how can I get an
intercept?
We have a utility function, sm.add_constant, as in the example I showed
before:

mod = sm.GLM(Y.astype(float), sm.add_constant(X), family=sm.families.Binomial())

It has a prepend option to add the constant as the first column
instead of at the end (the default).
Post by Sol
After our discussions, I have written a step-by-step "tutorial" [...]
Thank you, additional tutorials are always very helpful. If you want,
you can send it to the list, or to Skipper's or my private email.

We could either add it as a file to the documentation or convert it to
reStructuredText so it can be included directly in the documentation.
Post by Sol
Thanks again for your advice ;o)
Keep asking, and tell us if anything is unclear, buggy, or looks buggy.

Cheers,

Josef
Sol
2010-11-16 16:07:42 UTC
Wow, you are impressive! ^^
Thankssss!

I'll try to convert and finish my R/statsmodels tutorial in .rst for
tomorrow.

Best regards
sol
Sol
2010-11-22 10:29:48 UTC
Hello,
another question:
in some cases I get an error message with code that worked on my
previous example. I tried permuting observations between groups in my
GLM, and in some cases I get this error message about a NaN:
Traceback (most recent call last):
File "C:\Users\Solenn\Desktop\2010 octobre AUDE\ramdomcline 0.1.py",
line 153, in <module>
res=glm.fit(data_weights = nbtot)
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-
py2.6.egg\scikits\statsmodels\glm.py", line 378, in fit
returned a nan. This could be a boundary problem and should be
reported."
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.


My GLM is

glm = sm.GLM(y.astype(float), sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)

and my data are, for example:
y = [[1.0, 15.0], [1.0, 18.0], [1.0, 19.0], [2.0, 20.0], [3.0, 21.0],
[2.0, 22.0], [1.0, 25.0], [2.0, 27.0], [1.0, 30.0], [1.0, 34.0], [1.0,
34.0], [0.0, 36.0], [2.0, 34.0], [2.0, 34.0], [1.0, 37.0], [3.0,
36.0], [3.0, 36.0], [3.0, 39.0], [0.0, 45.0], [3.0, 43.0], [5.0,
43.0], [1.0, 48.0], [1.0, 48.0], [4.0, 48.0], [0.0, 52.0], [1.0,
51.0], [2.0, 50.0], [1.0, 52.0], [1.0, 54.0], [0.0, 55.0], [6.0,
63.0], [1.0, 75.0]]
--------
X = [[876748.973], [720567.16299999994], [320150.36310000002],
[583218.3175], [408575.89500000002], [472261.23489999998],
[824823.13659999997], [538344.76379999996], [842367.61600000004],
[338325.4326], [295414.4596], [367216.44319999998],
[383680.80109999998], [749160.96149999998], [832882.53350000002],
[414151.92540000001], [714448.80689999997], [379632.01740000001],
[833843.3578], [809404.05850000004], [655167.06129999994],
[525426.95799999998], [752852.99410000001], [307259.39569999999],
[695146.15410000004], [764498.94220000005], [870005.8885],
[287810.36540000001], [388498.35110000003], [819041.00199999998],
[334226.90710000001], [573148.73690000002]]
--------
Y = [[2403908.9360000002], [2438478.3020000001], [2331190.0350000001],
[2369150.0660000001], [2334425.2570000002], [2353680.696],
[2456066.3250000002], [2366416.9180000001], [2460245.0040000002],
[2354388.6329999999], [2375419.429], [2336460.1949999998],
[2105292.3700000001], [2476741.588], [2398922.7119999998],
[2311211.449], [2474548.6600000001], [2358930.898],
[2453682.1030000001], [2463081.9350000001], [2447596.0210000002],
[2362995.2039999999], [2503244.5490000001], [2284394.2790000001],
[2466206.2760000001], [2416882.2779999999], [2420118.1549999998],
[2351559.3870000001], [2136919.9670000002], [2414933.3399999999],
[2208447.912], [2379643.2480000001]]
--------
PERIODE = [[0], [0], [0], [0], [1], [1], [0], [0], [1], [1], [0], [0],
[0], [0], [0], [0], [0], [1], [0], [1], [1], [1], [1], [1], [1], [1],
[1], [1], [1], [1], [1], [1]]
--------
nbtot = [16, 19, 20, 22, 24, 24, 26, 29, 31, 35, 35, 36, 36, 36, 38,
39, 39, 42, 45, 46, 48, 49, 49, 52, 52, 52, 52, 53, 55, 55, 69, 76]

with
XY = numpy.concatenate((X, Y), axis=1)
XYP = numpy.concatenate((XY, PERIODE), axis=1)
XYP.astype(float)


After reading glm.py and family.py, I don't know where it comes from.

In R, it gives me:
Call:
glm(formula = (y/(tot - y)) ~ X + Y + P, family = binomial(),
    weights = (tot))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.11865  -0.93983   0.02340   0.67569   2.20055

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.239e+00  4.066e+00  -0.551    0.582
X           -1.039e-06  9.162e-07  -1.134    0.257
Y           -5.255e-08  1.862e-06  -0.028    0.977
P           -1.230e-01  2.771e-01  -0.444    0.657

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 44.098  on 31  degrees of freedom
Residual deviance: 41.346  on 28  degrees of freedom
AIC: 116.04

Number of Fisher Scoring iterations: 5

Any idea please?

sol
j***@public.gmane.org
2010-11-22 13:32:54 UTC
Post by Sol
in some cases I get an error message with code that worked on my
previous example. I tried permuting observations between groups in my
GLM, and in some cases I get this error message about a NaN
My guess is that this time it's the 0*log(0)-is-nan error. Adding a
small value to y, and comparing with the other versions
(families.family.Binomial.loglike):

y = np.asarray(y, float)

glm = sm.GLM(y.astype(float) + 1e-200,
             sm.add_constant(XYP.astype(float), prepend=True),
             family=sm.families.Binomial())
res = glm.fit(data_weights=nbtot)

print res.params

yobs, xobs = convert_case2obs(y, XYP)

res2 = sm.GLM(yobs, sm.add_constant(xobs, prepend=True),
              family=sm.families.Binomial()).fit()
print res2.params
resl = sm.Logit(yobs, sm.add_constant(xobs, prepend=True)).fit()
print resl.params

[ -2.24652828e+00 -9.87710451e-07 -9.08729819e-08 -1.31120473e-01]
[ -2.24652828e+00 -9.87710451e-07 -9.08729819e-08 -1.31120473e-01]
Optimization terminated successfully.
Current function value: 229.672697
Iterations 5
[ -2.24656656e+00 -9.87635018e-07 -9.08699857e-08 -1.31112889e-01]

The values differ a bit from R, but the differences are at the default
numerical precision of the optimizers, and none of the parameters is
significant. Adding tol=1e-10 in the glm fit didn't change the results.

Is there a difference in how R sets the convergence criteria that might
be relevant in this case? I tend to believe our numbers, since Logit (a
direct implementation of MLE with analytical gradient and Hessian)
returns the same results as glm, but it too most likely works at the
convergence tolerance levels.

Thank you for reporting these problems; 0*log(0) shows up in several
places, including scipy and stackoverflow, and we might have to find a
more general solution.
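A sketch of one general pattern (an illustration, not what statsmodels
shipped): evaluate x*log(y) with the limit convention 0*log(0) = 0 by
masking the zero entries:

import numpy as np

def xlogy(x, y):
    '''return x*log(y) elementwise, taking 0*log(0) as 0 (sketch)

    assumes x and y have the same shape
    '''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(x)
    mask = x != 0
    out[mask] = x[mask] * np.log(y[mask])
    return out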

I'm not sure about the statistically insignificant numerical
differences from R. I also tried the glm fit with demeaned XYP, but the
results don't change much and still differ from R.

I will open a ticket for the Binomial loglike.

Josef
j***@public.gmane.org
2010-11-22 14:46:01 UTC
I forgot to mention: if you pass prepend=True to add_constant, you get
the constant as the first column, as in R, e.g.

sm.add_constant(XYP, prepend=True)

Josef
Skipper Seabold
2010-11-22 14:50:46 UTC
Post by j***@public.gmane.org
[...]
I'm not sure about the statistically insignificant numerical
differences from R. I also tried the glm fit with demeaned XYP, but the
results don't change much and still differ from R.
I want to say that R does scaling by default during optimization, but
I'm not sure.
Post by j***@public.gmane.org
I will open a ticket for the Binomial loglike.
Josef
Yeah, that looks like it. Thanks, I will have a look.

Skipper
Sol
2010-11-24 15:28:04 UTC
hello,
me again :oS
Your proposal worked well for all cases until I hit an experiment with
complete success (see row 2 of the data below --> 19 successes, 0
failures). It returned the same error message about a NaN value.
[ 14. 2.]
[ 19. 0.]
[ 16. 4.]
[ 11. 11.]
[ 13. 11.]
[ 17. 7.]
[ 24. 2.]
[ 28. 1.]
[ 18. 13.]
[ 9. 26.]
[ 20. 15.]
[ 20. 16.]
[ 19. 17.]
[ 12. 24.]
[ 16. 22.]
[ 14. 25.]
[ 12. 27.]
[ 20. 22.]
[ 12. 33.]
[ 11. 35.]
[ 16. 32.]
[ 13. 36.]
[ 19. 30.]
[ 14. 38.]
[ 12. 40.]
[ 21. 31.]
[ 19. 33.]
[ 22. 31.]
[ 17. 38.]
[ 14. 41.]
[ 17. 52.]
[ 29. 47.]

The code was

glm = sm.GLM(y.astype(float) + 1e-200,
             sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)

So even with +1e-200 added, the error is still raised here:
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-py2.6.egg
\scikits\statsmodels\glm.py", line 378, in fit
if np.isnan(dev):
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.

Do I have to insert a condition checking for the presence/absence of a
NaN (i.e., checking for complete failure or complete success within an
experiment) and then add or subtract 1e-200? Is that correct?

Thanks again!
sol
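One classic workaround for rows with a zero cell (a sketch of a known
continuity correction, not an official statsmodels fix) is the
Haldane-Anscombe style adjustment: add 0.5 to every success/failure
count so that no proportion is exactly 0 or 1. Note the row totals then
differ from nbtot by 1, which may or may not matter for your analysis:

y_adj = y.astype(float) + 0.5
glm = sm.GLM(y_adj, sm.add_constant(XYP.astype(float)),
             family=sm.family.Binomial())
res = glm.fit(data_weights=nbtot)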
Post by Skipper Seabold
Post by j***@public.gmane.org
Post by Sol
Hello,
I have in some cases, a error message with the code that worked with
my previous example. In fact, I tried to permute observations between
groups in my glm, and in some cases, I get this error message about
NAN
my guess is that this time it's the 0*log(0) is nan error, adding a
(familiess.family.Binomial.loglike)
y = np.asarray(y,float)
glm = sm.GLM(y.astype(float)+1e-200,
            sm.add_constant(XYP.astype(float), prepend=True),
            family=sm.families.Binomial())
res=glm.fit(data_weights = nbtot)
print res.params
yobs, xobs = convert_case2obs(y, XYP)
res2 = sm.GLM(yobs, sm.add_constant(xobs, prepend=True),
family=sm.families.Binomial()).fit()
print res2.params
resl = sm.Logit(yobs, sm.add_constant(xobs, prepend=True)).fit()
print resl.params
[ -2.24652828e+00  -9.87710451e-07  -9.08729819e-08  -1.31120473e-01]
[ -2.24652828e+00  -9.87710451e-07  -9.08729819e-08  -1.31120473e-01]
Optimization terminated successfully.
        Current function value: 229.672697
        Iterations 5
[ -2.24656656e+00  -9.87635018e-07  -9.08699857e-08  -1.31112889e-01]
The values differ a bit from R but they are at the default numerical
precision for the optimizers, and
none of the parameters is significant.
adding tol=1e-10 in glm fit didn't change the results.
Is there a difference in how R sets the convergence criteria, which
might be relevant in this case?
I tend to belief our numbers since Logit (direct implementation of MLE
with analytical Gradient and Hessian) returns the same as glm, but
that one also works most likely at the convergence tolerance levels.
Thank you for reporting these problems, 0*log0 shows up at several
places including scipy and stackoverflow, and we might have to find a
more general solution.
I'm not sure, about the statistically insignificant numerical
differences to R. I tried also the glm fit with demeaned XYP, but the
results don't change much and still differ from R.
I want to say that R does scaling by default during optimization, but
I'm not sure.
Post by j***@public.gmane.org
I will open a ticket for the Binomial loglike.
Josef
Yeah, that looks like it.  Thanks, I will have a look.
Skipper
Sol
2010-11-24 15:28:13 UTC
Permalink
hello,
again me :oS
Your proposition worked well for all cases till I have a complete
successfull experiment (see above line 2 --> 19 successes for 0
failures). It returned me the same error message about Nan value.
[ 14. 2.]
[ 19. 0.]
[ 16. 4.]
[ 11. 11.]
[ 13. 11.]
[ 17. 7.]
[ 24. 2.]
[ 28. 1.]
[ 18. 13.]
[ 9. 26.]
[ 20. 15.]
[ 20. 16.]
[ 19. 17.]
[ 12. 24.]
[ 16. 22.]
[ 14. 25.]
[ 12. 27.]
[ 20. 22.]
[ 12. 33.]
[ 11. 35.]
[ 16. 32.]
[ 13. 36.]
[ 19. 30.]
[ 14. 38.]
[ 12. 40.]
[ 21. 31.]
[ 19. 33.]
[ 22. 31.]
[ 17. 38.]
[ 14. 41.]
[ 17. 52.]
[ 29. 47.]

code was
glm = sm.GLM(y.astype(float)+1e-200,
sm.add_constant(XYP.astype(float)), family=sm.family.Binomial())
res=glm.fit(data_weights = nbtot)

So, adding +1e-200 may cause the error here
File "C:\Python26\lib\site-packages\scikits.statsmodels-0.2.0-py2.6.egg
\scikits\statsmodels\glm.py", line 378, in fit
if np.isnan(dev):
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.

Do I have to inser a condition about presence/absence of a Nan (so
cheking for complete failure ou complete success within an experiment)
and to add or soustract +1e-200? Is it correct?*

Thanks again!
sol
Sol
2010-11-24 15:39:15 UTC
Permalink
I just tried adding +2e-200 to data_weights... it failed again (NaN
error).

...maybe a silly try, but just mentioning it :oS
j***@public.gmane.org
2010-11-24 15:45:36 UTC
Permalink
Post by Sol
I just tried adding +2e-200 to data_weights... it failed again (NaN
error).
...maybe a silly try, but just mentioning it :oS
From my reading of the code this should have worked; because of the
0*log(0)-is-nan problem, only exact zeros create a problem.
You could also try replacing the zero failure with 1e-10, just
to see if it solves the nan problem.
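
A minimal sketch of that workaround, assuming y holds [successes, failures]
rows as in the examples above (the 1e-10 is just the suggested probe value,
not a tuned constant):

import numpy as np

# y rows are [successes, failures]; the second row has an exact zero
y = np.array([[14., 2.],
              [19., 0.]])

# replace exact zeros only, so 0*log(0) can no longer occur downstream
y_adj = np.where(y == 0, 1e-10, y)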

I have to look at the problem in more detail. Is the list you posted the Y
that creates the problem, and is XYP still the same as before?

Josef
j***@public.gmane.org
2010-11-24 15:48:43 UTC
Permalink
If you copy the output of

>>> y
array([....

then it would be easier to copy your data, and I don't have to parse the list.
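
For instance, something like the following in a script prints the same
paste-able form (a sketch with made-up data):

import numpy as np

y = np.array([[14., 2.], [19., 0.]])   # stand-in for the real data
print(repr(y))   # -> array([[ 14.,   2.],
                 #           [ 19.,   0.]])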

Josef
Skipper Seabold
2010-11-24 15:49:46 UTC
Permalink
Does it work in trunk? This looks like the original boundary problem
in glm where the predicted value is equal to y (I think it does
something like log(1 - y/mu)), but I thought I fixed it for Binomial.
Maybe only for Negative Binomial.

I will have a look when I get to a machine where I can, but the
solution is to just add a clip in the family (I think). You can see
the fixes for the others to get an idea.
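
A minimal sketch of that kind of clip, with a hypothetical helper name
(not the actual statsmodels family code):

import numpy as np

FLOAT_EPS = np.finfo(float).eps

def clip_mu(mu):
    # keep fitted probabilities strictly inside (0, 1) so that log(mu)
    # and log(1 - mu) in the deviance and loglike stay finite
    return np.clip(mu, FLOAT_EPS, 1 - FLOAT_EPS)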

Skipper
Sol
2010-11-25 08:24:42 UTC
Permalink
Hello,
I tried different cases yesterday evening, and it failed only when I
get 0 failures somewhere.

Example of y:
array([[ 16.,   0.],
       [ 11.,   8.],
       [ 18.,   2.],
       [ 13.,   9.],
       [ 19.,   5.],
       [ 22.,   2.],
       [ 10.,  16.],
       [ 10.,  19.],
       [ 21.,  10.],
       [ 21.,  14.],
       [ 16.,  19.],
       [ 20.,  16.],
       [ 14.,  22.],
       [ 21.,  15.],
       [  4.,  34.],
       [ 18.,  21.],
       [ 23.,  16.],
       [ 16.,  26.],
       [ 12.,  33.],
       [ 16.,  30.],
       [ 21.,  27.],
       [ 19.,  30.],
       [ 17.,  32.],
       [ 26.,  26.],
       [ 22.,  30.],
       [ 21.,  31.],
       [ 14.,  38.],
       [ 13.,  40.],
       [ 11.,  44.],
       [ 21.,  34.],
       [ 23.,  46.],
       [  9.,  67.]])
Sol
2010-11-25 08:27:46 UTC
Permalink
Sorry, yes, XYP is always the same; only the successes/failures (so y)
changed.

Thanks for your help.
Sol
2010-11-25 08:34:55 UTC
Permalink
About the already-fixed errors: I use the code from
scikits.statsmodels-0.2.0.zip.
Does it include those fixes?
j***@public.gmane.org
2010-11-25 13:58:23 UTC
Permalink
Post by Sol
About the already-fixed errors: I use the code from
scikits.statsmodels-0.2.0.zip.
Does it include those fixes?
No, there are some bugfixes that we made since we released 0.2.0.
Unfortunately, our current release is already two months behind
schedule.

The best would be if you could switch to 0.3-devel or the trunk
version. You would then have to make the few adjustments that are
not backwards compatible, e.g. renaming family to families.

The trunk version might also still be missing a few more recent changes.
The 0.3-devel version should be stable for the old parts that were also
present in 0.2.0, but the newer code is still moving quite a bit.

Update:

I tried your zero-failure example:

print '\n case with zero failures'
mod_glm = sm.GLM(yn.astype(float)+1e-14,
                 sm.add_constant(XYP.astype(float), prepend=True),
                 family=sm.families.Binomial())

res = mod_glm.fit(data_weights = nbtot)
print res.params

--->

 case with zero failures
[ -6.83285117e+00  -1.47701877e-06   3.19973011e-06  -3.81612790e-01]

----------
The zero-failure bug is still present in 0.3-devel. Because in this
case the difference is measured against 1, the added amount has to be
numerically significant at double precision. The smallest amount that
I had to add to get a working solution is 1e-14.

Essentially, the code for Binomial has a*log(b) in several different
places, and any of them could produce a nan in corner cases with
0*log(0).
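
A quick check of why 1e-200 gets absorbed while 1e-14 survives for a
zero-failure row (a sketch, not from the original thread):

import numpy as np

row = np.array([19., 0.])      # 19 successes, 0 failures

for eps in (1e-200, 1e-14):
    s, f = row + eps
    p = s / (s + f)            # proportion of successes
    print(1.0 - p)             # 0.0 for eps=1e-200, ~5e-16 for eps=1e-14

print(0.0 * np.log(0.0))       # nan: the 0*log(0) corner case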

For now, you have two options: stay with 0.2.0 and work around
these corner-case bugs by keeping away from zeros, or switch to
0.3-devel, where we fix the bugs as fast as possible (and we try to keep
0.3-devel failure free). If you have Bazaar installed, the latter
version would be the most useful for you and for making statsmodels
more robust.

(I'm not sure whether it applies to the GLM Binomial, but in some cases
complete separation, that is, perfect predictability of the outcome, is
a corner case that creates convergence problems and is not handled by
the models themselves.)

Josef
j***@public.gmane.org
2010-11-25 14:40:01 UTC
Permalink
I fixed it in my branch and will merge to 0.3-devel later today.

http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/revision/2194

Adding 1e-200 shouldn't change any numerical results, but I haven't
run the test suite yet.
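
The committed revision isn't reproduced here; the kind of guard described
would look roughly like the following sketch (hypothetical helper name;
y is the observed proportion, mu the fitted probability):

import numpy as np

def binom_loglike_term(y, mu):
    # a tiny constant inside the log turns 0*log(0) == nan into
    # 0*log(1e-200) == 0 at the boundary, while leaving interior
    # values numerically unchanged
    return y * np.log(mu + 1e-200) + (1 - y) * np.log(1 - mu + 1e-200)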

Sol, can I add your example to the statsmodels test suite? Then I
don't have to find my own numbers. Or maybe it will be easy to adapt
an existing test; I haven't looked yet.

Josef
Sol
2010-11-26 15:21:43 UTC
Permalink
I've only just had a look at your answers since I have a huge working
day... so I will explore your two possible solutions to weigh the pros
and cons ;o)

First of all, about my examples: you have the right to use and abuse
them, no problem; it will be a pleasure to help, even if it seems that
I give you more work than help ;o). The same goes for distributing the
beginner tutorial I sent you ^^

Have a lovely weekend and thanks for your answers