Discussion:
[pystatsmodels] SARIMAX fit speed
'Steffen Rolf-Pissarczyk' via pystatsmodels
2018-09-28 08:05:53 UTC
Hello everyone,

my name is Steffen Rolf-Pissarczyk, and I hope to get an answer here for
my problem using statsmodels 0.9.0.

I am trying to fit a SARIMA model (with a rather long seasonal lag of 96) to a
time series with 90000 entries.

I found that this task takes a rather long time (a couple of hours) using
SARIMAX and fit.

For comparison, I used the Econometrics Toolbox from Matlab.
There I use regARIMA and the estimate() function, and it takes just seconds
to get the fitted parameters.

I have already searched the internet for the reason for
this speed discrepancy, without any luck.
(I have already tried different methods, setting the enforce_stationarity and
enforce_invertibility options to False, as well as simple_differencing.)

Can you help me or give me some good advice on how to make the statsmodels
fit speed comparable?

Thanks a lot
Steffen
Chad Fulton
2018-09-29 05:48:20 UTC
Hi Steffen,

Not knowing too much about regARIMA I may not be able to give a very
satisfying answer (
https://www.mathworks.com/help/econ/regarima.estimate.html).

Statsmodels puts this model in state space form, and then estimates the
parameters of the model using exact maximum likelihood, where the
likelihood function is computed using the Kalman filter.

If the model is SARIMA(p, d, q)x(P, D, Q, s), then the size of the state
space increases linearly in `max(P, Q+1) * s`, so if you have a seasonal
period of 96, then you will have a potentially very large state space.
Since each recursion involves multiplying matrices with the dimension of
the state space, and we don't have sparse matrix multiplications, this can
dramatically impact performance.
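
To get a feeling for the numbers, here is a rough back-of-the-envelope sketch of that rule of thumb. The function names are made up for illustration, the orders P=1 and Q=1 are an assumption (the original post does not state them), and the cubic cost is only the usual order of magnitude for a dense matrix product, not a measurement of the statsmodels filter.

```python
def seasonal_state_dim(P, Q, s):
    """Rule-of-thumb state dimension from the seasonal part: max(P, Q + 1) * s."""
    return max(P, Q + 1) * s


def dense_matmul_cost(m):
    """Order-of-magnitude multiply-add count for one dense (m x m) @ (m x m) product."""
    return m ** 3


# Assumed orders P=1, Q=1 (not given in the thread); s=96 and nobs=90000 are from the post.
m = seasonal_state_dim(P=1, Q=1, s=96)
per_step = dense_matmul_cost(m)
per_loglike = per_step * 90000  # one pass of the filter over the whole series

print(f"state dimension ~ {m}")
print(f"multiply-adds per matrix product ~ {per_step:.2e}")
print(f"multiply-adds per likelihood evaluation ~ {per_loglike:.2e}")
```

The optimizer evaluates the likelihood many times, so the total scales with the number of iterations as well.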

Based on their short description of their process, it looks like regARIMA
does not put the model in state space form (and so probably does not use
exact maximum likelihood estimation).

Best,
Chad

'Steffen Rolf-Pissarczyk' via pystatsmodels
2018-10-04 18:30:32 UTC
Hi Chad,

thank you very much for the answer.
That makes sense.
I guess there is no statsmodels function that does a less sophisticated
but faster fit?

Best,
Steffen
j***@gmail.com
2018-10-04 21:40:55 UTC
In the "old" times I was working on an ARMA implementation using
scipy.signal.lfilter, which just filters the data and minimizes the sum of
squared residuals. It hasn't been deleted yet, and there is a miscmodels
example that uses the t distribution for the log-likelihood.
`miscmodels` are mostly my test and example cases for
GenericLikelihoodModel.
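
A minimal sketch of that idea, to make it concrete. This is not the code in statsmodels; `arma_residuals` and `fit_arma_css` are hypothetical names, the start values and zero initial conditions are simplifying assumptions, and the optimizer is plain scipy.optimize.least_squares.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import lfilter


def arma_residuals(params, y, p, q):
    """Residuals e_t = [phi(L) / theta(L)] y_t, using zero initial conditions."""
    ar_poly = np.r_[1.0, -params[:p]]      # phi(L) = 1 - phi_1 L - ... - phi_p L^p
    ma_poly = np.r_[1.0, params[p:p + q]]  # theta(L) = 1 + theta_1 L + ... + theta_q L^q
    return lfilter(ar_poly, ma_poly, y)


def fit_arma_css(y, p, q):
    """Hypothetical helper: minimize the sum of squared residuals over (phi, theta)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                       # demean instead of estimating a constant
    start = np.zeros(p + q)                # naive start values (an assumption)
    result = least_squares(arma_residuals, start, args=(y, p, q))
    return result.x[:p], result.x[p:]


# Simulate y_t = 0.6 y_{t-1} + e_t + 0.3 e_{t-1} and try to recover the parameters.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
y_sim = lfilter([1.0, 0.3], [1.0, -0.6], noise)
phi_hat, theta_hat = fit_arma_css(y_sim, p=1, q=1)
print(phi_hat, theta_hat)
```

Each evaluation is a single pass of lfilter over the data, so the cost grows with the number of lags rather than with the cube of a state dimension. The price is that this is a conditional (not exact) fit.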

However, when ARMA and similar models were taken over by the Kalman-filter-based
implementation, I stopped working in that neighborhood, and it's still
just in prototype shape.

Also, ARMA has an acceleration for long time series, AFAIR, but doesn't have
seasonal structure.
To see whether there is a speed advantage, you could try ARMA with a larger
number of lags to compare performance.

Testcases for my old models are in statsmodels.miscmodels.tests.test_arma
with
from statsmodels.miscmodels.tmodel import TArma
from statsmodels.tsa.arma_mle import Arma

Again, these are not finished models, but they might give an idea whether
or how much faster this kind of implementation would be for the long time
series, large max lag cases.
(Seasonal is not implemented, but I guess it would be just an hour of work
to add the necessary parameter transformation from params to full lag
polynomial required by lfilter.)
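
For illustration, the parameter transformation could look roughly like the sketch below: a multiplicative seasonal model has the full lag polynomial (short polynomial) x (seasonal polynomial), which is just a convolution of the coefficient arrays. `full_lag_polynomial` is a hypothetical name, not an existing statsmodels function, and the sign convention (minus for AR, plus for MA) is an assumption.

```python
import numpy as np


def full_lag_polynomial(nonseasonal, seasonal, s, sign):
    """Coefficients (increasing lag order) of the product of the two lag polynomials.

    sign=-1 gives AR polynomials 1 - a_1 L - ..., sign=+1 gives MA polynomials
    1 + b_1 L + ...; the seasonal coefficients sit at lags s, 2s, ...
    """
    nonseasonal = np.asarray(nonseasonal, dtype=float)
    seasonal = np.asarray(seasonal, dtype=float)
    short = np.r_[1.0, sign * nonseasonal]
    long_ = np.zeros(s * len(seasonal) + 1)
    long_[0] = 1.0
    long_[s::s] = sign * seasonal          # fill lags s, 2s, ..., P*s
    return np.convolve(short, long_)       # polynomial multiplication


# (1 - 0.5 L)(1 - 0.2 L^4) = 1 - 0.5 L - 0.2 L^4 + 0.1 L^5
ar_full = full_lag_polynomial([0.5], [0.2], s=4, sign=-1)
print(ar_full)
```

The resulting arrays can be passed directly as the numerator and denominator coefficients of lfilter.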

Josef
Chad Fulton
2018-10-05 00:59:52 UTC
Another possible approach - maybe regARIMA is using the innovations
algorithm from Brockwell and Davis. Kevin Sheppard just introduced this
algorithm into Statsmodels (although not for computing ARIMA coefficients
yet) and maybe knows more about how this approach works.
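
For reference, a textbook-style sketch of the innovations recursion for a stationary series with known autocovariances. This is neither the statsmodels implementation nor a claim about what regARIMA actually does; `innovations` is a hypothetical name, and the MA(1) numbers are only a check case.

```python
import numpy as np


def innovations(acov, n_steps):
    """Innovations recursion: returns coefficients theta[n, j] and variances v[n]."""
    g = np.zeros(n_steps + 1)
    m = min(len(acov), n_steps + 1)
    g[:m] = acov[:m]                       # autocovariances, zero beyond the given lags
    theta = np.zeros((n_steps + 1, n_steps + 1))
    v = np.zeros(n_steps + 1)
    v[0] = g[0]
    for n in range(1, n_steps + 1):
        for k in range(n):
            acc = sum(theta[k, k - j] * theta[n, n - j] * v[j] for j in range(k))
            theta[n, n - k] = (g[n - k] - acc) / v[k]
        v[n] = g[0] - sum(theta[n, n - j] ** 2 * v[j] for j in range(n))
    return theta, v


# MA(1) with coefficient 0.5 and unit innovation variance:
# gamma(0) = 1 + 0.5**2 = 1.25, gamma(1) = 0.5, gamma(h) = 0 for h > 1.
theta, v = innovations([1.25, 0.5], n_steps=20)
print(theta[20, 1], v[20])
```

For this MA(1) case the lag-1 coefficient approaches the MA parameter and the prediction variance approaches the innovation variance as n grows.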

Chad

ADENOMON MONDAY OSAGIE
2018-10-04 21:03:25 UTC
Please, I need ideas on Matlab, especially on econometrics and time series
analysis.

Dr. Adenomon M O.
Department of Statistics
Nasarawa State University, Keffi, Nigeria