Discussion:
[pystatsmodels] Multiple time series forecasting
Maxime De Bois
2018-09-18 14:01:15 UTC
Hello,

I have an already fitted SARIMAX model and I need to test it on other time
series. So far, for every time series I need to test, I create a new model
and apply the 'filter' function with the fitted model parameters. While it
does not take long per time series, with thousands of time series to
forecast it ends up taking a very long time.

My question is: is there a way to create batches of SARIMAX models and do
the prediction all at once? I have looked into the implementation of the
'forecast' function in order to reimplement it for my use case, but it
seems to be quite tedious.

What do you think?

Thank you!

Maxime
Chad Fulton
2018-09-18 21:48:01 UTC
Hi Maxime,

It sounds to me like you're already doing the best that can be done within
the state space framework.

The only thing that I can immediately suggest is that you may not need to
filter over the entire length of each test series. You just need to filter
over "enough" observations so that the effect of initial conditions
disappears (this will depend on your model order).
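For example, a rough sketch only (the names `series`, `params`, `p`, `d`, `q`
and `ph` are placeholders, and the burn-in length is just an illustrative rule
of thumb, not a recommendation from this thread):

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Instead of filtering the entire test series, keep only its tail, chosen
# long enough (relative to the model order) that the influence of the
# initial state has washed out.
n_burn = 10 * (p + d)                      # illustrative rule of thumb
res = SARIMAX(series[-n_burn:], order=(p, d, q)).filter(params)
forecast = res.forecast(ph)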

If you'd like to post some example code, I can take a look and see if
anything obvious jumps out.

Best,
Chad
Maxime De Bois
2018-09-20 14:27:52 UTC
Hello Chad,

Thank you for trying to help me out. When you say "enough observations", do
you mean a number of observations on the order of the autoregressive order
(p)? If so, I have already been doing that.

An idea would be to multiprocess the forecasting part if a single forecast
does not use all the CPUs. I haven't checked that, though (a rough sketch of
this idea follows the code below).

Here are two samples of code: the first one to fit my model, the second one
to predict/forecast (I am dealing with exogenous inputs as well, but the
problem is still there without them). My parameters can take the following
values:

- ar_order = p = 1..60
- derivative = d = 1 (needed; otherwise the model does not converge)
- ma_order = q = 0
- exogenous_order = e = 0..121 (not used as such in SARIMAX, it's just the
  number of exogenous inputs per autoregressive sample)

*********************************** FIT ***********************************
from statsmodels.tsa.statespace.sarimax import SARIMAX

endog, exog = reshape_ARIMAX_fit(data, hist, ph, e)
# Fit once on training series i
model = SARIMAX(endog=endog[i], order=(p, d, q),
                enforce_stationarity=False).fit(method="powell",
                                                maxiter=1000000, disp=0)
********************************* PREDICT *********************************
endog_samples, exog_samples, exog_oos, y = reshape_ARIMAX_predict(
    data, hist, ph, p, e, scalers)
preds = []
for i in range(len(endog_samples)):
    # New model per test series, reusing the already-fitted parameters
    # (`params`, taken from the fitted model above)
    model = SARIMAX(endog=endog_samples[i], exog=exog_samples[i],
                    order=(p, d, q)).filter(params)
    preds.append(model.forecast(ph, exog=exog_oos[i])[-1])
****************************************************************************
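A rough sketch of the multiprocessing idea mentioned above, using only the
standard library (the helper name `forecast_one` is made up, and it assumes
`p`, `d`, `q`, `params` and `ph` are defined at module level so the worker
processes can see them):

from concurrent.futures import ProcessPoolExecutor

from statsmodels.tsa.statespace.sarimax import SARIMAX

def forecast_one(args):
    # One test series at a time: build the model, apply the already-fitted
    # parameters, and return the last of the ph-step-ahead forecasts.
    endog_i, exog_i, exog_oos_i = args
    res = SARIMAX(endog=endog_i, exog=exog_i, order=(p, d, q)).filter(params)
    return res.forecast(ph, exog=exog_oos_i)[-1]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        preds = list(pool.map(forecast_one,
                              zip(endog_samples, exog_samples, exog_oos)))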

Cheers,

Maxime
Chad Fulton
2018-09-21 12:20:34 UTC
Hi Maxime,

Thanks for the code, and yes, there's nothing wrong with the way you're
doing it.

If performance really matters, you could probably get a little speedup by
avoiding constructing the results objects, which adds a little overhead (of
course, if you need other things from the results object then this won't
help).

The key to doing this is that forecasting with state space models is really
just equivalent to applying the Kalman filter to `endog` elements
containing `np.nan`, and that the filter method takes an argument
`return_ssm` which, if True, doesn't construct the full SARIMAXResults
object.
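A minimal, self-contained sketch of that NaN-padding approach (the series,
order, and horizon below are made up just to have something runnable; in
practice `params` would be the parameters fitted earlier):

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Made-up setup: a random walk as the test series, a small AR order, and
# parameters fitted on the spot to stand in for the already-fitted ones.
p, d, q, ph = 3, 1, 0, 6
series = np.random.randn(200).cumsum()
params = SARIMAX(series, order=(p, d, q)).fit(disp=0).params

# Pad the series with NaNs: the Kalman filter treats them as missing
# observations, so its output over that region is the forecast.
endog_padded = np.concatenate([series, np.full(ph, np.nan)])

mod = SARIMAX(endog_padded, order=(p, d, q))
# return_ssm=True returns the low-level Kalman filter results and skips
# building the full SARIMAXResults object.
res = mod.filter(params, return_ssm=True)

# One-step-ahead forecasts over the NaN tail are the out-of-sample forecasts.
forecasts = res.forecasts[0, -ph:]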

Here's a notebook that shows how to do it. It looks like the speedup is
about 2x or 3x (but I didn't test with thousands of cases, so I don't know
how it scales at that level):

https://gist.github.com/ChadFulton/f3889f76e75df7d190b707134b6b50ff

Best,
Chad
Maxime De Bois
2018-09-21 12:29:09 UTC
This sounds very promising. I will for sure give it a try!

Thank you again, you're the best!

Kind regards,

Maxime