k***@gmail.com
2018-10-02 20:24:50 UTC
I am working with statsmodels version 0.8.0 and Python 3.6. I have the
following pandas dataframe, df, with two columns: 'date' and 'count'. The
'date' column has a datetime dtype and 'count' has an integer dtype.
There is an observation (row) corresponding to each Monday between
2009-12-28 and 2018-09-24, and these Monday dates are the contents of the
'date' column:
df =
date count
2009-12-28 2
2010-01-04 19
2010-01-11 18
2010-01-18 8
2010-01-25 18
2010-02-01 23
.
2018-09-17 15
2018-09-24 7
I am able to successfully utilize the statsmodels.tsa.statespace.SARIMAX
class to produce past predictions of 'count' - using the .get_prediction()
method - and future predictions of 'count' - using the .get_forecast()
method - when the 'date' column contains the start date of each month
between 2010-01-01 and 2018-09-01; in this case, the day is always set to
'01'. The same code that is successful in this case fails, however, if the
'date' column contains the *last* day of each month and the day is variable
('30', '31', '28', or '29').
According to the documentation, the failure of SARIMAX to work when I used
the last day of each month in the 'date' column is somewhat expected since
the date - when converted to an index for use in SARIMAX - must be in
regular time intervals. It makes sense that since some months have more
days than others, the computer would fail to see that the unit of time is 1
month in that case.
However, in the weekly case the observations are all exactly seven days
apart, so I expected the algorithm to be able to self-detect the unit
of time as 1 week/7 days. Is there any way for me to get the SARIMAX
object to train and predict on a time unit of 1 week?
Many thanks,
Kathryn