Seasonality is everywhere in real life, so it is likely to show up in your data, especially when you are modeling consumer behavior. But it is not a certainty, so whether to include seasonal effects in your model is something to test rather than assume. So how do we do that?
Say our dataset has a column for the four quarters in a year. We want to know if the quarter variable by itself has an influence on the dependent variable Y (sales, in this example). That is, do we expect a dip in sales in quarter 3, for example, and a spike in sales in quarter 4? To analyze that, we convert the quarter column into seasonal dummy columns as follows.
X1 = 1 if quarter 1 and 0 otherwise
X2 = 1 if quarter 2 and 0 otherwise
X3 = 1 if quarter 3 and 0 otherwise
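The dummy definitions above can be sketched in a few lines of plain Python. This is only an illustration: the `quarters` list and the `seasonal_dummies` helper are hypothetical names, and the values are made up.

```python
# Hypothetical quarter labels (1-4) for eight observations; not from the article.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]


def seasonal_dummies(quarters, reference=4):
    """Return one 0/1 column per quarter, skipping the reference quarter."""
    levels = [q for q in sorted(set(quarters)) if q != reference]
    return {
        f"X{q}": [1 if obs == q else 0 for obs in quarters]
        for q in levels
    }


dummies = seasonal_dummies(quarters)
for name, column in dummies.items():
    print(name, column)
```

Quarter 4 gets no column of its own: its rows are the ones where X1, X2 and X3 are all 0. In pandas, `pd.get_dummies` builds the same kind of columns, but its `drop_first=True` option drops the first level (quarter 1), so quarter 4 would have to be dropped explicitly to match the setup here.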
And the model…
Y = β0 + β1.X1 + β2.X2 + β3.X3 + ε
X1, X2 and X3 are added to the dataset as new columns, with quarter 4 treated as the reference. That is, β1 measures how much higher or lower quarter 1 sales are relative to quarter 4. Similarly, β2 measures quarter 2 sales relative to quarter 4, and so on. But do we actually care about the coefficients for these three quarters? Not really, because this is not our main model and it includes no variables other than seasonality.
What we care about is the overall fit of the model with only the seasonal dummies in it. Assessing that fit means setting up null and alternative hypotheses for an F-test (more at That Venerable F-Test). The null is that β1 = β2 = β3 = 0, meaning the quarter has no effect on Y; the alternative is that at least one of them is nonzero. We reject the null when the F-statistic is significant, conventionally with an associated p-value < 0.05, and take that as evidence of seasonality in the data. If the p-value is greater than 0.05, we do not reject the null: the data show no evidence of seasonality, and the quarter column can be dropped from the analysis.
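To make the F-test concrete, here is a minimal sketch in plain Python. It relies on the fact that, in a model with only an intercept and seasonal dummies, the OLS fitted value for each observation is simply the mean of its quarter. The `sales` figures and the `seasonal_f_stat` helper are hypothetical, invented for illustration.

```python
# Hypothetical quarterly sales for three years (made-up numbers).
sales = [20, 22, 15, 30, 21, 23, 14, 32, 19, 24, 16, 31]
quarters = [1, 2, 3, 4] * 3


def seasonal_f_stat(y, quarters):
    """F-statistic for H0: beta1 = beta2 = beta3 = 0 in the dummy-only model."""
    n = len(y)
    levels = sorted(set(quarters))
    k = len(levels)  # intercept plus (k - 1) dummies
    grand_mean = sum(y) / n

    # Fitted value for each quarter is that quarter's mean of y.
    means = {
        q: sum(v for v, s in zip(y, quarters) if s == q) / quarters.count(q)
        for q in levels
    }

    sse = sum((v - means[s]) ** 2 for v, s in zip(y, quarters))  # residual SS
    sst = sum((v - grand_mean) ** 2 for v in y)                  # total SS
    ssr = sst - sse                                              # explained SS

    df_model, df_resid = k - 1, n - k
    f_stat = (ssr / df_model) / (sse / df_resid)
    return f_stat, df_model, df_resid


f_stat, df_model, df_resid = seasonal_f_stat(sales, quarters)
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
```

With these made-up numbers the statistic is 134.75 on (3, 8) degrees of freedom, far above the 5% critical value of roughly 4.07, so the null would be rejected. To get the p-value itself, and assuming SciPy is available, `scipy.stats.f.sf(f_stat, df_model, df_resid)` gives the upper-tail probability of the F distribution.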
So that’s that.
Image credit – Paurian, Flickr