Time Series Feature Engineering
Feature engineering is the process of extracting new features from raw data using data mining techniques. These features can be used to improve the performance of models.
The dataset contains four columns:
- Date - Date when product was sold
- Store - Store id from where product got sold
- Item - Item id
- Sales - Quantity of product sold
The goal is to create new features from this table to improve the performance of models.
Feature Engineering Workflow
Each column is a feature. Not every feature helps a model produce its best results, so feature engineering plays an important role in choosing the right ones. Good features not only improve a model's predictive power but also make it possible to use less complex models that are faster to run and easier to handle.
One step moving average
- A moving average is commonly used to smooth out short-term fluctuations in time series data and highlight long-term patterns.
- For one step, the window spans positions -1 to 1 around each sales value (the previous, current, and next observation)
Seven step moving average
- For seven steps, the window spans positions -7 to 7 around each sales value
- Moving average output
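The two windows described above can be sketched in plain Python so the arithmetic stays visible. This is a minimal sketch, not the notebook's original code: the function name `centered_moving_average`, the choice to return `None` where a full window is unavailable, and the sales figures are all assumptions made for illustration. It reads "window from -k to k" literally, so one step averages 3 points and seven steps averages 15.

```python
def centered_moving_average(values, k):
    """Average each point with its k neighbours on either side.

    The window for position t covers t-k .. t+k (2*k + 1 points).
    Positions without a full window get None (an assumed edge policy).
    """
    n = len(values)
    out = []
    for t in range(n):
        if t - k < 0 or t + k >= n:
            out.append(None)  # window would run past the data
        else:
            window = values[t - k : t + k + 1]
            out.append(sum(window) / len(window))
    return out


# Made-up daily sales figures, for illustration only.
sales = [10, 12, 14, 13, 15, 18, 20, 19, 17, 16, 18, 21, 23, 22, 20]

one_step = centered_moving_average(sales, 1)    # 3-point window
seven_step = centered_moving_average(sales, 7)  # 15-point window
```

With only 15 observations, the seven-step window fits at a single position (the middle one), which shows how much data a wide centered window consumes at the edges. In pandas, `Series.rolling(window=2 * k + 1, center=True).mean()` computes a comparable centered average.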
Extract Date Time Features
- Split the date into its components: year, month, week of year, day of the month, hour, minute, second, etc.
- Output of Date Time Features
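The date split above can be illustrated with the standard `datetime` module. This is a minimal sketch under assumptions: the helper name `datetime_features`, the feature key names, the timestamp value, and its "YYYY-MM-DD HH:MM:SS" format are hypothetical, since the dataset's actual date format is not shown. The week of year follows the ISO 8601 convention here, which is one of several possible definitions.

```python
from datetime import datetime


def datetime_features(ts):
    """Split one timestamp into calendar and clock components."""
    return {
        "year": ts.year,
        "month": ts.month,
        "week_of_year": ts.isocalendar()[1],  # ISO 8601 week number
        "day_of_month": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }


# Hypothetical timestamp in an assumed format, for illustration only.
ts = datetime.strptime("2017-03-15 14:30:05", "%Y-%m-%d %H:%M:%S")
features = datetime_features(ts)
```

For a whole column, pandas exposes the same components through the `.dt` accessor on a datetime Series (for example `.dt.year` and `.dt.month`).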
Lag Features
- Differencing a series against a lagged copy of itself is commonly used to turn non-stationary data into stationary data
- Outliers are easily discernible on a lag plot
- ACF and PACF plots are used to choose the best lags
- The most commonly used lag is 1, called a first-order lag
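- With a window shift of one, each row gets the sales value from one step earlier
- With a window shift of seven, each row gets the sales value from seven steps earlier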
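The lag ideas above can be sketched in plain Python as follows. This is an illustrative sketch, not the notebook's original code: the helper names `lag`, `difference`, and `autocorrelation` and the sales figures are assumptions. The autocorrelation shown is one common sample estimator (the quantity an ACF plot draws for each lag). The partial autocorrelation (PACF) is not covered here.

```python
def lag(values, k):
    """Shift a series forward by k steps; the first k slots have no history."""
    return ([None] * k + list(values))[: len(values)]


def difference(values, k=1):
    """Subtract the value k steps earlier from each observation."""
    head = [None] * min(k, len(values))
    return head + [values[t] - values[t - k] for t in range(k, len(values))]


def autocorrelation(values, k):
    """Sample autocorrelation of the series with itself at lag k."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[t] - m) * (values[t - k] - m) for t in range(k, n))
    return num / denom


# Made-up daily sales figures, for illustration only.
sales = [10, 12, 14, 13, 15, 18, 20, 19, 17, 16]

lag_1 = lag(sales, 1)          # window shift of one (first-order lag)
lag_7 = lag(sales, 7)          # window shift of seven
diff_1 = difference(sales, 1)  # first difference, often used for stationarity
acf_1 = autocorrelation(sales, 1)
```

The leading `None` values mark rows with no earlier observation to draw from. How to fill or drop them is a modelling choice. In pandas, `Series.shift(k)` and `Series.diff(k)` give the corresponding lagged and differenced columns.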