Because the purpose of earthquake prediction is to enable emergency measures that reduce death and destruction, failure to give warning of a major earthquake that does occur, or at least an adequate assessment of the hazard, can result in legal liability or even political purging.
The dataset contains two columns:
- Acoustic_data - Acoustic wave reading
- Time_to_failure - Time remaining before the next earthquake
Random Forest Regression Workflow for Earthquake Prediction
The Random Forest Regression model belongs to the family of bagging ensemble methods. It is a supervised learning model that uses ensemble learning for regression. Ensemble learning combines the predictions of multiple models and is typically more accurate than any single model.
Features of Random Forest:
- Aggregates the predictions of many decision trees
- Reduces overfitting compared with a single decision tree
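The bagging idea behind the model can be illustrated with a minimal sketch. It is not a full random forest: it uses depth-1 trees on a single feature and omits per-split feature subsampling. The names `fit_stump` and `fit_bagged` are hypothetical helpers introduced only for this illustration.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    best = None
    # Candidate thresholds exclude the largest x so the right side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to predicting the mean.
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean([tree(x) for tree in trees])
```

Averaging over trees trained on different bootstrap samples is what smooths out the errors of any individual tree.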
Prepare data for modeling
Follow the workflow arrows:
- ZipWithIndex - Creates a new feature column, ID, from the dataframe index
- Group data - Creates a new key column by dividing the ID by the length of data
- Feature Engineering - Groups the data by key and computes statistical measures (min, max, mean, quartiles, etc.) as new features
- Feature Vector - Merges multiple columns into a single vector
- Before creating the Random Forest Regression model, split the data (80:20) into train and test sets for performance evaluation.
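The preparation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow's own implementation: `build_features`, `train_test_split`, and `segment_len` are hypothetical names, the key is assumed to come from integer division of the ID by a fixed segment length, incomplete trailing segments are dropped, and the label of each segment is assumed to be the `time_to_failure` of its last row.

```python
import random
import statistics


def build_features(acoustic, time_to_failure, segment_len):
    """Turn raw rows into one feature vector and one label per segment."""
    # ZipWithIndex: attach the row position as an ID.
    rows = [(i, a, t) for i, (a, t) in enumerate(zip(acoustic, time_to_failure))]

    # Group data: key = ID // segment_len (assumed integer division).
    groups = {}
    for i, a, t in rows:
        groups.setdefault(i // segment_len, []).append((a, t))

    features, labels = [], []
    for key in sorted(groups):
        segment = groups[key]
        # Assumption: skip incomplete segments (quartiles also need >= 2 rows).
        if len(segment) < segment_len or len(segment) < 2:
            continue
        values = [a for a, _ in segment]
        # Feature engineering: summary statistics of the segment.
        q1, median, q3 = statistics.quantiles(values, n=4)
        # Feature vector: merge the statistics into one list.
        features.append([min(values), max(values), statistics.fmean(values),
                         statistics.pstdev(values), q1, median, q3])
        # Assumption: label is the time_to_failure at the segment's last row.
        labels.append(segment[-1][1])
    return features, labels


def train_test_split(features, labels, train_fraction=0.8, seed=42):
    """Shuffle row indices and split them 80:20 by default."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])
```

For time-ordered data, a chronological split may be preferable to a shuffled one; the shuffled version is used here only because the text does not specify the split method.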
Random Forest Regression
- Sets the feature vector and the corresponding label (time_to_failure_label).
- Sets the number of features considered at each split node of a tree.
- For regression, the measure of impurity is variance.
- In a random forest, the impurity decrease from each feature can be averaged across trees to determine the final importance of that variable.
- maxBins is the maximum number of bins used when splitting the features; the suggested value is 100 for better results.
- maxDepth is the maximum depth of each tree (for example, depth 0 means one leaf node; depth 1 means one internal node plus two leaf nodes).
- Information gain is calculated by comparing the impurity of the data before and after a split; for regression this is the reduction in variance rather than entropy.
- A regression model can be evaluated in multiple ways, such as R-squared (R²), root mean square error (RMSE), and mean square error (MSE).
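Two of the ideas above, variance-based information gain and the evaluation metrics, can be written out as a short sketch. The function names are hypothetical and the code uses only the Python standard library; it shows the formulas rather than any particular library's implementation.

```python
import math
import statistics


def variance_gain(parent, left, right):
    """Impurity decrease of a split: parent variance minus the weighted child variances."""
    n = len(parent)
    weighted_children = (len(left) / n) * statistics.pvariance(left)
    weighted_children += (len(right) / n) * statistics.pvariance(right)
    return statistics.pvariance(parent) - weighted_children


def mse(y_true, y_pred):
    """Mean square error: average of the squared residuals."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE, in the label's own units."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A split that separates the labels perfectly leaves zero variance in both children, so its gain equals the parent variance. An R-squared of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean label.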