# Earthquake Prediction

## Objective

The purpose of earthquake prediction is to enable emergency measures that reduce death and destruction. Failing to warn of a major earthquake that does occur, or at least to provide an adequate assessment of the hazard, can result in legal liability or even political purges.

## Dataset

The dataset contains two columns:

- Acoustic_data - Acoustic wave reading
- Time_to_failure - Time remaining before the next earthquake

## Random Forest Regression Workflow for Earthquake Prediction

The Random Forest Regression model belongs to the bagging family of ensemble methods. It is a supervised learning model that uses ensemble learning for regression. Ensemble learning combines the predictions of multiple models to predict more accurately than any single model.

Features of Random Forest:

- Aggregates many decision trees
- Reduces overfitting by averaging the predictions of those trees
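
The bagging idea can be sketched in a few lines of library-free Python. This is only an illustration, not the workflow's implementation: the base model is a one-split "stump" rather than a full decision tree, and the toy data and the names `fit_stump`, `fit_bagged`, and `bagged_predict` are assumptions made for the example.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical: fall back to the mean label
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Average the predictions of all models in the ensemble."""
    return mean(m(x) for m in models)

# Toy data (made up): the label falls as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [9.0, 8.1, 7.2, 5.9, 5.1, 4.0, 3.2, 1.9, 1.1, 0.0]
models = fit_bagged(xs, ys)
prediction = bagged_predict(models, 2)
```

Each stump sees a slightly different resample of the data, so their split points differ; averaging them smooths out the error of any individual model.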

## Prepare data for modeling

Follow the arrows in the workflow:

**ZipWithIndex**- Creates a new feature column, ID, from the dataframe index

**Group data**- Creates a new feature column, key, obtained by dividing the ID by the length of the data

**Feature Engineering**- Groups the data by key and computes statistical measures (min, max, mean, quartiles, etc.) as new features

**Feature Vector**- Merges multiple columns into a single feature vector
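
The four preparation steps can be sketched in plain Python to show what each one produces. The workflow itself operates on a dataframe, so treat this as an approximation: the toy rows, the segment length of 4, the use of integer division for the key, and the choice of the last `time_to_failure` value in each segment as the label are all assumptions made for the example.

```python
from statistics import mean, quantiles

# Toy rows of (acoustic_data, time_to_failure); values are made up.
rows = [(12, 1.40), (6, 1.39), (8, 1.38), (5, 1.37),
        (9, 1.36), (7, 1.35), (3, 1.34), (11, 1.33)]

SEGMENT_LENGTH = 4  # assumed number of rows per group

# ZipWithIndex: attach a running ID to every row.
indexed = [(i, acoustic, ttf) for i, (acoustic, ttf) in enumerate(rows)]

# Group data: derive a key by integer-dividing the ID by the segment length.
groups = {}
for i, acoustic, ttf in indexed:
    groups.setdefault(i // SEGMENT_LENGTH, []).append((acoustic, ttf))

# Feature engineering: one row of summary statistics per key.
features = {}
for key, segment in groups.items():
    signal = [a for a, _ in segment]
    q1, q2, q3 = quantiles(signal, n=4)
    features[key] = {
        "min": min(signal), "max": max(signal), "mean": mean(signal),
        "q1": q1, "median": q2, "q3": q3,
        # Assumption: label each segment with its last time_to_failure value.
        "time_to_failure_label": segment[-1][1],
    }

# Feature vector: merge the statistic columns into one list per key.
FEATURE_COLUMNS = ["min", "max", "mean", "q1", "median", "q3"]
vectors = {k: [f[c] for c in FEATURE_COLUMNS] for k, f in features.items()}
```

Eight raw readings collapse into two training examples here, each described by six statistics instead of four raw samples.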

## Data modeling

- Before creating the Random Forest Regression model, split the data (80:20) into train and test sets for performance evaluation.
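
A minimal sketch of an 80:20 split, assuming the rows fit in memory and a fixed seed is wanted for reproducibility. The helper name `train_test_split` and the stand-in rows are hypothetical; the workflow's own split node may sample differently.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it at the train fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 feature rows
train, test = train_test_split(rows)
```

Shuffling before cutting keeps both sets representative; the test rows are held back and only used to score the trained model.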

### Random Forest Regression

- Sets the feature vector and the corresponding label (time_to_failure_label).
- Sets the number of features considered at each split node of a tree.
- For regression, the measure of impurity is variance.
- In a random forest, the impurity decrease from each feature can be averaged across trees to determine the final importance of that feature.
- maxBins is the maximum number of bins used when splitting features; the suggested value is 100 for better results.
- maxDepth is the maximum depth of each tree (for example, depth 0 means one leaf node; depth 1 means one internal node plus two leaf nodes).
- Information gain is calculated by comparing the impurity of the data before and after a split; for regression this impurity is variance rather than entropy.
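
To make the impurity and depth settings concrete, here is a small sketch that computes the variance-based impurity decrease of a candidate split and the largest number of nodes a binary tree of a given depth can hold. The function names and the toy labels are assumptions for illustration; a real implementation evaluates many candidate splits per node.

```python
def variance(values):
    """Population variance, used here as the regression impurity."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted

def max_nodes(depth):
    """Upper bound on nodes in a binary tree of the given depth."""
    return 2 ** (depth + 1) - 1

labels = [1.0, 1.0, 5.0, 5.0]
good_split = impurity_decrease(labels, [1.0, 1.0], [5.0, 5.0])  # separates the labels
poor_split = impurity_decrease(labels, [1.0, 5.0], [1.0, 5.0])  # leaves them mixed
```

Here `good_split` is 4.0 (all of the parent's variance is removed) and `poor_split` is 0.0, so the tree would choose the first split. `max_nodes(0)` is 1 and `max_nodes(1)` is 3, matching the depth examples above.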

## Model evaluation

- There are multiple ways to evaluate a regression model, such as R-squared (R²), root mean squared error (RMSE), and mean squared error (MSE).
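
The three metrics can be computed directly from their definitions. The sketch below uses made-up true and predicted values; the function names are hypothetical and not part of the workflow.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: MSE in the units of the label."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Share of label variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]  # one prediction is off by 2
```

For these values MSE and RMSE are both 1.0 and R² is about 0.2. Lower MSE and RMSE are better, while R² approaches 1 as predictions improve.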