Reading/Writing from S3

Fire is fully integrated with AWS S3. The Dataset Processors of Fire can read data directly from S3, provided the access policies allow it.

Dataset Processors

Dataset Processors include:

  • Read CSV
  • Read Parquet
  • Read JSON
  • Read XML

When reading from S3, the path is specified in the form s3://…
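
Since the Dataset Processors run on Apache Spark, S3 access ultimately depends on the cluster's Hadoop S3A configuration or an attached IAM role. The sketch below shows the standard S3A credential settings a Spark session would use when no IAM instance profile is available; the key values are placeholders, and Fire normally handles this configuration for you.

```python
# Minimal sketch of S3A access configuration for a Spark session.
# The access/secret key values are placeholders; omit these settings
# entirely if the cluster already has an IAM role granting S3 access.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-access-example")
    .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
    .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")
    .getOrCreate()
)
```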

Reading from S3

Below is an example Workflow. It reads a CSV file from S3, parses it and prints out the first 10 records.

In the dialog box of the Read CSV processor, the path is specified as s3a://sparkflow-sample-data/data/Clickthru.csv

S3 Workflow
S3 CSV Dialog
S3 CSV Output
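
For reference, the Read CSV step is roughly equivalent to the PySpark sketch below, assuming a SparkSession (`spark`) with S3 access configured as described earlier. The header and inferSchema options are assumptions about how the dialog is configured; the path matches the one used above.

```python
# Read the sample CSV file from S3, assuming `spark` is a SparkSession
# with S3A access configured.
df = (
    spark.read
    .option("header", "true")        # first row contains column names
    .option("inferSchema", "true")   # let Spark guess column types
    .csv("s3a://sparkflow-sample-data/data/Clickthru.csv")
)

# Print out the first 10 records, as the example workflow does.
df.show(10)
```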

Writing to S3

Below is an example Workflow. It reads a CSV file and saves it to the specified S3 path.

In the dialog box of the Save CSV processor, the path is specified as s3a://sparkflow-sample-data/write/

S3 Workflow
S3 Workflow
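
The Save CSV step corresponds roughly to the PySpark sketch below, where `df` is the DataFrame produced by the upstream Read CSV processor. The overwrite mode is an assumption; the processor may instead be configured to append or to fail if the path already exists.

```python
# Write the DataFrame back to S3 as CSV, assuming `df` is the output of
# the Read CSV step above.
(
    df.write
    .option("header", "true")   # include column names in the output files
    .mode("overwrite")          # assumption: replace any existing output
    .csv("s3a://sparkflow-sample-data/write/")
)
```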

Execution Result

S3 Workflow

Once the above workflow has completed successfully, the saved data can be viewed under DATA BROWSERS / AWS S3 Location at the specified path.

S3 Workflow

Saving ML Model to S3

Saving Spark ML Model

Below is an example workflow in Sparkflows, where data is read from S3 and the final Spark ML model is saved to an S3 location.

Workflow:

Configure ReadCSV

Spark ML Workflow

Configure SaveMlModel

Spark ML Workflow
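
As a rough sketch of what the SaveMlModel step does, the snippet below trains a Spark ML pipeline and persists the fitted model to S3. The estimator, feature columns, and output path are illustrative placeholders; the actual stages and columns come from the workflow and the dataset read from S3.

```python
# Train a simple Spark ML pipeline and save the fitted model to S3.
# `df` is assumed to be the DataFrame read from S3 earlier in the workflow.
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

model = pipeline.fit(df)

# Persist the fitted PipelineModel to an S3 location (placeholder path).
model.write().overwrite().save("s3a://sparkflow-sample-data/models/spark-ml-model")
```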

Execution Result:

Spark ML Workflow

Saving H2O ML Model

Below is an example workflow in Sparkflows, where the final H2O ML model is saved to an S3 location.

Workflow:

H2O ML Workflow

Configure Save H2O ML Model

H2O ML Workflow
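
As a sketch of what the Save H2O ML Model step amounts to, the snippet below uses the h2o Python client to persist a trained model. Here `model` is assumed to be an already-trained H2O estimator, and the S3 path is an illustrative placeholder; whether an S3 URI can be written directly depends on the H2O cluster's S3 configuration.

```python
# Save a trained H2O model, assuming `model` is an H2O estimator that has
# already been trained earlier in the workflow (placeholder output path).
import h2o

saved_path = h2o.save_model(
    model=model,
    path="s3://sparkflow-sample-data/models/h2o-model",
    force=True,  # overwrite if a model already exists at this path
)
print(saved_path)
```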

Execution Result:

H2O ML Workflow