Fire Insights can be installed on the master node of an AWS EMR cluster. From there, it submits jobs to the EMR cluster.
Below are the overall steps for installing Fire Insights on EMR.
- ssh into the Master node
- Download Fire Insights from https://www.sparkflows.io/download
- Unzip it
- Create H2 Database
- Start Fire
Start your EMR cluster on AWS:
Start your EMR cluster on AWS if you do not already have it running.
Update the inbound rules for the Master Node:
- By default, Fire listens on ports 8080 and 8443, but EMR clusters have other processes listening on these ports.
- Fire will therefore be configured in a later step to listen on ports 8085 and 8086.
- Update the inbound rules for the Master Node to allow ports 8085 and 8086.
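If you prefer the command line to the AWS console, the inbound rule can be added with the AWS CLI. The sketch below is one way to do it, under stated assumptions: the function name `open_fire_ports` is hypothetical, and the security group ID and CIDR range in the example call are placeholders that you would replace with the security group of your master node and the address range that should reach Fire.

```shell
# Sketch only: opens TCP ports 8085-8086 on a security group for one CIDR range.
# Assumes the AWS CLI is installed and configured with suitable credentials.
open_fire_ports() {
  sg_id=$1   # security group attached to the EMR master node (placeholder)
  cidr=$2    # address range allowed to reach Fire (placeholder)
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 8085-8086 \
    --cidr "$cidr"
}

# Example call (illustrative values, not real ones):
# open_fire_ports sg-0123456789abcdef0 203.0.113.0/24
```

Restricting the CIDR range to your own network is generally safer than opening the ports to all addresses.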
ssh into the Master EMR node as the
ssh -i my.pem email@example.com
Download the fire tgz file by one of the following options:
tar xvf fire-x.y.z.tgz
cp /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar /home/hadoop/fire-3.1.0/fire-user-lib
Configure Fire to listen on ports 8085 and 8086:
- cd <fire install_dir>
- Edit conf/application.properties
- Update the last two lines as follows:
http.port=8085
https.port=8086
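The same edit can be scripted rather than made by hand. The following is a minimal sketch, not part of the Fire distribution: `set_prop` is a hypothetical helper that replaces an existing `key=value` line in a properties file, or appends one if the key is missing. It assumes simple values such as port numbers, with no `|` or `&` characters in them.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Replaces an existing "key=..." line, or appends one if the key is absent.
set_prop() {
  file=$1; key=$2; value=$3
  # Escape dots so that "http.port" is matched literally by grep and sed.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    # Write to a temporary file and move it back, for portability across sed versions.
    sed "s|^${pattern}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
      mv "${file}.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example usage from the Fire install directory:
# set_prop conf/application.properties http.port 8085
# set_prop conf/application.properties https.port 8086
```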
Create H2 DB:
Fire stores its metadata in an embedded H2 database. You can also connect it to an external MySQL database.
cd <fire install_dir>
./create-h2-db.sh
Launch Fire Server:
cd <fire install_dir>
./run-fire-server.sh start
Open your web browser and navigate to:
Login with the following default username and password:
username: admin
password: admin
Connect Fire with the EMR Cluster:
- Go to Administration/Configuration
- Click on 'Infer Hadoop Configs'
- Save
- If your EMR cluster is not running HIVE, update 'spark.sql-context = SQLContext'
Add the hadoop user in Fire:
- Under Administration/Users, add the 'hadoop' user
Loading Example Workflows
From the home page of Fire Insights, click on *Load Example Applications*
Upload the Fire examples data onto HDFS:
cd <fire install_dir>
hadoop fs -put data /tmp
Installing and Running Example Workflows
Start off with executing the example workflows:
- Fire comes pre-packaged with a number of example workflows.
- You can install them by clicking the 'Install example workflows' link on the landing page when logged in as the `admin` user.
Log out of the current session and log in again as the 'hadoop' user.
- Execute the workflows
Adding a new user
Create the home directory on HDFS for the new user.
For example, for user ‘test’:
- hadoop fs -mkdir /user/test
- hadoop fs -chown test:test /user/test
Create the user in Fire Insights if not already created.
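The two HDFS commands above can be wrapped in a small function so they work for any user name. This is a sketch only: `create_hdfs_home` is a hypothetical name, and the `-p` flag is an addition here so that the call does not fail if the directory already exists. It assumes the caller has permission to create and change ownership of directories under /user.

```shell
# Sketch: create an HDFS home directory for a user and give them ownership.
# Assumes the caller has HDFS permissions to mkdir and chown under /user.
create_hdfs_home() {
  user=$1
  # -p makes the call succeed even when the directory already exists.
  hadoop fs -mkdir -p "/user/${user}" &&
    hadoop fs -chown "${user}:${user}" "/user/${user}"
}

# Example usage for the user 'test':
# create_hdfs_home test
```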
Extra configuration for running PySpark
EMR needs extra configuration when running PySpark. In the example below, a Python 3.6 virtual environment is installed in the directory /home/hadoop/venv.
- export SPARK_HOME=/usr/lib/spark/
- export PYSPARK_PYTHON=/home/hadoop/venv/bin/python
- export YARN_CONF_DIR=/etc/hadoop/conf
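Before starting Fire, it can help to confirm that these variables point at paths that actually exist on the master node. The function below is a minimal sketch of such a check. `check_pyspark_env` is a hypothetical helper, not something shipped with Fire or EMR. It only tests that the paths exist, not that the Python version or Spark installation is the right one.

```shell
# Hypothetical sanity check for the three variables exported above.
# Prints one line per problem and returns non-zero if any check fails.
check_pyspark_env() {
  status=0
  if [ ! -d "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not a directory: ${SPARK_HOME:-<unset>}"
    status=1
  fi
  if [ ! -x "${PYSPARK_PYTHON:-}" ]; then
    echo "PYSPARK_PYTHON is not executable: ${PYSPARK_PYTHON:-<unset>}"
    status=1
  fi
  if [ ! -d "${YARN_CONF_DIR:-}" ]; then
    echo "YARN_CONF_DIR is not a directory: ${YARN_CONF_DIR:-<unset>}"
    status=1
  fi
  return "$status"
}

# Example usage after exporting the variables:
# check_pyspark_env && echo "PySpark environment looks consistent"
```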