Databricks Python Integration Steps

Fire Insights integrates with Databricks and can submit Python jobs to Databricks clusters. It submits the jobs through the Databricks REST API and displays the results back in Fire Insights.

Below are the steps for integrating Fire Insights with your Databricks clusters to run Python jobs.

NOTE: The machine on which Fire Insights is installed should have Python 3.6.0 or above.
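To confirm the version requirement before continuing, a small check script can help. This is a minimal sketch, not part of Fire Insights; the function name python_version_ok is hypothetical.

```python
import sys

# Minimum interpreter version this guide requires on the Fire Insights machine.
MIN_PYTHON = (3, 6, 0)


def python_version_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if version_info (default: the running interpreter) meets minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    # str.format (not f-strings) so the script still parses on interpreters older than 3.6.
    if python_version_ok():
        print("Python {} meets the 3.6.0 minimum".format(current))
    else:
        sys.exit("Python {} is older than the required 3.6.0".format(current))
```

Run it on the Fire Insights machine with the python executable you intend to use; it exits with a non-zero status if the interpreter is too old.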

Python Installation Steps:

Install Fire Insights

Install Fire Insights on your machine. The machine must be reachable from the Databricks cluster.
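One way to verify reachability is to open a TCP connection to the Fire Insights host from a notebook attached to the Databricks cluster. The sketch below assumes you know the host name and port Fire Insights listens on; is_reachable is a hypothetical helper, not a Fire Insights or Databricks API.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and connects; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, is_reachable("fire-insights.example.com", 8080) run in a notebook cell returns False if a firewall or security group blocks the cluster from reaching that host; the host name and port here are placeholders for your own values.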

Upload Fire wheel file to Databricks

The Fire Insights wheel file has to be uploaded to Databricks, because Fire Insights jobs running on Databricks make use of it.

Upload fire-x.y.z/dist/fire-3.1.0-py3-none-any.whl to Databricks: either to DBFS, from where it is added as a library through the Workspace, or to an S3 bucket that is accessible from the Databricks cluster.
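Besides the UI, a file can be uploaded to DBFS through the Databricks DBFS REST API, which streams larger files with the create, add-block and close endpoints. The sketch below assumes that API; the caller supplies post(endpoint, payload), for example a thin wrapper around an HTTP client that adds your workspace URL and access token. The 512 KiB block size is an arbitrary choice kept below the API's 1 MB per-block limit.

```python
import base64

# Raw bytes per add-block call; chosen to stay well under the 1 MB block limit.
BLOCK_SIZE = 512 * 1024


def iter_base64_blocks(path, block_size=BLOCK_SIZE):
    """Yield the file's contents as base64-encoded strings, one per block."""
    with open(path, "rb") as source:
        while True:
            chunk = source.read(block_size)
            if not chunk:
                break
            yield base64.b64encode(chunk).decode("ascii")


def upload_to_dbfs(post, local_path, dbfs_path, overwrite=True):
    """Stream a local file to DBFS via the create / add-block / close endpoints.

    post(endpoint, payload) is caller-supplied: it must send a JSON POST to the
    workspace and return the decoded JSON response.
    """
    response = post("/api/2.0/dbfs/create", {"path": dbfs_path, "overwrite": overwrite})
    stream_handle = response["handle"]
    for data in iter_base64_blocks(local_path):
        post("/api/2.0/dbfs/add-block", {"handle": stream_handle, "data": data})
    post("/api/2.0/dbfs/close", {"handle": stream_handle})
```

A DBFS target such as /FileStore/jars/fire-3.1.0-py3-none-any.whl is only an example location; any DBFS path the cluster can read will do.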

1. Log in to Databricks.

2. Click Workspace in the left-side pane.

3. Create a new Library

Select DBFS as the Library Source and Python Whl as the Library Type, enter any name in the Library Name field, and enter the DBFS file path of fire-3.1.0-py3-none-any.whl in File Path.

When you click Create, Databricks asks which cluster to install the library on. Select the cluster on which you want to install it.

After the wheel file is successfully installed on the Databricks cluster, it is displayed under Libraries.

You can also upload the fire-3.1.0-py3-none-any.whl file to an S3 bucket that is accessible from the Databricks cluster.

After uploading the file to the S3 bucket, log in to Databricks and open the cluster's Libraries tab.

Click Install New, select DBFS/S3 as the Library Source and Python Whl as the Library Type, paste the S3 location of the wheel file into File Path, and click Install.
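The same installation can be requested through the Databricks Libraries API (POST /api/2.0/libraries/install), which takes a cluster ID and a list of libraries. The sketch below only builds the request body; the helper name, cluster ID and bucket are hypothetical, and it accepts either a dbfs:/ or an s3:// wheel location.

```python
import json


def library_install_payload(cluster_id, wheel_uri):
    """Build the JSON body for POST /api/2.0/libraries/install with one wheel."""
    if not wheel_uri.startswith(("dbfs:/", "s3://")):
        raise ValueError("expected a dbfs:/ or s3:// URI, got " + repr(wheel_uri))
    if not wheel_uri.endswith(".whl"):
        raise ValueError("expected a .whl file, got " + repr(wheel_uri))
    return {"cluster_id": cluster_id, "libraries": [{"whl": wheel_uri}]}


if __name__ == "__main__":
    # Hypothetical cluster ID and bucket, for illustration only.
    payload = library_install_payload(
        "0123-456789-abcdefgh", "s3://my-bucket/fire/fire-3.1.0-py3-none-any.whl"
    )
    print(json.dumps(payload, indent=2))
```

For an S3 location, the cluster still needs read access to the bucket, as described above.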

Once the installation succeeds, the Python wheel appears under the cluster's Libraries as installed.
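Installation status can also be read from the Libraries API (GET /api/2.0/libraries/cluster-status), whose response lists each library with a status such as INSTALLED or FAILED. This sketch only parses such a response; the sample body and helper name are hypothetical.

```python
def wheel_status(cluster_status, wheel_uri):
    """Return the status of wheel_uri in a cluster-status response, or None if absent."""
    for entry in cluster_status.get("library_statuses", []):
        if entry.get("library", {}).get("whl") == wheel_uri:
            return entry.get("status")
    return None


if __name__ == "__main__":
    # Hypothetical response body from GET /api/2.0/libraries/cluster-status?cluster_id=<cluster-id>
    sample = {
        "cluster_id": "0123-456789-abcdefgh",
        "library_statuses": [
            {
                "library": {"whl": "dbfs:/FileStore/jars/fire-3.1.0-py3-none-any.whl"},
                "status": "INSTALLED",
            }
        ],
    }
    print(wheel_status(sample, "dbfs:/FileStore/jars/fire-3.1.0-py3-none-any.whl"))
```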

Install Python dependencies

Install the Python dependencies required by Fire Insights on the Fire Insights machine by running the following command from the fire-x.y.z/dist/fire/ directory:

  • pip install -r requirements.txt

Note: Make sure that pip is already installed on that machine.
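If several Python versions are present, invoking pip as python -m pip ensures the dependencies land in the interpreter that runs the command. A minimal sketch of that approach, assuming you pass it the fire-x.y.z/dist/fire/ directory; both helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements_file):
    """Build a pip command that installs into the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements_file)]


def install_requirements(fire_dist_dir):
    """Install requirements.txt found in the given fire-x.y.z/dist/fire/ directory."""
    requirements = Path(fire_dist_dir) / "requirements.txt"
    if not requirements.is_file():
        raise FileNotFoundError(str(requirements))
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(pip_install_command(requirements), check=True)
```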

Install dependencies for AWS

Copy the hadoop-aws and aws-java-sdk jars into the PySpark jars directory.

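For a pip-installed PySpark, the jars directory sits inside the pyspark package. The sketch below assumes that layout and that the jars were already downloaded into a local folder; the file-name patterns and helper names are illustrative, and the jar versions you choose should match the Hadoop version bundled with your PySpark.

```python
import glob
import importlib.util
import os
import shutil

# Illustrative file-name patterns for the two jars named in this guide.
JAR_PATTERNS = ("hadoop-aws-*.jar", "aws-java-sdk*.jar")


def pyspark_jars_dir():
    """Return the jars/ directory of the installed pyspark package, or None."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "jars")


def copy_aws_jars(source_dir, jars_dir):
    """Copy hadoop-aws and aws-java-sdk jars from source_dir into jars_dir."""
    copied = []
    for pattern in JAR_PATTERNS:
        for jar in sorted(glob.glob(os.path.join(source_dir, pattern))):
            shutil.copy2(jar, jars_dir)
            copied.append(os.path.basename(jar))
    return copied
```

Typical use is copy_aws_jars("/path/to/downloaded/jars", pyspark_jars_dir()), where the source folder is a placeholder for wherever you saved the jars.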

If your custom processors need a specific Python package, install it on the Databricks cluster as well as on the Fire Insights machine, as shown here for scorecardpy.

Use the command below to install it on the Fire Insights machine:

  • pip install scorecardpy

Install it on your Databricks cluster as follows:

  • Open a notebook and attach it to the Databricks cluster.
  • Run %sh pip install scorecardpy in a notebook cell.
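To confirm that a package is importable in a given environment, you can check for its module with the standard library, once on the Fire Insights machine and once in a notebook cell on the cluster. A minimal sketch; missing_packages is a hypothetical helper, and it expects top-level import names, which can differ from pip package names.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules your custom processors import; scorecardpy is the example from this guide.
    print("missing:", missing_packages(["scorecardpy"]))
```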

Upload Fire workflowexecutedatabricks.py file to DBFS

This file is used to submit Python jobs to the Databricks cluster.

Upload fire-x.y.z/dist/workflowexecutedatabricks.py to DBFS or to an S3 bucket.

You can also upload it using the DBFS browser.

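A small file such as workflowexecutedatabricks.py can also be sent in a single call to the DBFS API (POST /api/2.0/dbfs/put), which accepts base64 contents inline up to 1 MB. This sketch only builds the request body; the helper name and target path are hypothetical, and it checks the encoded length as a conservative reading of that limit.

```python
import base64

# Inline contents for /api/2.0/dbfs/put are limited to 1 MB.
MAX_INLINE_BYTES = 1024 * 1024


def dbfs_put_payload(local_path, dbfs_path, overwrite=True):
    """Build the JSON body for POST /api/2.0/dbfs/put with the file inlined as base64."""
    with open(local_path, "rb") as source:
        contents = base64.b64encode(source.read()).decode("ascii")
    if len(contents) > MAX_INLINE_BYTES:
        raise ValueError("encoded file exceeds 1 MB; use a streaming upload instead")
    return {"path": dbfs_path, "contents": contents, "overwrite": overwrite}
```

For example, dbfs_put_payload("fire-x.y.z/dist/workflowexecutedatabricks.py", "/FileStore/fire/workflowexecutedatabricks.py") returns a body ready to send, where the DBFS folder is a placeholder.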

Configure the Uploaded Library in Fire Insights

In Fire Insights, configure the path of the uploaded workflowexecutedatabricks.py file under databricks.pythonFile and the path of the uploaded Fire Python wheel package under databricks.pythonPackages.

Each path can point to either DBFS or S3:

  • If you uploaded the files to DBFS, enter their DBFS paths.
  • If you uploaded the files to S3, enter their S3 paths.
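A quick sanity check on these two settings can catch a swapped or mistyped value. The sketch below is a hypothetical helper, not a Fire Insights feature, and it assumes both values are written as dbfs:/ or s3:// URIs; adjust the prefixes if your installation expects a different form.

```python
# Assumed URI prefixes for DBFS and S3 locations.
ALLOWED_PREFIXES = ("dbfs:/", "s3://")

# Expected file type for each Fire Insights setting named in this guide.
EXPECTED_SUFFIXES = {
    "databricks.pythonFile": ".py",        # workflowexecutedatabricks.py
    "databricks.pythonPackages": ".whl",   # fire-3.1.0-py3-none-any.whl
}


def check_fire_databricks_settings(settings):
    """Return a list of problems found in the two settings (empty if none)."""
    problems = []
    for key, suffix in EXPECTED_SUFFIXES.items():
        value = settings.get(key, "")
        if not value:
            problems.append(key + " is not set")
        elif not value.startswith(ALLOWED_PREFIXES):
            problems.append(key + " should start with dbfs:/ or s3://")
        elif not value.endswith(suffix):
            problems.append(key + " should end with " + suffix)
    return problems
```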

Job Submission using PySpark Engine

You can now submit PySpark jobs to the Databricks cluster from Fire Insights.
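Fire Insights builds and sends the submission request itself. Purely as an illustration of the kind of request involved, the sketch below builds a body for the Databricks Jobs API 2.1 one-time run endpoint (POST /api/2.1/jobs/runs/submit) that runs a Python file on an existing cluster with the Fire wheel attached. The run name, task key and parameters are hypothetical; the arguments Fire Insights actually passes to workflowexecutedatabricks.py are not described here.

```python
def runs_submit_payload(cluster_id, python_file, wheel_uri, parameters=()):
    """Build a Jobs API 2.1 runs/submit body that runs python_file on an existing cluster."""
    return {
        "run_name": "fire-insights-pyspark-job",  # hypothetical run name
        "tasks": [
            {
                "task_key": "fire_workflow",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": list(parameters),
                },
                "libraries": [{"whl": wheel_uri}],
            }
        ],
    }
```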
