Databricks Python Integration Steps¶
Fire Insights integrates with Databricks and can submit Python jobs to it. It submits jobs to Databricks clusters through the Databricks REST API and displays the results back in Fire Insights.
Below are the steps for integrating Fire Insights with your Databricks clusters to run Python jobs.
NOTE: The machine on which Fire Insights is installed must have Python 3.6.0 or above.
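You can confirm the note above on the Fire Insights machine with a short check script. This is a minimal sketch, not a Fire Insights tool. It avoids f-strings so that it still parses, and reports the problem, on interpreters older than 3.6.

```python
import sys

MIN_PYTHON = (3, 6, 0)  # minimum version stated in the note above


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is 3.6.0 or above."""
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print("Python " + current + " meets the 3.6.0 minimum")
    else:
        sys.exit("Python " + current + " is older than 3.6.0")
```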
Python Installation Steps:
Install Fire Insights¶
Install Fire Insights on your machine. The machine must be reachable from the Databricks cluster.
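To confirm reachability, you can run a TCP connection test from a notebook attached to the Databricks cluster. The host name and port here are hypothetical placeholders (8080 is only an example, not a documented Fire Insights default). Substitute the address your Fire Insights server actually listens on.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Hypothetical values: replace with your Fire Insights host and port.
FIRE_HOST = "fire-insights.example.com"
FIRE_PORT = 8080

# Usage from a notebook cell on the cluster:
# print(is_reachable(FIRE_HOST, FIRE_PORT))
```

A True result only shows that a TCP connection can be opened. It does not check that the service answering on that port is Fire Insights.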
Upload Fire wheel file to Databricks¶
The Fire Insights wheel file must be uploaded to Databricks. Fire Insights jobs running on Databricks use this wheel file.
Upload fire-x.y.z/dist/fire-3.1.0-py3-none-any.whl to Databricks, either as a Workspace library stored in DBFS or in an S3 bucket that is accessible from the Databricks cluster.
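If you prefer to script the upload instead of using the UI, the DBFS REST API can stream a file in blocks. The following is a minimal sketch and is not part of Fire Insights. It assumes a workspace URL and a personal access token (both placeholders) and uses the dbfs/create, dbfs/add-block and dbfs/close endpoints. Raw chunks are kept at 512 KB so that each base64-encoded block stays under the 1 MB block limit.

```python
import base64
import json
import urllib.request

CHUNK_SIZE = 512 * 1024  # raw bytes; the base64 form stays under the 1 MB block limit


def make_post(host, token):
    """Return a function that POSTs JSON to the Databricks REST API and decodes the reply."""
    def post(endpoint, payload):
        request = urllib.request.Request(
            host.rstrip("/") + "/api/2.0/" + endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Authorization": "Bearer " + token,
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            body = response.read()
        return json.loads(body) if body else {}
    return post


def upload_to_dbfs(post, local_path, dbfs_path, chunk_size=CHUNK_SIZE):
    """Stream a local file to DBFS using the create / add-block / close calls."""
    handle = post("dbfs/create", {"path": dbfs_path, "overwrite": True})["handle"]
    try:
        with open(local_path, "rb") as source:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                post("dbfs/add-block",
                     {"handle": handle, "data": base64.b64encode(chunk).decode("ascii")})
    finally:
        # Always release the stream handle, even if a block upload fails.
        post("dbfs/close", {"handle": handle})


# Hypothetical usage (placeholder host, token and target path):
# post = make_post("https://<workspace-host>", "<personal-access-token>")
# upload_to_dbfs(post, "fire-x.y.z/dist/fire-3.1.0-py3-none-any.whl",
#                "/FileStore/jars/fire-3.1.0-py3-none-any.whl")
```

The post function is passed in so that the upload logic can be exercised without a live workspace.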
1. Log in to Databricks¶
2. Click Workspace in the left side pane¶
3. Create a new Library¶
Select DBFS as the Library Source and Python Whl as the Library Type, enter any name in the Library Name field, and set File Path to the location of fire-3.1.0-py3-none-any.whl in DBFS.
When you click Create, you are asked which Databricks cluster to install the library on. Select the cluster you want to use.
After the wheel file is installed successfully on the Databricks cluster, it is listed in the cluster's Libraries tab.
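The UI steps above correspond to the Databricks Libraries API. The following sketch installs the same wheel with a POST to /api/2.0/libraries/install. It assumes a workspace URL, a personal access token and a cluster ID, all of which are placeholders. The DBFS path in the usage comment is hypothetical. The whl field also accepts an s3:// URI for a wheel stored in S3.

```python
import json
import urllib.request


def wheel_install_payload(cluster_id, wheel_uri):
    """Build the body for POST /api/2.0/libraries/install with a single wheel library."""
    return {"cluster_id": cluster_id, "libraries": [{"whl": wheel_uri}]}


def install_wheel(host, token, cluster_id, wheel_uri):
    """Request installation of the wheel on the cluster; the install runs asynchronously."""
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/libraries/install",
        data=json.dumps(wheel_install_payload(cluster_id, wheel_uri)).encode("utf-8"),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # a successful call returns an empty JSON object


# Hypothetical usage (placeholders throughout):
# install_wheel("https://<workspace-host>", "<personal-access-token>", "<cluster-id>",
#               "dbfs:/FileStore/jars/fire-3.1.0-py3-none-any.whl")
```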
You can also upload fire-3.1.0-py3-none-any.whl to an S3 bucket that is accessible from the Databricks cluster.
Once the file is in the S3 bucket, log in to Databricks, open the cluster, and go to its Libraries tab.
Click Install New Library, select DBFS/S3 as the Library Source and Python Whl as the Library Type, paste the S3 location of the Python wheel file into File Path, and click Install.
Once the installation succeeds, the Python wheel is listed in the Libraries tab as installed.
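Instead of watching the Libraries tab, you can read the installation state from the Libraries API cluster-status endpoint. The following sketch fetches the status and looks up one wheel by its URI. The workspace URL, token and cluster ID are placeholders. Typical status values include PENDING, INSTALLING, INSTALLED and FAILED.

```python
import json
import urllib.parse
import urllib.request


def fetch_cluster_status(host, token, cluster_id):
    """GET /api/2.0/libraries/cluster-status for one cluster and return the decoded JSON."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/libraries/cluster-status?" + query,
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def wheel_status(status_response, wheel_uri):
    """Return the status string of the wheel with the given URI, or None if it is not listed."""
    for entry in status_response.get("library_statuses", []):
        if entry.get("library", {}).get("whl") == wheel_uri:
            return entry.get("status")
    return None


# Hypothetical usage (placeholders throughout):
# status = fetch_cluster_status("https://<workspace-host>", "<personal-access-token>", "<cluster-id>")
# print(wheel_status(status, "s3://<bucket>/fire-3.1.0-py3-none-any.whl"))
```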
Install Python dependencies¶
Install the Python dependencies required by Fire Insights on the Fire Insights machine by running the command below from the directory that contains requirements.txt:
pip install -r requirements.txt
Note: Make sure that pip is already installed on that machine.
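After the install, you can check that every package listed in requirements.txt is present. The following is a minimal sketch that assumes Python 3.8+ for importlib.metadata. On 3.6 or 3.7, the importlib_metadata backport offers the same functions. The requirement parsing is deliberately simplified: it only extracts leading package names and skips pip options such as -r or -e.

```python
import re
from importlib import metadata  # Python 3.8+; use the importlib_metadata backport on 3.6/3.7

# Simplified: a distribution name is the leading run of letters, digits, ".", "_" and "-".
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines):
    """Extract distribution names from requirements.txt lines, skipping comments and options."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or a pip option such as -r / -e
        match = NAME_PATTERN.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Usage, run from the directory that contains requirements.txt:
# with open("requirements.txt") as requirements:
#     print(missing_packages(requirement_names(requirements)))
```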
Install dependency for AWS¶
Copy the aws-java-sdk jars to the PySpark jars path.
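A small helper can perform this copy. It assumes PySpark was installed with pip, in which case the bundled jars are in the jars directory next to the pyspark package. The source directory that holds the downloaded aws-java-sdk jars is a hypothetical path that you supply.

```python
import glob
import importlib.util
import os
import shutil


def pyspark_jars_dir():
    """Return the jars directory of a pip-installed PySpark, or None if PySpark is not found."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pyspark/__init__.py; the jars folder sits beside it.
    return os.path.join(os.path.dirname(spec.origin), "jars")


def copy_aws_sdk_jars(source_dir, jars_dir):
    """Copy every aws-java-sdk*.jar from source_dir into jars_dir and return the copied names."""
    copied = []
    for jar in sorted(glob.glob(os.path.join(source_dir, "aws-java-sdk*.jar"))):
        shutil.copy2(jar, jars_dir)
        copied.append(os.path.basename(jar))
    return copied


# Hypothetical usage ("/path/to/aws-jars" is a placeholder):
# print(copy_aws_sdk_jars("/path/to/aws-jars", pyspark_jars_dir()))
```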
If your custom processors need a specific Python package, install it on both the Databricks cluster and the Fire Insights machine. For example, to use scorecardpy, run the command below on the Fire Insights machine:
pip install scorecardpy
To install it on your Databricks cluster:
- Open a notebook and attach it to the Databricks cluster.
- Run %sh pip install scorecardpy in a notebook cell.
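Note that %sh commands run only on the driver node, so a package installed that way is not available to code running on the executors. If a custom processor needs the package on every node, one option is to install it as a cluster library through the Libraries API, which installs it on all nodes of the cluster. The following is a minimal sketch in which the workspace URL, token and cluster ID are placeholders.

```python
import json
import urllib.request


def pypi_install_payload(cluster_id, packages):
    """Build a /api/2.0/libraries/install body that installs PyPI packages as cluster libraries."""
    return {"cluster_id": cluster_id,
            "libraries": [{"pypi": {"package": name}} for name in packages]}


def install_pypi_packages(host, token, cluster_id, packages):
    """Request installation of the PyPI packages on every node of the cluster (asynchronous)."""
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/libraries/install",
        data=json.dumps(pypi_install_payload(cluster_id, packages)).encode("utf-8"),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


# Hypothetical usage (placeholders throughout):
# install_pypi_packages("https://<workspace-host>", "<personal-access-token>",
#                       "<cluster-id>", ["scorecardpy"])
```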
Upload Fire workflowexecutedatabricks.py file to DBFS¶
This file is required for submitting Python jobs to the Databricks cluster.
Upload fire-x.y.z/dist/workflowexecutedatabricks.py to DBFS, for example with the DBFS browser, or to an S3 bucket.
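Because workflowexecutedatabricks.py is a small script, you can also upload it with a single dbfs/put call that carries the file contents inline as base64. This sketch assumes a workspace URL and personal access token (placeholders) and a hypothetical target path. Inline contents are limited to 1 MB, so the sketch conservatively checks the encoded size and refuses larger files.

```python
import base64
import json
import urllib.request

MAX_INLINE_BYTES = 1024 * 1024  # inline "contents" for dbfs/put is limited to 1 MB


def dbfs_put_payload(local_path, dbfs_path):
    """Build a /api/2.0/dbfs/put body that uploads a small file inline as base64."""
    with open(local_path, "rb") as source:
        contents = base64.b64encode(source.read()).decode("ascii")
    if len(contents) > MAX_INLINE_BYTES:
        raise ValueError(local_path + " is too large for an inline dbfs/put upload")
    return {"path": dbfs_path, "contents": contents, "overwrite": True}


def dbfs_put(host, token, local_path, dbfs_path):
    """Upload a small file to DBFS in one request, overwriting any existing file."""
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/dbfs/put",
        data=json.dumps(dbfs_put_payload(local_path, dbfs_path)).encode("utf-8"),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


# Hypothetical usage (placeholder host, token and target path):
# dbfs_put("https://<workspace-host>", "<personal-access-token>",
#          "fire-x.y.z/dist/workflowexecutedatabricks.py",
#          "/FileStore/fire/workflowexecutedatabricks.py")
```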
Configure the Uploaded Library in Fire Insights¶
Configure the paths of the uploaded Fire Python wheel file and workflowexecutedatabricks.py under
databricks.pythonPackages respectively in Fire Insights.
The files can come from one of two sources:
If you uploaded them to DBFS:
If you uploaded them to S3:
Job Submission using PySpark Engine¶
You can now submit PySpark jobs to the Databricks cluster from Fire Insights.
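Fire Insights performs the submission itself. To illustrate the general shape of such a request, the following sketch builds a Jobs API 2.0 runs/submit body that runs a Python file from DBFS on an existing cluster with the Fire wheel attached. This is not Fire Insights' actual request. The run name, paths and parameters are hypothetical, and this page does not document the arguments Fire Insights passes to workflowexecutedatabricks.py.

```python
import json
import urllib.request


def python_run_payload(cluster_id, python_file, wheel_uri, parameters=()):
    """Build a Jobs API 2.0 runs/submit body that runs a Python file on an existing cluster."""
    return {
        "run_name": "fire-insights-python-run",  # hypothetical run name
        "existing_cluster_id": cluster_id,
        "libraries": [{"whl": wheel_uri}],  # attach the Fire wheel to the run
        "spark_python_task": {
            "python_file": python_file,  # e.g. the uploaded workflowexecutedatabricks.py
            "parameters": list(parameters),  # hypothetical; Fire Insights sets the real arguments
        },
    }


def submit_run(host, token, payload):
    """POST the payload to /api/2.0/jobs/runs/submit and return the run_id from the reply."""
    request = urllib.request.Request(
        host.rstrip("/") + "/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["run_id"]
```

The returned run_id can then be polled with the jobs/runs/get endpoint to follow the run's state.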