Reading from and Writing to MongoDB

MongoDB is a document database that combines scalability and flexibility with rich querying and indexing. This page shows how to load data from HDFS and save it into MongoDB, and then how to read that data back from MongoDB.

Workflow for Loading data into MongoDB

The workflow below reads the sample dataset, which is in CSV format, from HDFS.

It then saves the data into MongoDB.

SaveMongoDB

The diagram below shows the dialog box for the SaveMongoDB processor.

[Figure: SaveMongoDB processor dialog box]

Workflow Execution

When we execute the workflow, it reads the dataset from HDFS and loads it into MongoDB.

[Figure: execution result of the SaveMongoDB workflow]
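The internals of the SaveMongoDB processor are not shown on this page. As a rough illustration of the same load step outside the workflow tool, the sketch below parses CSV text into documents and inserts them with pymongo's `insert_many`. The connection URI, the database name `demo_db`, the collection name `sample`, and the local file `sample.csv` are all assumptions made for the example, and the file is read from local disk rather than HDFS.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per row.

    Every value stays a string; convert types yourself if you need numbers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def save_documents(collection, documents):
    """Insert documents into a MongoDB-like collection and return how many were sent."""
    if documents:  # pymongo's insert_many rejects an empty list
        collection.insert_many(documents)
    return len(documents)


def load_csv_into_mongodb():
    """Hypothetical end-to-end run; needs pymongo and a reachable MongoDB server."""
    from pymongo import MongoClient  # third-party package, imported lazily

    client = MongoClient("mongodb://localhost:27017")  # assumed URI
    collection = client["demo_db"]["sample"]           # assumed names
    with open("sample.csv", newline="") as f:          # assumed local copy of the CSV
        documents = csv_to_documents(f.read())
    print(save_documents(collection, documents), "documents saved")
```

Calling `load_csv_into_mongodb()` performs the actual write. The two helper functions only need an object with an `insert_many` method, so they can be exercised without a database.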

Workflow for Reading data from MongoDB

The workflow below reads data from MongoDB. It then prints the data.

ReadMongoDB

The diagram below shows the dialog box for the ReadMongoDB processor.

[Figure: ReadMongoDB processor dialog box]

In the dialog above, the ‘Refresh Schema’ button infers the collection's schema. The processor can then pass its output schema down to the next processor, which makes the rest of the workflow easier to build.
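How ‘Refresh Schema’ infers the schema is not described here. Because MongoDB documents in one collection need not share the same fields, one plausible approach is to sample some documents and merge the field names and value types found. The sketch below illustrates that idea only; the function `infer_schema` and the sample documents are hypothetical, and this is not the processor's actual implementation.

```python
def infer_schema(documents):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = {}
    for doc in documents:
        for field, value in doc.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical sample: the second document has an extra field and a different
# numeric type for "price", which the merged schema records.
sample = [
    {"_id": 1, "name": "a", "price": 9.5},
    {"_id": 2, "name": "b", "price": 3, "tags": ["x"]},
]
schema = infer_schema(sample)
for field, types in schema.items():
    print(field, types)
```

A field that maps to more than one type, such as `price` here, signals that a downstream step needs to pick a common type before treating the data as a table.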

Workflow Execution

When we execute the workflow, it reads the Sample collection from MongoDB and displays the first few records.

We see that the sample data records we wrote to MongoDB in the first workflow are now read back.

[Figure: execution result of the ReadMongoDB workflow]
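For comparison, reading back the first few records can be sketched outside the workflow tool with pymongo's `find()` and the cursor's `limit()`. As before, the connection URI and the names `demo_db` and `sample` are assumptions made for the example, not values taken from the workflow.

```python
def first_documents(collection, n=5):
    """Return up to n documents from a MongoDB-like collection as a list.

    n should be positive: in pymongo, limit(0) means "no limit".
    """
    return list(collection.find().limit(n))


def print_sample_from_mongodb():
    """Hypothetical end-to-end run; needs pymongo and a reachable MongoDB server."""
    from pymongo import MongoClient  # third-party package, imported lazily

    client = MongoClient("mongodb://localhost:27017")  # assumed URI
    collection = client["demo_db"]["sample"]           # assumed names
    for document in first_documents(collection):
        print(document)
```

`first_documents` only relies on `find()` returning something with a `limit()` method that can be iterated, so it can be checked against a small stand-in object before pointing it at a real collection.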