This workflow reads in a dataset, tokenizes its text content, and then performs TF-IDF (term frequency-inverse document frequency) on it.
Below is the workflow. It does the following:
- Reads data from a sample dataset.
- Tokenizes the message column.
- Computes term frequency (TF).
- Computes inverse document frequency (IDF).
- Prints the results.
Tokenizes message column
The Tokenizer node tokenizes the message column read from the sample dataset file.
The HashingTF node then computes term frequency (TF) on the text column.
The IDF node computes inverse document frequency (IDF) on the text column.
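
To make the three steps concrete, here is a minimal pure-Python sketch of what tokenization, hashed term frequency, and IDF weighting compute. It is an illustration only, not the implementation behind the Tokenizer, HashingTF, or IDF nodes. The function names, the bucket count of 16, and the sample messages are hypothetical, and zlib.crc32 is used as a stand-in hash function. The IDF formula assumed here is the smoothed form log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term.

```python
import math
import zlib

NUM_FEATURES = 16  # hypothetical number of hash buckets, chosen for readability


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Count tokens per hash bucket (the 'hashing trick').

    zlib.crc32 is a stand-in hash; different tokens may collide in a bucket.
    """
    vec = [0.0] * num_features
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1.0
    return vec


def idf_weights(tf_vectors):
    """Compute one smoothed IDF weight per bucket: log((m + 1) / (df + 1))."""
    if not tf_vectors:
        return []
    m = len(tf_vectors)
    weights = []
    for j in range(len(tf_vectors[0])):
        df = sum(1 for vec in tf_vectors if vec[j] > 0)
        weights.append(math.log((m + 1) / (df + 1)))
    return weights


def tf_idf(messages, num_features=NUM_FEATURES):
    """Run tokenize -> hashed TF -> IDF scaling over a list of messages."""
    tf_vectors = [hashing_tf(tokenize(msg), num_features) for msg in messages]
    weights = idf_weights(tf_vectors)
    return [[tf * w for tf, w in zip(vec, weights)] for vec in tf_vectors]


if __name__ == "__main__":
    # Hypothetical sample messages standing in for the message column.
    sample = ["Free prize waiting", "Call me later", "Free call now"]
    for row in tf_idf(sample):
        print(row)
```

A bucket that is non-zero in every message gets a weight of log(1) = 0, so terms that appear everywhere contribute nothing to the final vector, while rarer terms are weighted up.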