
Understanding RASA NLU Pipeline

Creating pipelines and policies is one of the crucial steps in building a Rasa assistant. As developers, we often run into incorrect predictions or low accuracy while developing a chatbot. In many cases, simply changing your pipeline or policies can drastically improve the performance of your bot. Let's move ahead and learn more about it.


What is a Rasa Pipeline?

Machine learning models are trained through a sequence of steps such as tokenization, featurization, and classification. In Rasa, these steps are defined in a pipeline. So we can say that a pipeline is the sequence of tasks executed while training a Rasa NLU model, and every such task is called a pipeline component. Pipelines are defined in the config.yml file. Let's have a look at the different pipeline components present in Rasa.

Rasa NLU Pipeline Components

  • Tokenizers
  • Featurizers
  • Intent Classifier
  • Entity Extractors
  • Selectors

1. Tokenizers

To understand a sentence, the model first needs to understand every word independently. For that, it has to break the sentence into words; tokenizers serve this purpose. A tokenizer breaks a sentence into smaller pieces known as tokens, converting the input sentence into a list of words.

A very basic example of a tokenizer in Rasa is WhitespaceTokenizer, which splits the sentence on whitespace.

For example:

“Hello World” —> “Hello”, “World”
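The idea can be sketched in plain Python (a simplified illustration only; Rasa's WhitespaceTokenizer also handles details like punctuation):

```python
# Minimal whitespace tokenizer sketch (illustrative, not Rasa's actual code).
def whitespace_tokenize(text):
    """Split a sentence into tokens on runs of whitespace."""
    return text.split()

print(whitespace_tokenize("Hello World"))  # ['Hello', 'World']
```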

Examples of Tokenizers:

  • WhitespaceTokenizer
  • JiebaTokenizer
  • MitieTokenizer
  • SpacyTokenizer


We can also use the following flags with any tokenizer:
  • intent_tokenization_flag: whether to tokenize intent labels as well. Its value can be either True or False.
  • intent_split_symbol: the symbol on which intent labels are split. The default value is an underscore ("_").

pipeline:
- name: WhitespaceTokenizer
  intent_tokenization_flag: True
  intent_split_symbol: "-"


2. Featurizers

Now that we have broken the sentence down into tokens, the next step is for our model to understand the meaning of each of these tokens. Featurizers convert tokens into a machine-understandable format, which is essentially a vector of numbers. That is why this step is also called vectorization.

Text featurizers are divided into two types: those producing sparse features and those producing dense features.

Sparse features usually contain a lot of zeros. Storing all of those zeros explicitly wastes memory, so they are stored as sparse vectors, which record only the non-zero values and their positions.

Dense featurizers return feature vectors containing pre-trained embeddings. The length of such vectors typically ranges from 50 to 300. Dense features often work better for NLP problems because they capture the similarity between two words.
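To see why dense vectors capture similarity, compare two toy embeddings with cosine similarity (the 4-dimensional vectors below are invented purely for illustration; real embeddings are learned):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: "holiday" and "vacation" point in similar
# directions, "invoice" does not.
holiday  = [0.9, 0.1, 0.4, 0.0]
vacation = [0.8, 0.2, 0.5, 0.1]
invoice  = [0.0, 0.9, 0.0, 0.8]

print(cosine_similarity(holiday, vacation))  # high, close to 1
print(cosine_similarity(holiday, invoice))   # much lower
```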

For example:

“hi” —> [0,1,1,0,0,0,0,……,0,1]
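A sparse bag-of-words featurizer can be sketched like this (a toy version of the counting idea behind CountVectorsFeaturizer; the vocabulary is made up):

```python
def featurize(tokens, vocabulary):
    """Return a bag-of-words count vector over a fixed vocabulary."""
    return [tokens.count(word) for word in vocabulary]

def to_sparse(vector):
    """Store only the non-zero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(vector) if v != 0]

vocab = ["hello", "hi", "world", "bye", "thanks"]
dense = featurize(["hi", "world"], vocab)
print(dense)             # [0, 1, 1, 0, 0]
print(to_sparse(dense))  # [(1, 1), (2, 1)]
```

Notice how the sparse form keeps only two entries instead of five; for a real vocabulary of thousands of words the saving is substantial.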

Examples of different Featurizers in Rasa:

  • MitieFeaturizer
  • SpacyFeaturizer
  • ConveRTFeaturizer
  • LanguageModelFeaturizer
  • RegexFeaturizer
  • CountVectorsFeaturizer
  • LexicalSyntacticFeaturizer
3. Intent Classifier

Once we have the features of every token for all sentences, we pass them to the intent classifier. The intent classifier assigns an intent to the input user query. We can use DIETClassifier, which can classify both the intents and the entities of a sentence. DIET stands for Dual Intent and Entity Transformer; it uses a transformer architecture to classify intents and extract entities from the user's query.

Confidence

Along with the list of classified intents and entities, the model also returns a confidence linked with every classified intent. Confidence is a numeric score between 0 and 1 that tells us how sure the model is about that intent. For example, if the confidence for intent A is 0.67, the model considers it 67% likely that the query belongs to intent A.

FallbackClassifier

There are situations when the bot is asked a question that lies outside its scope. In those scenarios we use the FallbackClassifier to respond. We define a threshold value for such cases: once the intent classifier gives us a list of all possible intents and their confidences, we check whether the highest confidence is below the threshold. If it is, we classify the query as out of scope and return a fallback response; otherwise, we pick the intent with the highest confidence.
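The fallback check described above amounts to a few lines of logic (a sketch of the idea, not Rasa's implementation; the threshold value and intent names are examples):

```python
def pick_intent(intent_confidences, threshold=0.3):
    """Return the top intent, or fall back when confidence is too low."""
    intent, confidence = max(intent_confidences.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        # No intent is confident enough: treat the query as out of scope.
        return "nlu_fallback"
    return intent

print(pick_intent({"greet": 0.67, "goodbye": 0.20}))  # greet
print(pick_intent({"greet": 0.12, "goodbye": 0.10}))  # nlu_fallback
```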

Examples of some Intent Classifiers or Rasa models:

  • MitieIntentClassifier
  • LogisticRegressionClassifier
  • SklearnIntentClassifier
  • KeywordIntentClassifier
  • DIETClassifier
  • FallbackClassifier


4. Entity Extractors

Once we have the intents, the next step is to extract the entities present in the sentence. An entity can be a name, a location, or a phone number, and entity extractors are responsible for pulling them out. Although we discussed that DIETClassifier can extract entities, there are entities it can fail to identify. Sometimes we need a list of synonyms to normalize entity values; sometimes a specific regex is required to extract them. Rasa provides different entity extractors for such situations.

For example, EntitySynonymMapper can be used to map synonyms together: UK, United Kingdom, and uk can be treated as synonyms of each other. Similarly, vacation, holiday, and leave can be considered synonyms in some situations.

We also have RegexEntityExtractor, which can be used to extract regex matches from the sentence.
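Both ideas can be sketched in a few lines (the synonym table and phone-number pattern below are made-up examples, not Rasa defaults):

```python
import re

# Synonym mapping: normalise surface forms to one canonical entity value,
# in the spirit of EntitySynonymMapper.
SYNONYMS = {"uk": "United Kingdom", "united kingdom": "United Kingdom",
            "holiday": "vacation", "leave": "vacation"}

def map_synonym(value):
    return SYNONYMS.get(value.lower(), value)

# Regex extraction: pull pattern-shaped entities out of the text,
# in the spirit of RegexEntityExtractor.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def extract_phone_numbers(text):
    return PHONE_PATTERN.findall(text)

print(map_synonym("UK"))                                 # United Kingdom
print(extract_phone_numbers("Call me on 555-123-4567"))  # ['555-123-4567']
```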

Examples of Entity Extractors:

  • MitieEntityExtractor
  • SpacyEntityExtractor
  • CRFEntityExtractor
  • DucklingEntityExtractor
  • DIETClassifier
  • RegexEntityExtractor
  • EntitySynonymMapper

5. Selectors

Once we have the intent name and the entities, we have to select a suitable response from a set of candidate responses. The selection is done by considering the confidences of the different intents.

The ResponseSelector is used for this purpose. It outputs a dictionary keyed by the retrieval intent, with values containing the predicted responses, the confidence, and the response key under that retrieval intent.
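The shape of that output can be illustrated with a hand-written dictionary (the retrieval intent "faq", the response text, and the confidence value are all invented for this sketch):

```python
# Illustrative ResponseSelector-style output: one retrieval intent key,
# whose value holds the predicted responses, confidence, and response key.
response_selector_output = {
    "faq": {
        "response": {
            "responses": [
                {"text": "You can reset your password from the settings page."}
            ],
            "confidence": 0.98,
            "intent_response_key": "faq/reset_password",
        }
    }
}

top = response_selector_output["faq"]["response"]
print(top["intent_response_key"], top["confidence"])
```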

Let's have a look at a Rasa Pipeline Example:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: DIETClassifier
- name: EntitySynonymMapper
- name: ResponseSelector
- name: FallbackClassifier

Conclusion

In this blog we have discussed what a Rasa pipeline is and what its components are. Knowing this is important because it can help you tune your Rasa model for better results.

Thank you for reading. To learn about RASA policies click here.

To read my previous blog on Rasa click here.

Happy learning!!
