Configuring pipelines and policies is one of the crucial steps in building a Rasa assistant. As developers, we often face incorrect predictions or low accuracy while developing a chatbot. In many cases, changing your pipeline or policies can drastically improve the performance of your bot. Let's move ahead to learn more about it.
What is a Rasa Pipeline?
Machine learning models for text follow a sequence of steps such as tokenization and featurization. In Rasa, these steps are defined in a pipeline. So we can say that a pipeline is a sequence of tasks that are executed while training a Rasa NLU model. Every such task is called a pipeline component. Pipelines are defined in the config.yml file. Let's have a look at the different pipeline components available in Rasa.
Rasa NLU Pipeline Components
- Tokenizers
- Featurizers
- Intent Classifier
- Entity Extractors
- Selectors
1. Tokenizers
To understand a sentence, it is important to understand every word independently. For that, we have to break the sentence into words. Tokenizers serve this purpose. A tokenizer breaks a sentence into smaller pieces known as tokens, so the input sentence is converted into a list of words.
A very basic example of a tokenizer in Rasa is WhitespaceTokenizer, which splits the sentence using whitespace as the separator.
For example:
"Hello World" -> "Hello", "World"
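The splitting step above can be sketched in a few lines of Python. This is only an illustration of the idea, not Rasa's actual WhitespaceTokenizer implementation, and the function name `whitespace_tokenize` is made up for this example.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split a sentence into tokens wherever whitespace occurs."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return text.split()


print(whitespace_tokenize("Hello World"))  # ['Hello', 'World']
```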
Examples of Tokenizers:
- WhitespaceTokenizer
- JiebaTokenizer
- MitieTokenizer
- SpacyTokenizer
We can also use the following flags with any tokenizer.
- intent_tokenization_flag: Tells the tokenizer whether to tokenize intent labels or not. Its value can be either True or False.
- intent_split_symbol: The symbol on which intent labels are split when they are tokenized. The default value is underscore ("_").
pipeline:
- name: "WhitespaceTokenizer"
  "intent_tokenization_flag": True
  "intent_split_symbol": "-"
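To see what these two flags do to an intent label, here is a rough Python sketch. It only mimics the described behaviour and is not Rasa's code; the function `tokenize_intent` and the intent name `book-flight` are hypothetical.

```python
from typing import List


def tokenize_intent(label: str, tokenize: bool = False, split_symbol: str = "_") -> List[str]:
    """Return the tokens of an intent label, mimicking the two flags above."""
    if not tokenize:
        # Flag is off: the whole label is treated as a single token.
        return [label]
    # Flag is on: split the label on the configured symbol.
    return label.split(split_symbol)


print(tokenize_intent("book-flight"))                                   # ['book-flight']
print(tokenize_intent("book-flight", tokenize=True, split_symbol="-"))  # ['book', 'flight']
```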
2. Featurizers
Now that we have broken the sentence down into tokens, the next step for our model is to understand the meaning of each of these tokens. Featurizers convert the tokens into a machine-understandable format, which is basically a vector of numbers. That is why this step is also called vectorization.
Text featurizers are divided into two types: those that produce sparse features and those that produce dense features.
Sparse feature vectors consist mostly of zeros. Storing all of those zeros is a waste of memory, so a sparse representation stores only the non-zero values together with their positions.
Dense featurizers return feature vectors that contain pre-trained embeddings. The length of such vectors typically ranges from 50 to 300. Dense features often work well for NLP problems because they capture the similarity between words.
For example:
"hi" -> [0,1,1,0,0,0,0,……,0,1]
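The memory saving of a sparse representation can be sketched in plain Python. This is a simplified illustration that uses a dictionary of position to value; real featurizers rely on dedicated sparse matrix types, and the helper names here are invented for the example.

```python
from typing import Dict, List


def to_sparse(dense: List[int]) -> Dict[int, int]:
    """Keep only the non-zero entries, keyed by their position."""
    return {index: value for index, value in enumerate(dense) if value != 0}


def to_dense(sparse: Dict[int, int], length: int) -> List[int]:
    """Rebuild the full vector, filling missing positions with zeros."""
    return [sparse.get(index, 0) for index in range(length)]


vector = [0, 1, 1, 0, 0, 0, 0, 0, 1]
sparse = to_sparse(vector)
print(sparse)  # {1: 1, 2: 1, 8: 1}
```

Only three entries are stored instead of nine, and `to_dense(sparse, len(vector))` recovers the original vector.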
Examples of different Featurizers in Rasa:
[Figure: Different Featurizers in Rasa]
3. Intent Classifier
Once we have the features of every token for all sentences, we pass them to the intent classifier. The intent classifier assigns an intent to the user's query. We can use DIETClassifier, which can classify both the intent and the entities of a sentence. DIET stands for Dual Intent and Entity Transformer; it uses a transformer architecture to classify intents and entities in the user's query.
Confidence
Along with the list of intents and entities, the classifier also returns a confidence for every classified intent. Confidence is a numeric score between 0 and 1 that tells us how sure the model is about that specific intent. For example, if the confidence for intent A is 0.67, the model gives the query a score of 0.67 out of 1 for belonging to intent A.
FallbackClassifier
There are situations when the bot is asked a question that lies outside its scope. In those scenarios we use FallbackClassifier to respond. We define a threshold value for such cases. Once the intent classifier gives us the list of all possible intents and their confidences, we check whether the threshold is greater than the highest confidence received. If it is, we classify the query as out of scope and the bot shows a fallback response. Otherwise, we keep the intent with the highest confidence.
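The threshold check described above can be sketched as follows. This is a minimal illustration of the logic, not the FallbackClassifier source; the intent names, the confidence values and the `fallback` label are placeholders chosen for the example.

```python
from typing import Dict

FALLBACK_LABEL = "fallback"  # placeholder label for out-of-scope queries


def classify_with_fallback(confidences: Dict[str, float], threshold: float) -> str:
    """Return the top intent, or the fallback label if it is not confident enough."""
    # Pick the intent with the highest confidence.
    best_intent = max(confidences, key=confidences.get)
    # If even the best intent scores below the threshold, treat the query as out of scope.
    if confidences[best_intent] < threshold:
        return FALLBACK_LABEL
    return best_intent


scores = {"greet": 0.67, "goodbye": 0.20, "book_flight": 0.13}
print(classify_with_fallback(scores, threshold=0.3))  # greet
print(classify_with_fallback(scores, threshold=0.7))  # fallback
```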
Examples of some intent classifiers in Rasa:
- MitieIntentClassifier
- LogisticRegressionClassifier
- SklearnIntentClassifier
- KeywordIntentClassifier
- DIETClassifier
- FallbackClassifier
4. Entity Extractors
Once we have the intent, our next step is to extract the entities present in the sentence. Entity extractors do exactly that. An entity can be a name, a location or a phone number. Although DIETClassifier can extract entities, as discussed above, there can be a number of entities that it fails to identify. Sometimes we need a list of synonyms to extract the entities, and sometimes a specific regex is required. We have different types of entity extractors for such situations.
For example, we have EntitySynonymMapper, which can be used to map synonyms to a single value. UK, United Kingdom and uk can be treated as synonyms of each other. Similarly, vacation, holiday and leave can be treated as synonyms in some situations.
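Conceptually, synonym mapping is a lookup from each known variant to one canonical value. The sketch below shows that idea in plain Python; it is not how EntitySynonymMapper is implemented, and the table contents and function name are assumptions made for illustration.

```python
# Hypothetical synonym table: lower-cased variant -> canonical value.
SYNONYMS = {
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "holiday": "vacation",
    "leave": "vacation",
}


def normalize_entity(value: str) -> str:
    """Replace an extracted entity value with its canonical form, if one is known."""
    # Unknown values are returned unchanged.
    return SYNONYMS.get(value.lower(), value)


print(normalize_entity("UK"))      # United Kingdom
print(normalize_entity("France"))  # France
```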
We also have RegexEntityExtractor, which can be used to extract entities that match a regular expression in the sentence.
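Regex-based extraction can be illustrated with Python's standard `re` module. This is a sketch of the technique only, not RegexEntityExtractor itself; the ten-digit phone pattern, the entity name `phone_number` and the output fields are assumptions for this example.

```python
import re
from typing import Dict, List

# Assumed pattern: a phone number is exactly ten consecutive digits.
PHONE_PATTERN = re.compile(r"\b\d{10}\b")


def extract_phone_numbers(text: str) -> List[Dict[str, object]]:
    """Return every regex match in the text as an entity with its position."""
    return [
        {
            "entity": "phone_number",
            "value": match.group(),
            "start": match.start(),
            "end": match.end(),
        }
        for match in PHONE_PATTERN.finditer(text)
    ]


entities = extract_phone_numbers("Call me on 9876543210 tomorrow")
print(entities[0]["value"])  # 9876543210
```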
Examples of Entity Extractors:
- MitieEntityExtractor
- SpacyEntityExtractor
- CRFEntityExtractor
- DucklingEntityExtractor
- DIETClassifier
- RegexEntityExtractor
- EntitySynonymMapper
5. Selectors
Once we have the intent name and the entities, we have to select a suitable response from a set of candidate responses. The selection of a suitable response is done by considering the confidence of the different candidates.
ResponseSelector is used for this purpose. It outputs a dictionary whose key is the retrieval intent of the response selector and whose value contains the predicted responses, the confidence and the response key under that retrieval intent.
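To make the shape of that output easier to picture, here is a hand-written Python dictionary that follows the description above. The field names, the `faq` retrieval intent and the response text are all hypothetical; the exact structure Rasa returns may differ, so treat this as a sketch only.

```python
from typing import Optional

# Hypothetical selector output: retrieval intent -> prediction details.
selector_output = {
    "faq": {
        "responses": [{"text": "We are open from 9 am to 6 pm."}],
        "confidence": 0.92,
        "response_key": "faq/ask_opening_hours",
    }
}


def best_response_text(output: dict, retrieval_intent: str) -> Optional[str]:
    """Return the first predicted response text for a retrieval intent, if any."""
    prediction = output.get(retrieval_intent)
    if not prediction or not prediction["responses"]:
        return None
    return prediction["responses"][0]["text"]


print(best_response_text(selector_output, "faq"))  # We are open from 9 am to 6 pm.
```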
Let's have a look at a Rasa Pipeline Example:
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: DIETClassifier
- name: EntitySynonymMapper
- name: ResponseSelector
- name: FallbackClassifier
Conclusion
In this blog we discussed what a Rasa pipeline is and what its components are. It is important to know about them because this can help you tune your Rasa model for better results.
Thank you for reading. To learn about RASA policies click here.
Happy learning!!