Rasa RegexFeaturizer: does it match per token or on the whole sentence?

Question

- regex: regex features for intent classification
  examples: |
    - \bon road pric/i
    - \bonroad pric/i

I have tested the regular expressions above and they work, so I am confident the patterns themselves are not the problem.

Example:

training-row-1] Please tell me on road price now.  
training-row-2] Please tell me price now.  

Given the regex patterns above, the regex features that should be added are:

training-row-1] Please tell me on road price now. ==> TRUE (because regex match)
training-row-2] Please tell me price now.         ==> FALSE (regex doesn't match)

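My question is: in RegexFeaturizer, is the regex matched against the whole sentence or against each token separately? Matching against the whole sentence would make sense.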

Is the featurization I assumed above correct?

Tags: rasa-nlu, rasa

Answer


I found the following docstring in RegexFeaturizer:

"""
Given a sentence, returns a vector of {1,0} values indicating which
regexes did match. Furthermore, if the message is tokenized, the 
function will mark all tokens with a dict relating the name of the 
regex to whether it was matched.
"""

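So I think it takes the whole sentence as input. It is hard to look inside the feature space in Rasa, but I have confirmed that the correct entities are picked up across tokens when using RegexEntityExtractor. I checked this by temporarily adding entity examples to the NLU data (making sure each one appears at least twice in an intent) and running rasa interactive.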
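To make the difference concrete, here is a minimal sketch in plain Python of what the docstring seems to describe. It is not Rasa's code: the function names, the pattern names, and the whitespace tokenizer are all made up for illustration. It also assumes case-insensitive matching is intended; Python's `re` does not parse a trailing `/i`, so the sketch passes `re.IGNORECASE` instead.

```python
import re

# Patterns from the question. The names are hypothetical, and the trailing
# "/i" is replaced by an explicit flag because Python's re would treat it
# as literal characters.
PATTERNS = {
    "on_road_price": re.compile(r"\bon road pric", re.IGNORECASE),
    "onroad_price": re.compile(r"\bonroad pric", re.IGNORECASE),
}


def whitespace_tokens(text):
    """Return (token, start, end) triples from a naive whitespace split."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def sentence_features(text):
    """One 0/1 value per pattern, matched against the whole sentence."""
    return [1 if p.search(text) else 0 for p in PATTERNS.values()]


def token_features(text):
    """Per token, one 0/1 value per pattern: 1 if the token overlaps a
    match that was found on the whole sentence."""
    rows = []
    for token, start, end in whitespace_tokens(text):
        row = [
            1 if any(m.start() < end and start < m.end() for m in p.finditer(text)) else 0
            for p in PATTERNS.values()
        ]
        rows.append((token, row))
    return rows


def per_token_only_features(text):
    """What you would get if each token were matched in isolation."""
    tokens = [tok for tok, _, _ in whitespace_tokens(text)]
    return [1 if any(p.search(tok) for tok in tokens) else 0 for p in PATTERNS.values()]


if __name__ == "__main__":
    for row in ("Please tell me on road price now.", "Please tell me price now."):
        print(row, sentence_features(row), per_token_only_features(row))
```

Under these assumptions, a pattern that spans several tokens (such as `on road pric`) can only fire when it is matched on the full sentence. Matching each token in isolation would never produce a hit for it, which fits the whole-sentence reading of the docstring.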

