solr - Solr query not working as expected when it contains the `@` character
Problem description
I have a field called `email_txt` of type `text_general` that holds a list of emails like `abc@xyz.com`, and I'm trying to create a query that will only search the username and disregard the domain.
My query looks something like this:
email_txt:*abc*@*
This produces 0 results. I expect to receive results where the username contains `abc`, like `abcdefg@xyz.com`, `fooabc@xyzbuzz.com`, `barabcefg@fizzxyz.com`, `abc@fizz.com`. And yes, I am confident that I have data of that type; it doesn't work even if I try `email_txt:*@*`.
If I try something like:
email_txt:*abc*
It works, and produces multiple results, including the desired ones from above, but also cases where the domain contains `abc`, like `fizz@helpmeabc.com`, which is not desired.
I've had a look at the documentation (just in case I'm going crazy) and it confirms that `@` is not a special character. Even so, I have tried to escape it like this (just in case I am going crazy):
email_txt:*abc*\@*
- still, 0 results
Now the actual question. Is `@` a special character? If so, how can it be escaped? If not, what am I doing wrong in the query? I genuinely can't tell if there is a flaw in my logic, or if there is something that I am missing.
Note: I'm using Solr version 6.3.0; the documentation I checked is for 6.6 (the closest available).
Solution
When you're using the StandardTokenizer (which the stock field types `text_general`, `text_en`, etc. use by default), the content is split into tokens wherever the `@` sign occurs. That means that for your example `fizz@helpmeabc.com`, there are actually two or three tokens being stored: (`fizz` and `helpmeabc.com`) or (`fizz`, `helpmeabc` and `com`).
A wildcard match is applied against the individual tokens (unless you use the complex phrase query parser), and no tokenization or filtering takes place on the wildcard term itself (except for multi-term-aware filters such as the lowercase filter).
The effect is that your query, `*abc*@*`, attempts to match a token containing `@`, but since indexing splits the text on `@` and separates the tokens at that character, no token contains `@`, and thus you get no hits.
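To make this concrete, here is a rough Python sketch of the behaviour described above. It is not Solr code: `rough_tokens` is a hypothetical stand-in that only splits on `@` and whitespace and lowercases, whereas the real StandardTokenizer follows Unicode word-break rules, and `fnmatchcase` merely approximates Solr's wildcard matching.

```python
import re
from fnmatch import fnmatchcase


def rough_tokens(text):
    # Crude approximation (assumption): split on "@" and whitespace, then lowercase.
    # The real StandardTokenizer applies more elaborate word-break rules.
    return [tok for tok in re.split(r"[@\s]+", text.lower()) if tok]


def wildcard_hits(pattern, docs):
    # A wildcard term is compared against each indexed token on its own,
    # never against the original, untokenized field value.
    return [d for d in docs if any(fnmatchcase(tok, pattern) for tok in rough_tokens(d))]


docs = ["abcdefg@xyz.com", "fooabc@xyzbuzz.com", "fizz@helpmeabc.com"]

print(wildcard_hits("*abc*@*", docs))  # no indexed token contains "@"
print(wildcard_hits("*abc*", docs))    # username and domain tokens both qualify
```

Under these assumptions the first query finds nothing, while the second also returns `fizz@helpmeabc.com`, because the domain is just another token.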
You can use the `string` field type, or a `KeywordTokenizer` paired with filters such as the lowercase filter, to get the original input more or less as a complete token instead.
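As an illustration of why a single-token field helps, the sketch below mimics a keyword-style analysis in plain Python. It is only a model, not Solr itself: `keyword_token` is a hypothetical helper that keeps the whole value as one lowercased token, `fnmatchcase` stands in for wildcard matching, and each value is assumed to hold exactly one address.

```python
from fnmatch import fnmatchcase


def keyword_token(text):
    # Stand-in (assumption) for KeywordTokenizer + lowercase filter:
    # the entire field value is kept as one token.
    return text.strip().lower()


def wildcard_hits(pattern, docs):
    # With one token per value, the wildcard sees the full address, "@" included.
    return [d for d in docs if fnmatchcase(keyword_token(d), pattern)]


docs = [
    "abcdefg@xyz.com",
    "fooabc@xyzbuzz.com",
    "barabcefg@fizzxyz.com",
    "abc@fizz.com",
    "fizz@helpmeabc.com",
]

# Only addresses whose username part contains "abc" are returned.
print(wildcard_hits("*abc*@*", docs))
```

Here `fizz@helpmeabc.com` is excluded because no `@` follows its `abc`, which is the result the original query was after.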