sql - Oracle fuzzy searching with UTL functions
问题描述
I need to implement fuzzy search on database layer, but I am having some minor issues. Here is my SQL code for demonstration :
SELECT *
FROM (SELECT *
FROM TOOLS
WHERE UTL_MATCH.jaro_winkler_similarity(UPPER('sample tool'), UPPER(NAME)) > 80
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('sample tool', NAME) DESC)
where ROWNUM <= 10;
I am selecting 10 tools that match best the criteria of jaro winkler and edit distance similarity utl functions. Struggle I am having is, that I am not getting exact matches first. For example when I type rich the best scored candidate is 'mich' and then are tools with name 'rich' for example 'rich 12', 'rich ax', ...
- Is it possible to get "exact matches" first with these utl functions or is there any function that fits my requirements better? Our fuzzy search should be focused more on skipping some characters rather than replacing them for another.
- Is it possible to not take into account word length with these functions? (for example when I type 'di' I want to get results as 'dinosaur', but the word doesn't match my score criteria only because its length and I am getting no results.
解决方案
首先对结果进行排名,获取排名最高的结果。像这样的东西(阅读代码中的注释):
SQL> with
2 tools (name) as
3 -- sample data
4 (select 'mich' from dual union all
5 select 'rich 12' from dual union all
6 select 'rich ax' from dual
7 ),
8 temp as
9 -- rank similirities first
10 (select name,
11 utl_match.jaro_winkler_similarity('&&par_tool', name) sim,
12 --
13 rank() over (order by
14 utl_match.jaro_winkler_similarity('&&par_tool', name) desc) rnk
15 from tools
16 )
17 -- finally, return the "top" similar values
18 select name, sim, rnk
19 from temp
20 where rnk = 1;
Enter value for par_tool: rich
NAME SIM RNK
---------- ---------- ----------
rich 12 91 1
rich ax 91 1
SQL> undefine par_tool
SQL> /
Enter value for par_tool: mick
NAME SIM RNK
---------- ---------- ----------
mich 88 1
SQL>
推荐阅读
- javascript - 反应链接不更新组件
- php - 如何删除多维数组中的唯一元素?
- c - CS50 pset4 恢复,出现分段错误
- sql-server - 如何避免快照隔离级别的更新丢失问题?
- asp.net-core - “ILoggingBuilder”不包含“AddEventLog”的定义
- java - 尝试使用另一种方法打印一种方法
- python - 如何使用 SSL 将 SQLAlchemy 连接到 GCP CLoud SQL 实例?
- javascript - JavaScript 十进制数学
- python - 根据其他数据帧的比较添加一个表示最大值的数据帧
- javascript - VueJS - 通过注销组件擦除后端服务器设置的 cookie