regex - Scalability of Regular Expressions (MarkLogic)
Question
I have been looking for ways to use regular expressions in MarkLogic, in both XQuery and SPARQL. For XQuery, fn:matches seems to be the only way to approach this. The recommendation also seems to be to narrow the data down with queries before running it through a for loop, as can be seen in this thread. However, what if I cannot narrow it down and need to loop through millions of records? Is there a more scalable way to do this? I'm unsure whether task bot is the option I should be looking at.
In SPARQL, on the other hand, there are two ways to approach this.
First Method
SELECT ?s ?p ?o
WHERE {?s ?p ?o
FILTER (regex (?o, ".*Name.*", "i"))
}
Second Method
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
SELECT ?s ?p ?o
WHERE {?s ?p ?o
FILTER (fn:matches(?o, ".*Name.*"))
}
Are these two SPARQL options equivalent, or is one of them slightly better than the other? I would also greatly appreciate any advice on better ways to approach this in both SPARQL and XQuery.
Answer
Basically, you are doing a substring match on your search string "Name", and for that fn:contains is sufficient:
fn:contains(?o, "Name")
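For context, here is a minimal sketch of the whole query with a plain substring test in place of the regex. It swaps in the standard SPARQL 1.1 built-ins CONTAINS, LCASE, STR and isLiteral rather than the fn: function, and assumes the engine supports SPARQL 1.1. The first method in the question passes the "i" flag and is therefore case-insensitive, while the second method has no flag and a plain substring test is normally case-sensitive, so LCASE is used here to keep the case-insensitive behavior.

```sparql
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  # Skip IRIs and blank nodes, then compare lower-cased text so that
  # "Name", "name" and "NAME" all match, as with regex(?o, ".*Name.*", "i").
  FILTER (isLiteral(?o) && CONTAINS(LCASE(STR(?o)), "name"))
}
```

If case-sensitive matching is what you want, as in the second method, drop LCASE and search for "Name" directly.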
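A few words of advice:
Avoid regular expressions whenever they can be replaced with simple string search filters.
I once had to redo an entire project in Java that used not-so-complicated regular expressions, but even those few lookarounds made it extremely slow. I had to break those regexes down into multiple levels of string search filters, and that made the difference. In MarkLogic, functions such as fn:substring-before and fn:substring-after can help you reduce the length of the text as you move through the levels of string search filters.
Still, if you must use regular expressions and you run into performance problems, then besides parallel computing, it is best to delegate the regex matching to a language or technology that does it best, such as Perl.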
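The multi-level filtering described above could look roughly like the sketch below. It is only an illustration: the search strings "Name:" and "Smith" are hypothetical, and it uses the standard SPARQL 1.1 functions CONTAINS and STRAFTER (the SPARQL counterpart of fn:substring-after) rather than anything MarkLogic-specific. Together the two levels find values in which "Smith" appears somewhere after "Name:", which a regex would express as "Name:.*Smith".

```sparql
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  # Level 1: cheap substring test that discards most values early.
  FILTER (isLiteral(?o) && CONTAINS(STR(?o), "Name:"))
  # Shorten the text: keep only what follows the first "Name:".
  BIND (STRAFTER(STR(?o), "Name:") AS ?rest)
  # Level 2: search the shorter remainder for the second string.
  FILTER (CONTAINS(?rest, "Smith"))
}
```

Each level works on less text than the one before, and neither needs the regex engine.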