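neo4j - Removing stop words in neo4j using a text file
Problem description
I have successfully loaded a CSV file into neo4j, and I want to remove stop words from the dataset. I keep my stop words in a separate text file. I found sample code that filters stop words, but I want to replace its hard-coded list with my own. How should I go about this? Can two datasets (kbv5.txt and stopwords.txt) be loaded in a single query?
This is the code in which I want to use the stop word file: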
LOAD CSV FROM "file:///kbv5.txt" as row fieldterminator "."
with row
unwind row as text
with reduce(t=tolower(text), delim in ["","",",",".","!","?",'"',":",";","'","-"] | replace(t,delim,"")) as normalized
with [w in split(normalized," ") | trim(w)] as words
unwind range(0,size(words)-2) as idx
MERGE (w1:Word {name:words[idx]})
ON CREATE SET w1.count = 1
ON MATCH SET w1.count = w1.count + 1
MERGE (w2:Word {name:words[idx+1]})
ON CREATE SET w2.count = 1
ON MATCH SET w2.count = w2.count + (case when idx = size(words)-2 then 1 else 0 end)
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count +1
Sample code that filters stop words:
with "Great device, but the calls drop too frequently." as text
with replace(replace(tolower(text),".",""),",","") as normalized
with [w in split(normalized," ") | trim(w)] as words
with [w in words WHERE NOT w IN ["the","an","on"]] as words
UNWIND range(0,size(words)-2) as idx
MERGE (w1:Word {name:words[idx]})
MERGE (w2:Word {name:words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
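Thanks in advance.
Solution
This snippet shows how to remove stop words from a piece of text. Try it out; it does not write anything to your database. You can apply the same filter near the top of your query, right after the import.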
WITH SPLIT('some of these words are unnecessary',' ') AS text,
SPLIT('are but of in the these',' ') AS stopwords
// A list comprehension replaces FILTER, which was removed in Neo4j 4.0
RETURN [word IN text WHERE NOT word IN stopwords] AS filtered
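On the second part of the question: a single query may contain more than one LOAD CSV clause, so both files can be read together. The following is only a sketch, not a tested import. It assumes stopwords.txt sits in the import directory next to kbv5.txt and holds one stop word per line with no commas. It also leaves out the count bookkeeping from the question to keep the example short.

```cypher
// Assumption: stopwords.txt has one stop word per line and no commas.
LOAD CSV FROM "file:///stopwords.txt" AS line
WITH collect(toLower(trim(line[0]))) AS stopwords
// Second LOAD CSV in the same query; stopwords is carried along as one list.
LOAD CSV FROM "file:///kbv5.txt" AS row FIELDTERMINATOR "."
UNWIND row AS text
// Skip empty fields so the string functions below never receive null.
WITH stopwords, text WHERE text IS NOT NULL
WITH stopwords,
     reduce(t = toLower(text),
            delim IN [",", "!", "?", '"', ":", ";", "'", "-"] | replace(t, delim, "")) AS normalized
// Drop empty tokens and stop words in one list comprehension.
WITH [w IN split(normalized, " ")
      WHERE trim(w) <> "" AND NOT trim(w) IN stopwords | trim(w)] AS words
UNWIND range(0, size(words) - 2) AS idx
MERGE (w1:Word {name: words[idx]})
MERGE (w2:Word {name: words[idx + 1]})
MERGE (w1)-[:NEXT]->(w2)
```

Because the stop words are collected into a single list before the second LOAD CSV runs, the list is built once and reused for every row of kbv5.txt. Note that the filter runs after punctuation is stripped, so a stop word such as "don't" would have to appear in stopwords.txt as "dont" to match.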