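javascript - Check whether an element of the array is the same as the following one
Problem description
I am building a service that parses a PDF into text. Once I have that text, I have to match it against a set of words, and every match increments a counter. So far, so good. The difficulty is that, while parsing to text, I have no way of checking which PDF page I am on. What I have noticed is that, when splitting, every two consecutive line breaks (\n\n) mean that the page has changed.
What I would like to do is detect when the page changes and, besides counting how many times a word was found in total, also report which pages it was found on.
Example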
let data = `resignations / resignations. adm. mancom .: berenguer llinares
appointments. adm. unique: calvo valenzuela. other concepts: change of the administrative body:
joint administrators to sole administrator. change of registered office. ptda colomer, 6
Official Gazette of the Commercial Registry
no. 182 Friday, September 18, 2020 p. 33755
cve: borme-a-2020-182-03 verifiable in
sarria). registry data. t 2257, f 100, s 8, h a 54815, i / a 4 (10.09.20) .`
let wordsToSearch = ['resignations', "administrators"]
wordsToSearch.forEach((word) => {
  // in here I would also like to keep track of the page
  let stringArray = data.split(' ');
  let count = 0;
  let result = ""
  for (var i = 0; i < stringArray.length; i++) {
    let wordText = stringArray[i];
    if (new RegExp(word).test(wordText)) {
      count++
    }
  }
  // the expected result would be: word has appeared count times, on pages ...
  result += `${word} has appeared ${count} times\n`
  console.log(result)
  /*
  resignations has appeared 2 times
  administrators has appeared 1 times
  */
})
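If anyone can think of another way to do this, that would be great too.
Solution
You can split the text at those double line breaks and then analyse each page separately. Here is how I would do it: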
let data = `resignations / Friday resignations. adm. mancom .: berenguer llinares
appointments. adm. unique: calvo Friday valenzuela. other concepts: change of the administrative body:
joint administrators to sole administrator. change of registered office. ptda colomer, 6, Friday
Official Gazette of the Commercial Registry
no. 182 Friday, September 18, 2020 p. 33755
cve: borme-a-2020-182-03 verifiable in
sarria). registry data. t 2257, f 100, s 8, h a 54815, i / a 4 (10.09.20) .`
function analyseText(text, wordsToFind) {
  // Two consecutive line breaks mark a page change
  const pages = text.split("\n\n");
  const result = {};
  for (let pageIndex = 0; pageIndex < pages.length; pageIndex++) {
    analysePage({
      pageIndex,
      pageText: pages[pageIndex]
    }, wordsToFind, result);
  }
  // Return one entry per word that was found at least once
  return Object.keys(result).map(k => result[k]);
}
function analysePage(page, wordsToFind, result) {
  const {
    pageText,
    pageIndex
  } = page;
  wordsToFind.forEach(word => {
    // Count every occurrence of the word on this page
    const count = (pageText.match(new RegExp(word, 'g')) || []).length;
    if (count > 0) {
      if (!result[word]) {
        result[word] = {
          name: word,
          pageIndices: [],
          count: 0
        };
      }
      // Remember the (zero-based) page and add to the running total
      result[word].pageIndices.push(pageIndex);
      result[word].count += count;
    }
  });
}
const result = analyseText(data, ['resignations', "administrators", "Friday"]);
console.log(result);
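In this example the result only lists, for each word, the zero-based indices of the pages it appears on and the total count, and I just print it. You can of course build a result object that also stores the results for each page.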
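One possible shape for such a per-page result object is sketched below. This is only an illustration, not part of the original answer: the helper names `escapeRegExp` and `countWordsPerPage`, the `countsByPage` field, the 1-based page numbering, and the sample text are all assumptions. It also escapes regex metacharacters in the search words, which matters if a word contains characters such as `.` or `(`.

```javascript
// Escape regex metacharacters so a search word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: for each word, collect the total count and the
// count on each page. Pages are assumed to be separated by "\n\n" and
// are numbered from 1.
function countWordsPerPage(text, wordsToFind) {
  const pages = text.split("\n\n");
  const summary = {};
  for (const word of wordsToFind) {
    const pattern = new RegExp(escapeRegExp(word), "g");
    const entry = { word, total: 0, countsByPage: {} };
    pages.forEach((pageText, index) => {
      const count = (pageText.match(pattern) || []).length;
      if (count > 0) {
        entry.countsByPage[index + 1] = count;
        entry.total += count;
      }
    });
    summary[word] = entry;
  }
  return summary;
}

// Made-up sample text with two explicit page breaks (three pages).
const sampleText =
  "resignations / resignations. adm.\n\n" +
  "joint administrators to sole administrator.\n\n" +
  "no resignations on Friday";

const summary = countWordsPerPage(sampleText, ["resignations", "administrators", "Friday"]);
console.log(JSON.stringify(summary, null, 2));
```

With this shape, `summary.resignations.countsByPage` tells you both which pages a word is on (its keys) and how often it occurs on each of them. Like the code above, it matches substrings, so "administrator" would also match inside "administrators".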