c++ - High CPU usage with std::vector of strings + std::search and std::boyer_moore_horspool_searcher
Problem description
I have a list of strings stored in a vector. The strings are std::wstring and vary in length: some are 20 characters, others as long as 100.
The data being searched is also std::wstring, but much larger (some texts are 500 KB, some as large as 10 MB).
After some searching and testing, the fastest way I have found to search for strings is std::boyer_moore_horspool_searcher; std::find does not even come close.
Everything works as expected except for the CPU usage. I really need it to be lower: at the moment (and I'm still not done adding needle strings) my software uses up to 10% CPU plus 50-120 Mbps of I/O on my 12-core CPU.
To show what I'm doing, I've put together this small example (I can't post the large data):
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>
#include <vector>   // std::vector (missing from the original include list)

int main()
{
    // Haystack: the text being searched.
    std::wstring Data = L"Lorem Ipsum is simply dummy text of the printing and typesetting industry. "
        L"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
        L" when an unknown printer took a galley of type and scrambled it to make a type specimen book. "
        L"It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged."
        L" It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages,"
        L" and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";

    // Needles: the strings to look for.
    std::vector<std::wstring> Strings;
    Strings.reserve(5);
    Strings.push_back(L"Contrary to popular belief, Lorem Ipsum is not simply random text.");
    Strings.push_back(L"The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested.");
    Strings.push_back(L"Banana");
    Strings.push_back(L" It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages,");
    Strings.push_back(L"It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.");

    for (auto const& String : Strings)
    {
        // A new searcher (and its skip table) is built for every needle on every pass.
        auto it = std::search(Data.begin(), Data.end(),
                              std::boyer_moore_horspool_searcher(String.begin(), String.end()));
        if (it != Data.end())
            std::wcout << L"The string " << String << L" found at offset "
                       << it - Data.begin() << '\n';
        else
            std::wcout << L"The string " << String << L" not found\n";
    }
    return 0;
}
Compiled with VS 2019 using the ISO C++17 standard (/std:c++17), in Release mode with maximum optimizations.
Please give me some advice on how to reduce this CPU and I/O load. I'm even considering compile-time needle strings, or even strstr, if that would help.
Your help is much appreciated.
Solution
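No answer text is included here. One thing visible in the example itself is that a new std::boyer_moore_horspool_searcher is constructed for every needle on every search, so its skip table is rebuilt each time. The following is only a sketch of building each searcher once and reusing it across many haystacks; `NeedleSet` and `find_first` are hypothetical names that do not appear in the question, and whether this measurably lowers CPU usage depends on the workload and should be confirmed by profiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: owns the needles and one prebuilt searcher per needle.
class NeedleSet {
public:
    using Searcher =
        std::boyer_moore_horspool_searcher<std::wstring::const_iterator>;

    explicit NeedleSet(std::vector<std::wstring> needles)
        : needles_(std::move(needles))
    {
        // Build every skip table once, up front.
        searchers_.reserve(needles_.size());
        for (auto const& needle : needles_)
            searchers_.emplace_back(needle.cbegin(), needle.cend());
    }

    // The searchers hold iterators into needles_, so copying is disabled
    // (this also suppresses the implicit move operations).
    NeedleSet(NeedleSet const&) = delete;
    NeedleSet& operator=(NeedleSet const&) = delete;

    // For each needle, returns the offset of its first match in data,
    // or std::nullopt if the needle does not occur.
    std::vector<std::optional<std::size_t>>
    find_first(std::wstring const& data) const
    {
        std::vector<std::optional<std::size_t>> offsets;
        offsets.reserve(searchers_.size());
        for (auto const& searcher : searchers_) {
            auto it = std::search(data.cbegin(), data.cend(), searcher);
            if (it != data.cend())
                offsets.emplace_back(
                    static_cast<std::size_t>(it - data.cbegin()));
            else
                offsets.emplace_back(std::nullopt);
        }
        return offsets;
    }

private:
    std::vector<std::wstring> needles_;  // must outlive searchers_
    std::vector<Searcher> searchers_;
};
```

In this sketch a single `NeedleSet` would be constructed at startup and `find_first` called once per incoming text. Each call still scans the data once per needle, so the total work continues to grow with the number of needles; only the repeated table construction is avoided.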