High CPU usage with std::vector of strings + std::search and std::boyer_moore_horspool_searcher

Problem description

I have a list of strings stored in a std::vector. The strings are std::wstring and vary in length: some are 20 characters long and some are as long as 100.

The data being searched is also std::wstring, but much larger (some texts are 500 KB, some as large as 10 MB).

After searching on Google and running some tests, I found that the fastest way to search for strings is std::boyer_moore_horspool_searcher; std::find does not even come close.

Everything works as expected, except for the CPU usage. I really need lower CPU usage, because at the moment (and I'm still not done adding needle strings) my software uses up to 10% CPU plus 50-120 Mbps of I/O on my 12-core CPU.

To show what I'm doing, I've put together this small example (I can't post the large data):

#include <algorithm>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::wstring Data = L"Lorem Ipsum is simply dummy text of the printing and typesetting industry. "
        L"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
        L" when an unknown printer took a galley of type and scrambled it to make a type specimen book. "
        L"It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged."
        L" It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages,"
        L" and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
   
    std::vector<std::wstring> Strings;
    Strings.reserve(5);

    Strings.push_back(L"Contrary to popular belief, Lorem Ipsum is not simply random text.");
    Strings.push_back(L"The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested.");
    Strings.push_back(L"Banana");
    Strings.push_back(L" It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages,");
    Strings.push_back(L"It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.");

    for (auto const& String : Strings)
    {
        auto it = std::search(Data.begin(), Data.end(), std::boyer_moore_horspool_searcher(String.begin(), String.end()));

        if (it != Data.end())
            std::wcout << L"The string " << String << L" found at offset "
            << it - Data.begin() << '\n';
        else
            std::wcout << L"The string " << String << L" not found\n";
    }

    return 0;
}

Compiled with VS 2019 using the ISO C++17 standard (/std:c++17), in Release mode with maximum optimizations.

Please give me some advice on how to reduce this CPU and I/O load. I'm even considering compile-time needle strings if that helps, or even strstr.

Your help is much appreciated.

Tags: c++, windows, winapi, c++17

Solution
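One thing that may be worth trying, offered as a sketch rather than a verified fix: the loop in the question constructs a new std::boyer_moore_horspool_searcher for every needle on every pass, and each construction rebuilds that needle's shift table. If the same needles are run against many data buffers, the searchers can be built once and reused. The class below is a hypothetical helper (the names NeedleSet and find_all are not from the question) and it assumes the needle list stays fixed after construction. Whether it lowers CPU usage noticeably depends on how much time goes into table building versus the scan itself, so profile before and after.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): owns the needle strings
// and one prebuilt Boyer-Moore-Horspool searcher per needle.
class NeedleSet {
public:
    using Searcher =
        std::boyer_moore_horspool_searcher<std::wstring::const_iterator>;

    explicit NeedleSet(std::vector<std::wstring> needles)
        : needles_(std::move(needles))
    {
        // Build each shift table exactly once, up front.
        searchers_.reserve(needles_.size());
        for (auto const& needle : needles_)
            searchers_.emplace_back(needle.cbegin(), needle.cend());
    }

    // The searchers hold iterators into needles_, so copying or moving the
    // object could leave them pointing at the wrong strings. Declaring the
    // copy operations as deleted also suppresses the implicit move operations.
    NeedleSet(NeedleSet const&) = delete;
    NeedleSet& operator=(NeedleSet const&) = delete;

    // For each needle, in order, returns the offset of its first match in
    // data, or std::wstring::npos if the needle does not occur.
    std::vector<std::size_t> find_all(std::wstring const& data) const
    {
        std::vector<std::size_t> offsets;
        offsets.reserve(searchers_.size());
        for (auto const& searcher : searchers_) {
            auto it = std::search(data.cbegin(), data.cend(), searcher);
            offsets.push_back(it == data.cend()
                ? std::wstring::npos
                : static_cast<std::size_t>(it - data.cbegin()));
        }
        return offsets;
    }

private:
    std::vector<std::wstring> needles_;  // must outlive searchers_
    std::vector<Searcher> searchers_;
};
```

To use it, construct one NeedleSet from the Strings vector at startup and call find_all(Data) for each buffer that arrives. This still scans the data once per needle, so it removes only the repeated setup cost, not the per-needle pass over the text.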

