How to read a file or stream until a string is found

Problem description

I am writing a dictionary program. The input is specified by a file and parsed as follows:

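// A named stream is used because istreambuf_iterator takes a non-const lvalue reference
std::ifstream dictionaryFile(DICTIONARY_SAVE_FILE);
std::string savedDictionary(std::istreambuf_iterator<char>(dictionaryFile), {});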
// entire file loaded into savedDictionary
for (size_t end = 0; ;)
{
    size_t term = savedDictionary.find("|TERM|", end);
    size_t definition = savedDictionary.find("|DEFINITION|", term);
    if ((end = savedDictionary.find("|END|", definition)) == std::string::npos) break;
    // store term and definition here...
}

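This results in std::bad_alloc on the machines of some of my third-world users, which do not have enough RAM to hold both the dictionary string and the dictionary as it is stored in my program. If I could do this: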

std::string term;
for (std::ifstream file(DICTIONARY_SAVE_FILE); file; std::getline(file, term, "|END|"))
{
    // same as above
}

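then it would be great, but std::getline does not support a string as the delimiter. So what is the most idiomatic way to read the file until I find "|END|", without allocating a large amount of memory up front?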

Tags: c++, stream, out-of-memory

Solution


We can implement the requested functionality with a very simple proxy class. With it, all the std:: algorithms and all the std:: iterators can be used as usual.

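So we define a small proxy class named LineUntilEnd. It can be used with any stream, such as a std::ifstream or whatever you like. In particular, you can simply use the extraction operator to pull one value out of the input stream and put it into the desired variable.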

    // Here we will store the lines until |END|
    LineUntilEnd lue;

    // Simply read the line until |END|
    while (testInput >> lue) {
        // lue.data now holds one record, up to and including |END|
    }

It works as expected.

Once we have such a string, we can parse it with simple regex operations.

I added a small example that puts the resulting values into a std::multimap to build a demo dictionary.

Please see the following code:

#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <map>
#include <sstream>


// Ultra simple proxy class to read data until given word is found
struct LineUntilEnd
{
    // Overload the extractor operator
    friend std::istream& operator >>(std::istream& is, LineUntilEnd& lue);

    // Intermediate storage for result
    std::string data{};
};

// Read stream until "|END|" symbol has been found
std::istream& operator >>(std::istream& is, LineUntilEnd& lue)
{
    // Clear destination string
    lue.data.clear();

    // We will count, how many bytes of the search string have been matched
    size_t matchCounter{ 0U };

    // Read characters from stream
    char c{'\0'};
    while (is.get(c))
    {
        // Add character to resulting string
        lue.data += c;
        // Check for a match. All characters must be matched
        if (c == "|END|"[matchCounter]) {
            // Check next matching character
            ++matchCounter;
            // If there is a match for all characters in the search string
            if (matchCounter >= (sizeof "|END|" - 1)) {
                // Then stop reading
                break;
            }
        }
        else {
            // Not all characters could be matched. The current character may itself
            // start a new match (e.g. the second '|' in "||END|"), so restart at 1 then
            matchCounter = (c == '|') ? 1U : 0U;
        }
    }
    return is;
}

// Input Test Data
std::istringstream testInput{ "|TERM|bonjour|TERM|hola|TERM|hi|DEFINITION|hello|END||TERM|Adios|TERM|Ciao|DEFINITION|bye|END|" };

// Regex definitions. Used to build up a dictionary
std::regex reTerm(R"(\|TERM\|(\w+))");
std::regex reDefinition(R"(\|DEFINITION\|(\w+)\|END\|)");

// Test code
int main() 
{
    // We will store the found values in a dictionary
    std::multimap<std::string, std::string> dictionary{};

    // Here we will store the lines until |END|
    LineUntilEnd lue;

    // Simply read the line until |END| 
    while (testInput >> lue)    {

        // Search for the definition string
        std::smatch sm{};
        if (std::regex_search(lue.data, sm, reDefinition)) {

            // Definition string found
            // Iterate over all terms
            std::sregex_token_iterator tokenIter(lue.data.begin(), lue.data.end(), reTerm, 1);
            while (tokenIter != std::sregex_token_iterator()) {

                // Store values in dictionary (key: definition, value: term)
                dictionary.insert({ sm[1],*tokenIter++ });
            }
        }
    }

    // And show some result to the user
    for (const auto& d : dictionary) {
        std::cout << d.first << " --> " << d.second << "\n";
    }

    return 0;
}
