How can I read a file quickly to check for a signature/magic number?

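Problem description

I am a student, and I am new to C++ and security. I was given an assignment about checking for a signature/magic number in a file, but I am having trouble speeding up the read time.

My idea is to read the file in binary mode with ifstream, store its data in a vector, and then convert it to a hex string. Finally, I check whether the given signature exists in the hex string.

In theory everything works, except that the whole process of allocating the vector's memory, reading the file, and converting its data takes a long time. The read alone takes 44 ms.

How can I improve this? Here is my code: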

UINT CheckForSignature(CString source, CString dest_path) {

    // source is the HEX string need to find in file, dest_path is the destination of the file

    ifstream file(dest_path, ios::binary);
    if (file.is_open()) {

        // check for size of the file
        file.seekg(0, ios::end);
        int iFileSize = file.tellg();
        // if the file size exceed 50MB, pass
        if (iFileSize > 50000000) {
            // return -1, means file exceed 50MB, which do not need to be checked
            return -1; 
        }

        // read file and store data in hex string

        file.seekg(0, ios::beg);
        vector<char> memblock(iFileSize);
        file.read(((char*)memblock.data()), iFileSize); // 18ms alloc memory

        ostringstream ostrData; // 44ms read file
        // add to a total of 62ms
        // if consider the time need to translate all the memblock
        // then this will be long as hell
        // need to improve this
        for (int i = 0; i < memblock.size(); i++) {
            int z = memblock[i] & 0xff;
            ostrData << hex << setfill('0') << setw(2) << z;
        }

        string strDataHex = ostrData.str();
        string strHexSource = (CT2A)source;
        if (strDataHex.find(strHexSource) != string::npos) {
            // return 1, means there exits the signature in the file
            return 1;
        }
        else {
            // return 0; means there isn't the signature in the file
            return 0;
        }

    }
}

I am open to any help and suggestions on the solution and on improving the code. Thank you very much!

Tags: c++, performance, magic-numbers

Solution


There are far more efficient ways to read and examine a file's contents.

Here is one naive/simple way (just an example).

I created a 51M file with "0000" at the end (and removed the size limit):

~/projects$ l data.bin 
-rw-r--r-- 1 manuel manuel 51M jul 27 02:51 data.bin

(Showing the last two lines.)

~/projects$ tail data.bin | hexdump

0000b80 11b9 dddd 8fe9 bab1 134d 5645 eb74 81ce
0000b90 3030 3030 000a                         
0000b95

Running your code (20 runs, times in milliseconds):

~/projects$ ./runtest.sh 131072 20
0 2360 1 2333 2 2355 3 2360 4 2349 5 2350 6 2353 7 2346 8 2342 9 2381 10 2378 11 2394 12 2338 13 2363 14 2392 15 2374 16 2365 17 2433 18 2426 19 2397 
Average: 2369

Running my example (20 runs):

~/projects$ ./runtest.sh 131072 20 mio
0 105 1 103 2 104 3 104 4 104 5 105 6 104 7 104 8 104 9 102 10 102 11 104 12 104 13 103 14 102 15 103 16 103 17 105 18 104 19 104 
Average: 103

With a 5M file:

Yours:

~/projects$ ./runtest.sh 131072 20
0 238 1 243 2 244 3 242 4 243 5 244 6 239 7 245 8 243 9 246 10 239 11 246 12 243 13 242 14 240 15 243 16 242 17 245 18 240 19 243 
Average: 242

Example:

~/projects$ ./runtest.sh 131072 20 mio
0 10 1 10 2 10 3 11 4 10 5 10 6 10 7 10 8 10 9 10 10 11 11 10 12 10 13 10 14 10 15 10 16 10 17 10 18 10 19 10 
Average: 10
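
Much of the gap is probably the per-byte hex formatting through ostringstream, which also doubles the amount of data to search. A hex-string find can also match at an odd character offset, that is, across two neighbouring bytes, which is a false positive. A minimal sketch of the opposite direction, assuming the signature arrives as a hex string: convert the signature to bytes once and search the raw data. The helper names hex_to_bytes and contains_signature are hypothetical, and std::optional needs C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: turn a hex string such as "30303030" into raw bytes.
// Returns std::nullopt for an odd length or a non-hex character.
std::optional<std::vector<std::uint8_t>> hex_to_bytes(const std::string& hex_text) {
    if (hex_text.size() % 2 != 0) return std::nullopt;
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::vector<std::uint8_t> out;
    out.reserve(hex_text.size() / 2);
    for (std::size_t i = 0; i < hex_text.size(); i += 2) {
        const int hi = nibble(hex_text[i]);
        const int lo = nibble(hex_text[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    assert(out.size() == hex_text.size() / 2);
    return out;
}

// Hypothetical helper: byte-level search, so a match always starts on a byte boundary.
bool contains_signature(const std::vector<std::uint8_t>& data,
                        const std::vector<std::uint8_t>& sig) {
    if (sig.empty()) return false;
    return std::search(data.begin(), data.end(), sig.begin(), sig.end()) != data.end();
}
```

With the file contents already in a std::vector<std::uint8_t>, a call such as contains_signature(data, *hex_to_bytes("30303030")) replaces both the conversion loop and the string find.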

Script to compile and run (you can try several buffer sizes for my example):

#! /bin/bash

n=10
mio=""
bs=1024

if [ "$1" != "" ]
then
    bs=$1
fi

if [ "$2" == "" ]
then
    echo "Ups. Repeating? Will try with 10"
else
    n=$2
fi

if [ "$3" != "" ]
then
    mio="-DMIO"
fi

rm -f main

g++ -Wall -Wextra -g main.cc -o main -Wpedantic -std=c++2a -DBLOCK_SIZE=$bs $mio

tot=0
run=0
while [ "$run" != "$n" ]
do
    text=$(./main)
    mic=$(echo $text | cut - -d' ' -f 4)
    echo -n "$run $mic "
    tot=$(($tot + $mic))
    run=$(($run + 1))
done
echo
tot=$(($tot / $run))

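echo "Average: $tot"

And the program, main.cc:

#include <chrono>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string dest_path{"data.bin"};
    const unsigned char hex[] = {0x30, 0x30, 0x30, 0x30, 0x00 }; //  what to look for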
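#ifdef MIO
    ifstream file(dest_path, ios::binary);
    int numblocks = 0; // index of the block being scanned
    std::chrono::high_resolution_clock::time_point init;
    std::chrono::high_resolution_clock::time_point finish;
    bool found = false;
    bool at_eof = false;
    unsigned char memblock[BLOCK_SIZE];
    size_t posf = 0;
    const size_t sizeofhex = sizeof(hex) - 1; // signature length without the trailing 0x00
    const size_t overlap = sizeofhex - 1;     // bytes re-read so a signature split across two blocks is still seen

    if (file.is_open()) {
        init = std::chrono::high_resolution_clock::now();
        do {
            file.read((char *)memblock, BLOCK_SIZE);
            const size_t got = static_cast<size_t>(file.gcount());
            if (file.eof()) {
                at_eof = true;
            }
            // compare only where the whole signature fits inside the bytes just read
            for (size_t i = 0; i + sizeofhex <= got; ++i) {
                if (memblock[i] == hex[0] && std::memcmp(&memblock[i], hex, sizeofhex) == 0) {
                    finish = std::chrono::high_resolution_clock::now();
                    found = true;
                    posf = i;
                    break; // stop at the first match
                }
            }
            if (!found && !at_eof) {
                // step back so a signature straddling two blocks is not missed
                file.seekg(-static_cast<std::streamoff>(overlap), ios::cur);
                ++numblocks;
            }
        } while (!at_eof && !found); // stop at end of file or at the first match
    }

    auto res = std::chrono::duration_cast<std::chrono::milliseconds>(finish - init).count();

    if (found) {
        // each block starts (BLOCK_SIZE - overlap) bytes after the previous one
        cout << "Yep! Found! Milliseconds: " << res
             << " at block " << numblocks
             << " byte " << posf
             << ", total " << ((numblocks * (BLOCK_SIZE - overlap)) + posf)
             << endl;
    } else {
        cout << "Hmm... not found"  << endl;
    }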
#else
    std::chrono::high_resolution_clock::time_point init;
    std::chrono::high_resolution_clock::time_point finish;

    ifstream file(dest_path, ios::binary);
    if (file.is_open()) {

        // check for size of the file
        file.seekg(0, ios::end);
        int iFileSize = file.tellg();

        file.seekg(0, ios::beg);

        init = std::chrono::high_resolution_clock::now();
        vector<char> memblock(iFileSize);
        file.read(((char*)memblock.data()), iFileSize); // 18ms alloc memory

        ostringstream ostrData; // 44ms read file
        // add to a total of 62ms
        // if consider the time need to translate all the memblock
        // then this will be long as hell
        // need to improve this
        for (size_t i = 0; i < memblock.size(); i++) {
            int z = memblock[i] & 0xff;
            // std:: is required here: the local array `hex` hides std::hex
            ostrData << std::hex << setfill('0') << setw(2) << z;
        }

        string strDataHex = ostrData.str();
        string strHexSource = "0000";
        if (strDataHex.find(strHexSource) != string::npos) {
            // return 1, means there exits the signature in the file
            finish = std::chrono::high_resolution_clock::now();
            auto res = std::chrono::duration_cast<std::chrono::milliseconds>(finish - init).count();
            cout << "Yep! Found! Milliseconds: " << res
                 << endl;
            return 1;
        }
        else {
            // return 0; means there isn't the signature in the file
            return 0;
        }

    }
#endif
    return 1;
}
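
The block-wise scan in the MIO branch can also be sketched as a reusable function over any std::istream, which makes it easy to try on small in-memory inputs. This is only a sketch under my own assumptions: the name find_in_stream and its parameters are hypothetical, and instead of seeking backwards it carries the last sig.size() - 1 bytes of each block over to the next one, so a signature split across two blocks is still found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scans `in` block by block and returns the byte offset
// of the first occurrence of `sig`, or -1 if it does not occur.
long long find_in_stream(std::istream& in, const std::string& sig, std::size_t block_size) {
    assert(block_size > 0);
    if (sig.empty()) return -1;

    const std::size_t overlap = sig.size() - 1;   // tail kept between blocks
    std::vector<char> buf(overlap + block_size);
    std::size_t carried = 0;                      // bytes kept from the previous block
    long long base = 0;                           // stream offset of buf[0]

    while (in) {
        in.read(buf.data() + carried, static_cast<std::streamsize>(block_size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;

        const std::size_t valid = carried + got;
        const auto end = buf.begin() + static_cast<std::ptrdiff_t>(valid);
        const auto it = std::search(buf.begin(), end, sig.begin(), sig.end());
        if (it != end) return base + (it - buf.begin());

        // Move the last `keep` bytes to the front for the next round.
        const std::size_t keep = std::min(overlap, valid);
        std::memmove(buf.data(), buf.data() + (valid - keep), keep);
        base += static_cast<long long>(valid - keep);
        carried = keep;
    }
    return -1;
}
```

With a file, the stream would be an ifstream opened with ios::binary, and block_size plays the role of BLOCK_SIZE above.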
