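c++ - How can I read a file quickly to check for a signature/magic number?
Question
I'm a student and fairly new to C++ and security. I was given an assignment to check files for a signature/magic number, but I'm having trouble speeding up the read time.
My idea is to read the file in binary mode with ifstream, store its data in a vector, and then convert it to a hex string. Finally, I check whether the given signature is present in the hex string.
In theory everything works, except that the whole process of allocating the vector, reading the file, and converting its data takes a long time. The read alone takes 44ms.
How can I improve this? Here is my code: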
UINT CheckForSignature(CString source, CString dest_path) {
// source is the HEX string need to find in file, dest_path is the destination of the file
ifstream file(dest_path, ios::binary);
if (file.is_open()) {
// check for size of the file
file.seekg(0, ios::end);
int iFileSize = file.tellg();
// if the file size exceed 50MB, pass
if (iFileSize > 50000000) {
// return -1, means file exceed 50MB, which do not need to be checked
return -1;
}
// read file and store data in hex string
file.seekg(0, ios::beg);
vector<char> memblock(iFileSize);
file.read(((char*)memblock.data()), iFileSize); // 18ms alloc memory
ostringstream ostrData; // 44ms read file
// add to a total of 62ms
// if consider the time need to translate all the memblock
// then this will be long as hell
// need to improve this
for (int i = 0; i < memblock.size(); i++) {
int z = memblock[i] & 0xff;
ostrData << hex << setfill('0') << setw(2) << z;
}
string strDataHex = ostrData.str();
string strHexSource = (CT2A)source;
if (strDataHex.find(strHexSource) != string::npos) {
// return 1, means there exits the signature in the file
return 1;
}
else {
// return 0; means there isn't the signature in the file
return 0;
}
}
}
I'm open to any help and suggestions on solutions and code improvements. Thanks a lot!
Answer
There are far more performant ways to read and examine file content.
Here is one naive, simple approach (just an example).
I created a 51M file with "0000" at the end (and removed the size limit):
~/projects$ l data.bin
-rw-r--r-- 1 manuel manuel 51M jul 27 02:51 data.bin
(Showing last two lines.)
~/projects$ tail data.bin | hexdump
0000b80 11b9 dddd 8fe9 bab1 134d 5645 eb74 81ce
0000b90 3030 3030 000a
0000b95
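A comparable test file could be produced with something like the following. This is a minimal sketch, not the tool used for the measurements here: `write_test_file` is a hypothetical helper that writes pseudo-random bytes followed by a known tail such as "0000\n". There is a small chance that random data already contains the signature earlier in the file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>

// Hypothetical helper: write `size` pseudo-random bytes followed by `tail`
// to `path`. Returns false if the file could not be opened or written.
bool write_test_file(const std::string& path, std::size_t size, const std::string& tail) {
    assert(!path.empty()); // precondition: a file name is required
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    std::mt19937 gen(12345); // fixed seed, so every run produces the same file
    std::uniform_int_distribution<int> byte(0, 255);
    for (std::size_t i = 0; i < size; ++i) {
        out.put(static_cast<char>(byte(gen)));
    }
    out.write(tail.data(), static_cast<std::streamsize>(tail.size()));
    out.flush(); // surface write errors before reporting success
    return static_cast<bool>(out);
}
```

Calling `write_test_file("data.bin", 51 * 1024 * 1024, "0000\n")` would give a file of roughly the size used here.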
Running your code (20 runs, times in milliseconds):
~/projects$ ./runtest.sh 131072 20
0 2360 1 2333 2 2355 3 2360 4 2349 5 2350 6 2353 7 2346 8 2342 9 2381 10 2378 11 2394 12 2338 13 2363 14 2392 15 2374 16 2365 17 2433 18 2426 19 2397
Average: 2369
Running my example (20 runs):
~/projects$ ./runtest.sh 131072 20 mio
0 105 1 103 2 104 3 104 4 104 5 105 6 104 7 104 8 104 9 102 10 102 11 104 12 104 13 103 14 102 15 103 16 103 17 105 18 104 19 104
Average: 103
With a 5M file.
Yours:
~/projects$ ./runtest.sh 131072 20
0 238 1 243 2 244 3 242 4 243 5 244 6 239 7 245 8 243 9 246 10 239 11 246 12 243 13 242 14 240 15 243 16 242 17 245 18 240 19 243
Average: 242
Example:
~/projects$ ./runtest.sh 131072 20 mio
0 10 1 10 2 10 3 11 4 10 5 10 6 10 7 10 8 10 9 10 10 11 11 10 12 10 13 10 14 10 15 10 16 10 17 10 18 10 19 10
Average: 10
Script to compile and run (you can try several buffer sizes for my example):
#! /bin/bash
n=10
mio=""
bs=1024
if [ "$1" != "" ]
then
bs=$1
fi
if [ "$2" == "" ]
then
echo "Oops. No repeat count given, will try with 10"
else
n=$2
fi
if [ "$3" != "" ]
then
mio="-DMIO"
fi
rm -f main
g++ -Wall -Wextra -g main.cc -o main -Wpedantic -std=c++2a -DBLOCK_SIZE=$bs $mio
tot=0
run=0
while [ "$run" != "$n" ]
do
text=$(./main)
mic=$(echo $text | cut - -d' ' -f 4)
echo -n "$run $mic "
tot=$(($tot + $mic))
run=$(($run + 1))
done
echo
tot=$(($tot / $run))
echo "Average: $tot"
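main.cc:
#include <chrono>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
string dest_path{"data.bin"};
// what to look for: "0000" (the trailing 0x00 is excluded from the comparison)
const unsigned char hex[] = {0x30, 0x30, 0x30, 0x30, 0x00 };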
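#ifdef MIO
static_assert(BLOCK_SIZE >= sizeof(hex), "BLOCK_SIZE must be larger than the signature");
ifstream file(dest_path, ios::binary);
size_t numblocks = 0; // index of the block currently in memblock
std::chrono::high_resolution_clock::time_point init;
std::chrono::high_resolution_clock::time_point finish;
bool found = false;
bool you_bet = false; // set once the last (possibly partial) block has been read
unsigned char memblock[BLOCK_SIZE];
size_t posf = 0;
const size_t sizeofhex = sizeof(hex) - 1; // signature length without the trailing 0x00
if (file.is_open()) {
init = std::chrono::high_resolution_clock::now();
do {
file.read((char *)memblock, BLOCK_SIZE);
const size_t got = static_cast<size_t>(file.gcount());
if (!file) {
you_bet = true; // short read: end of file (or a read error)
}
// only test positions where the whole signature fits in the bytes actually read
for (size_t i = 0; i + sizeofhex <= got; ++i) {
if (memblock[i] == hex[0] && std::memcmp(&memblock[i], hex, sizeofhex) == 0) {
finish = std::chrono::high_resolution_clock::now();
found = true;
posf = i;
break; // stop at the first match
}
}
if (!found && !you_bet) {
// step back so a signature split across two blocks is not missed
file.seekg(-static_cast<std::streamoff>(sizeofhex - 1), ios::cur);
++numblocks;
}
} while (!you_bet && !found); // stop at the first match or at end of file
}
auto res = std::chrono::duration_cast<std::chrono::milliseconds>(finish - init).count();
if (found) {
// each block starts (BLOCK_SIZE - (sizeofhex - 1)) bytes after the previous one
cout << "Yep! Found! Milliseconds: " << res
<< " at block " << numblocks
<< " byte " << posf
<< ", total " << (numblocks * (BLOCK_SIZE - (sizeofhex - 1)) + posf)
<< endl;
} else {
cout << "Hmm... not found" << endl;
}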
#else
(void)hex; // the byte array is only used by the MIO build
std::chrono::high_resolution_clock::time_point init;
std::chrono::high_resolution_clock::time_point finish;
ifstream file(dest_path, ios::binary);
if (file.is_open()) {
// check for size of the file
file.seekg(0, ios::end);
const std::streamoff iFileSize = file.tellg();
file.seekg(0, ios::beg);
init = std::chrono::high_resolution_clock::now();
vector<char> memblock(static_cast<size_t>(iFileSize));
file.read(memblock.data(), iFileSize);
ostringstream ostrData;
// convert every byte to two hex digits (this is the slow part)
// std::hex must be qualified here: the local array named hex shadows it
for (size_t i = 0; i < memblock.size(); i++) {
int z = memblock[i] & 0xff;
ostrData << std::hex << setfill('0') << setw(2) << z;
}
string strDataHex = ostrData.str();
string strHexSource = "30303030"; // hex encoding of the bytes "0000"
if (strDataHex.find(strHexSource) != string::npos) {
// return 1 means the signature exists in the file
finish = std::chrono::high_resolution_clock::now();
auto res = std::chrono::duration_cast<std::chrono::milliseconds>(finish - init).count();
cout << "Yep! Found! Milliseconds: " << res
<< endl;
return 1;
}
else {
// return 0 means the signature is not in the file
return 0;
}
}
#endif
return 1;
}
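The example above hard-codes the signature as a byte array, while the question passes it as a hex string. One option (not part of the measured code) is to decode that string into bytes once and search the raw data with `std::search`, rather than converting the whole file to hex. A minimal sketch, assuming C++17; `decode_hex` and `find_bytes` are hypothetical helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: decode a hex string such as "30303030" into raw bytes.
// Returns std::nullopt for an odd length or a non-hex character.
std::optional<std::vector<unsigned char>> decode_hex(const std::string& text) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    if (text.size() % 2 != 0) {
        return std::nullopt;
    }
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() / 2);
    for (std::size_t i = 0; i < text.size(); i += 2) {
        const int hi = nibble(text[i]);
        const int lo = nibble(text[i + 1]);
        if (hi < 0 || lo < 0) {
            return std::nullopt;
        }
        bytes.push_back(static_cast<unsigned char>(hi * 16 + lo));
    }
    assert(bytes.size() == text.size() / 2); // two hex digits per byte
    return bytes;
}

// Hypothetical helper: offset of the first occurrence of `needle` in
// `haystack`, or std::nullopt if it does not occur (or `needle` is empty).
std::optional<std::size_t> find_bytes(const std::vector<unsigned char>& haystack,
                                      const std::vector<unsigned char>& needle) {
    if (needle.empty()) {
        return std::nullopt;
    }
    const auto it = std::search(haystack.begin(), haystack.end(),
                                needle.begin(), needle.end());
    if (it == haystack.end()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(it - haystack.begin());
}
```

`find_bytes` could be applied to each block read from the file, provided consecutive blocks overlap by the signature length minus one byte so that a match straddling a block boundary is not missed.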