c++ - C++ Fast way to load large txt file in vector
问题描述
I have a file ~12.000.000 hex lines and 1,6GB Example of file:
999CBA166262923D53D3EFA72F5C4E8EE1E1FF1E7E33C42D0CE8B73604034580F2
889CBA166262923D53D3EFA72F5C4E8EE1E1FF1E7E33C42D0CE8B73604034580F2
Example of code:
vector<string> buffer;
ifstream fe1("strings.txt");
string line1;
while (getline(fe1, line1)) {
buffer.push_back(line1);
}
Now the loading takes about 20 minutes. Any suggestions how to speed up? Thanks a lot in advance.
解决方案
Loading a large text file into std::vector<std::string>
is rather inefficient and wasteful because it allocates heap memory for each std::string
and re-allocates the vector multiple times. Each of these heap allocations requires heap book-keeping information under the hood (normally 8 bytes per allocation on a 64-bit system), and each line requires an std::string
object (8-32 bytes depending on the standard library), so that a file loaded this way takes a lot more space in RAM than on disk.
One fast way is to map the file into memory and implement iterators to walk over lines in it. This sidesteps the issues mentioned above.
Working example:
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/iterator/iterator_facade.hpp>
#include <boost/range/iterator_range_core.hpp>
#include <iostream>
class LineIterator
: public boost::iterator_facade<
LineIterator,
boost::iterator_range<char const*>,
boost::iterators::forward_traversal_tag,
boost::iterator_range<char const*>
>
{
char const *p_, *q_;
boost::iterator_range<char const*> dereference() const { return {p_, this->next()}; }
bool equal(LineIterator b) const { return p_ == b.p_; }
void increment() { p_ = this->next(); }
char const* next() const { auto p = std::find(p_, q_, '\n'); return p + (p != q_); }
friend class boost::iterator_core_access;
public:
LineIterator(char const* begin, char const* end) : p_(begin), q_(end) {}
};
inline boost::iterator_range<LineIterator> crange(boost::interprocess::mapped_region const& r) {
auto p = static_cast<char const*>(r.get_address());
auto q = p + r.get_size();
return {LineIterator{p, q}, LineIterator{q, q}};
}
inline std::ostream& operator<<(std::ostream& s, boost::iterator_range<char const*> const& line) {
return s.write(line.begin(), line.size());
}
int main() {
boost::interprocess::file_mapping file("/usr/include/gnu-versions.h", boost::interprocess::read_only);
boost::interprocess::mapped_region memory(file, boost::interprocess::read_only);
unsigned n = 0;
for(auto line : crange(memory))
std::cout << n++ << ' ' << line;
}
推荐阅读
- javascript - Firebase 云存储/功能 - 错误:上传图片时超出配额
- java - 如何使用休眠填充 JComboBox?
- android-studio - 适用于 Android 的 PJSIP (2019)
- php - 'new'关键字如何与已经包含'对象标识符'的'对象变量'一起使用到某个类的一些已经存在的'对象'?
- javascript - 如何在 Firebase 云函数之间传递数据
- python-3.x - 我无法使我的数据集适应 VGG-net,大小不匹配
- php - SQL 语法错误:更新查询
- javascript - 移动设备上的 React useEffect 挂钩未清除 setTimeout
- javascript - 在动态创建的多个文件中显示文件名
- asp.net-mvc - 将参数值从动作结果传递到同一控制器中的另一个