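c++ - lzma: total read is 1 larger than the header's uncompressed size
Problem description
I am trying to decompress files with lzma using the easylzma library. Some files work fine, but seemingly random files fail to decompress.
After some debugging, I found that the total read count is 1 larger than the header's uncompressedSize, and that the header's streamed flag is 0.
The code reports that there is no footer. When I subtract 1 from the total read to skip the error, the file decompresses correctly, except that an extra row is appended at the end of the file in which several fields are 0 and a single field has a value.
The files are .bi5 files from dukascopy.
I want to determine whether the error is caused by faulty logic in the libraries I am using or by the files themselves being wrong, and what should be done in that case.
The libraries used are easylzma-master and dukascopy-master from github, and the files were downloaded from the dukascopy server.
Specifically, the 13h_ticks.bi5 and 21_ticks.bi5 files for 30 September 2020 ("September 8") show this problem.
Update:
I did not post code because for now I am asking for guidance. The code exists and it shows the problem, but it is library code. So I would like to know whether anyone has had the same problem with this specific kind of file (dukascopy bi5) and this lzma library. For now I am only looking for a general rule: "In lzma decompression, when do we get a total read that is consistently larger than the header's uncompressed size by 1? Does it mean there is a footer that is not mentioned in the header bytes?"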
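For context on that question, here is a minimal sketch of how the header can be inspected independently of the library. It assumes a .bi5 file is a plain .lzma (LZMA-alone) stream, whose 13-byte header holds one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size (all 0xFF bytes means the size is unknown and the stream is terminated by an end marker instead). The names `LzmaAloneHeader`, `parse_lzma_alone_header`, and `size_mismatch` are hypothetical and are not part of easylzma.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Fields of the 13-byte header of a raw .lzma (LZMA-alone) stream.
struct LzmaAloneHeader {
    std::uint8_t  props;              // packed lc/lp/pb properties
    std::uint32_t dict_size;          // dictionary size, little-endian on disk
    std::uint64_t uncompressed_size;  // little-endian on disk; all bits set = unknown

    // When the size is unknown, the stream must end with an end-of-stream marker.
    bool size_is_known() const {
        return uncompressed_size != std::numeric_limits<std::uint64_t>::max();
    }
};

const std::size_t kLzmaAloneHeaderSize = 13;

// Parses the header from the start of buf. Returns false if buf is too short.
bool parse_lzma_alone_header(const unsigned char* buf, std::size_t len,
                             LzmaAloneHeader* out) {
    if (buf == nullptr || out == nullptr || len < kLzmaAloneHeaderSize) {
        return false;
    }
    out->props = buf[0];
    out->dict_size = 0;
    for (int i = 0; i < 4; ++i) {
        out->dict_size |= static_cast<std::uint32_t>(buf[1 + i]) << (8 * i);
    }
    out->uncompressed_size = 0;
    for (int i = 0; i < 8; ++i) {
        out->uncompressed_size |= static_cast<std::uint64_t>(buf[5 + i]) << (8 * i);
    }
    return true;
}

// Difference between the bytes a decoder produced and the declared size.
// Only meaningful when h.size_is_known() is true; 0 means they agree.
long long size_mismatch(const LzmaAloneHeader& h, std::size_t produced) {
    return static_cast<long long>(produced) -
           static_cast<long long>(h.uncompressed_size);
}
```

If the header declares a known size and the decoder still reports one byte more, logging the declared size next to the produced size for a good file and a bad file is a cheap way to see whether the extra byte comes from the file or from how the decoder's output is counted.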
Update:
This is how I open the file
int HTTPRequest::read_bi5_main(boost::filesystem::path p, ptime epoch)
{
boost::unique_lock<boost::mutex> read_bi5_to_bin_lock(mBOOST_LOGMutex,boost::defer_lock);
boost::unique_lock<boost::mutex> read_bi5_to_bin_lock2(m_read_bi5_to_binMutex, boost::defer_lock);
unsigned char *buffer;
size_t buffer_size;
int counter;
size_t raw_size = 0;
std::string filename_string = p.generic_string();
path p2 = p;
p2.replace_extension(".bin");
std::string filename_string_to_bin =p2.generic_string() ;
path p3 = p;
p3.replace_extension(".csv");
std::string filename_string_to_csv = p3.generic_string();
const char *filename = filename_string.c_str();
const char *filename_to_bin = filename_string_to_bin.c_str();
const char *filename_to_csv = filename_string_to_csv.c_str();
//22-9-2020 here I open the downloaded file if possible
if (fs::exists(p) && fs::is_regular(p))
{
buffer_size = fs::file_size(p);
buffer = new unsigned char[buffer_size];
}
else {
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data file. |"
<< filename << "|" << std::endl;
read_bi5_to_bin_lock.unlock();
return 2;
}
//22-9-2020 here I read the downloaded file into filestream
std::ifstream fin(filename, std::ifstream::binary);
fin.read(reinterpret_cast<char*>(buffer), buffer_size);
fin.close();
//22-9-2020 here I check if file is related to japanese yen so that I determine how to write its value
/*
if symbols_xxx has mHTTPRequest_Symbol_str then PV=0.001
else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.0001
else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.00001
*/
//28-9-2020 I will make 3 vectors in utils.h for 3,4,5 point value ,then I find symbol in vector,
//std::size_t pos = mHTTPRequest_Symbol_str.find("JPY");
double PV;
std::vector<std::string>::iterator it3 = std::find(point_value_xxx.begin(), point_value_xxx.end(), mHTTPRequest_Symbol_str);
std::vector<std::string>::iterator it4 = std::find(point_value_xxxx.begin(), point_value_xxxx.end(), mHTTPRequest_Symbol_str);
std::vector<std::string>::iterator it5 = std::find(point_value_xxxxx.begin(), point_value_xxxxx.end(), mHTTPRequest_Symbol_str);
if (it3 != point_value_xxx.end())
{
PV = 0.001;
}
else if (it4 != point_value_xxxx.end())
{
PV = 0.0001;
}
else if (it5 != point_value_xxxxx.end())
{
PV = 0.00001;
}
else
{
//10-1-2020throw;
PV = 0.001;
}
read_bi5_to_bin_lock2.lock();
unsigned char *data_bin_buffer = 0 ;
n47::tick_data *data = n47::read_bi5_to_bin(
buffer, buffer_size, epoch, PV, &raw_size, &data_bin_buffer);
//5-11-2020 here i will save binary file
std::string file_name_path_string=output_compressed_file_2(&data_bin_buffer, raw_size, filename_to_bin);
read_bi5_to_bin_lock2.unlock();
path file_name_path_2{ file_name_path_string };
buffer_size = 0;
if (fs::exists(file_name_path_2) && fs::is_regular(file_name_path_2))
{
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << boost::this_thread::get_id() <<"\t we can access the data .bin file. |"
<< filename_to_bin << "| with size ="<< fs::file_size(file_name_path_2) << std::endl;
read_bi5_to_bin_lock.unlock();
}
else {
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data .bin file. |"
<< filename_to_bin << "|" << std::endl;
read_bi5_to_bin_lock.unlock();
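//free everything allocated so far before the early return
delete data;
delete[] buffer;
delete[] data_bin_buffer;
return 2;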
}
n47::tick_data_iterator iter;
//5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks
if (data == 0)
{
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Failed to load the data!" << std::endl;
read_bi5_to_bin_lock.unlock();
}
//5-15-2020 Note: this must be an else-if. data is a pointer to a vector of pointers to ticks. It is created as a null pointer inside read_bi5 and is assigned a vector only if the file has ticks; for a file without ticks it is returned as a null pointer, so dereferencing it here without the else causes an error on empty files.
else if (data->size() != (raw_size / n47::ROW_SIZE))
{
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Loaded " << data->size()
<< " ticks but file size indicates we should have loaded "
<< (raw_size / n47::ROW_SIZE) << std::endl;
read_bi5_to_bin_lock.unlock();
}
//22-9-2020 in last if and if else I checked if file is either empty or has error of data size So now I have good clean file to work with
//read_bi5_to_bin_lock.lock();
//BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "time, bid, bid_vol, ask, ask_vol" << std::endl;
//read_bi5_to_bin_lock.unlock();
counter = 0;
std::ofstream out_csv(filename_string_to_csv);
if (data != 0)
{
for (iter = data->begin(); iter != data->end(); iter++) {
//5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks>>>>>>>here i should open file stream for output and save data to it
out_csv
//<< std::setfill('0')<<std::setw(sizeof((*iter)->epoch + (*iter)->td))<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
//<< std::setfill('0')<<std::setw(27)<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
<< std::setfill('0')<<((*iter)->epoch + (*iter)->td) << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->bid)<<std::fixed << (*iter)->bid << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<<std::fixed << (*iter)->bidv << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->ask)<<std::fixed << (*iter)->ask << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->askv)<<std::fixed << (*iter)->askv << std::endl;
//??5-17-2020 isolate multithreaded error
/*
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
boost::this_thread::get_id() << "\t"<<((*iter)->epoch + (*iter)->td) << ", "
<< (*iter)->bid << ", " << (*iter)->bidv << ", "
<< (*iter)->ask << ", " << (*iter)->askv << std::endl;
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
boost::this_thread::get_id() << "\t"<< std::setfill('0')<< std::setw(sizeof((*iter)->epoch + (*iter)->td))<<((*iter)->epoch + (*iter)->td) << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->bid)<< (*iter)->bid << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<< (*iter)->bidv << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->ask)<< (*iter)->ask << ","
<< std::setfill('0')<<std::setw(sizeof(*iter)->askv)<< (*iter)->askv << std::endl;
read_bi5_to_bin_lock.unlock();
*/
counter++;
}
////read_bi5_to_bin_lock.unlock();
}
out_csv.close();
//5-13-2020
//??5-17-2020 isolate multithreaded error
read_bi5_to_bin_lock.lock();
BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << ".end." << std::endl << std::endl
<< "From " << raw_size << " bytes we read " << counter
<< " records." << std::endl
<< raw_size << " / " << n47::ROW_SIZE << " = "
<< (raw_size / n47::ROW_SIZE) << std::endl;
read_bi5_to_bin_lock.unlock();
delete data;
delete[] buffer;
delete [] data_bin_buffer;
return 0;
}
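The function above manages the input buffer with new[] and delete[] and does not check whether fin.read actually filled it, so every exit path has to remember to free the buffer and a short read goes unnoticed. As a sketch only, the loading step could be written with std::vector so that ownership is automatic. The helper name `load_file` is hypothetical and is not part of the dukascopy or easylzma libraries.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Reads the whole file at path into *out.
// Returns false if the file cannot be opened or fully read.
bool load_file(const std::string& path, std::vector<unsigned char>* out) {
    // Open at the end so tellg() reports the file size.
    std::ifstream fin(path.c_str(), std::ios::binary | std::ios::ate);
    if (!fin) {
        return false;
    }
    const std::streamoff size = fin.tellg();
    if (size < 0) {
        return false;
    }
    out->resize(static_cast<std::size_t>(size));
    fin.seekg(0, std::ios::beg);
    if (size > 0 &&
        !fin.read(reinterpret_cast<char*>(out->data()),
                  static_cast<std::streamsize>(size))) {
        return false;  // short read: do not hand a partial buffer to the decoder
    }
    return true;
}
```

The resulting vector's data() and size() can then be passed where buffer and buffer_size are used, and no delete[] is needed on any return path.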
This is my modified dukascopy file
//#include "stdafx.h"
/*
Copyright 2013 Michael O'Keeffe (a.k.a. ninety47).
This file is part of ninety47 Dukascopy toolbox.
The "ninety47 Dukascopy toolbox" is free software: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the License,
or any later version.
"ninety47 Dukascopy toolbox" is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along with
"ninety47 Dukascopy toolbox". If not, see <http://www.gnu.org/licenses/>.
*/
#include "ninety47/dukascopy.h"
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>
#include <vector>
#include "ninety47/dukascopy/defs.h"
#include "ninety47/dukascopy/io.hpp"
#include "ninety47/dukascopy/lzma.h"
namespace n47 {
namespace pt = boost::posix_time;
tick *tickFromBuffer(
unsigned char *buffer, pt::ptime epoch, float digits, size_t offset) {
bytesTo<unsigned int, n47::BigEndian> bytesTo_unsigned;
bytesTo<float, n47::BigEndian> bytesTo_float;
unsigned int ts = bytesTo_unsigned(buffer + offset);
pt::time_duration ms = pt::millisec(ts);
unsigned int ofs = offset + sizeof(ts);
float ask = bytesTo_unsigned(buffer + ofs) * digits;
ofs += sizeof(ts);
float bid = bytesTo_unsigned(buffer + ofs) * digits;
ofs += sizeof(ts);
//28-9-2020 convert volume to million
float askv = bytesTo_float(buffer + ofs) *1000000;
ofs += sizeof(ts);
float bidv = bytesTo_float(buffer + ofs) *1000000;
return new tick(epoch, ms, ask, bid, askv, bidv);
}
tick_data* read_bin(
unsigned char *buffer, size_t buffer_size, pt::ptime epoch, float point_value) {
std::vector<tick*> *data = new std::vector<tick*>();
std::vector<tick*>::iterator iter;
std::size_t offset = 0;
while ( offset < buffer_size ) {
data->push_back(tickFromBuffer(buffer, epoch, point_value, offset));
offset += ROW_SIZE;
}
return data;
}
tick_data* read_bi5(
unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
float point_value, size_t *bytes_read) {
tick_data *result = 0;
// decompress
int status;
unsigned char *buffer = n47::lzma::decompress(lzma_buffer,
lzma_buffer_size, &status, bytes_read);
//5-11-2020 here i will save binary file
if (status != N47_E_OK) {
*bytes_read = 0; // reset the caller's count; assigning to bytes_read only nulls the local pointer
} else {
// convert to tick data (with read_bin).
result = read_bin(buffer, *bytes_read, epoch, point_value);
delete [] buffer;
}
return result;
}
//5-11-2020
tick_data* read_bi5_to_bin(
unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
float point_value, size_t *bytes_read, unsigned char** buffer_decompressed) {
tick_data *result = 0;
// decompress
int status;
*buffer_decompressed = n47::lzma::decompress(lzma_buffer,
lzma_buffer_size, &status, bytes_read);
if (status != N47_E_OK)
{
*bytes_read = 0; // reset the caller's count; assigning to bytes_read only nulls the local pointer
}
else {
// convert to tick data (with read_bin).
result = read_bin(*buffer_decompressed, *bytes_read, epoch, point_value);
//delete[] buffer;
}
return result;
}
tick_data* read(
const char *filename, pt::ptime epoch, float point_value, size_t *bytes_read) {
tick_data *result = 0;
size_t buffer_size = 0;
unsigned char *buffer = n47::io::loadToBuffer<unsigned char>(filename, &buffer_size);
if ( buffer != 0 ) {
if ( n47::lzma::bufferIsLZMA(buffer, buffer_size) ) {
result = read_bi5(buffer, buffer_size, epoch, point_value, bytes_read);
// Reading in as bi5 failed lets double check its not binary
// data in the buffer.
if (result == 0) {
result = read_bin(buffer, buffer_size, epoch, point_value);
}
} else {
result = read_bin(buffer, buffer_size, epoch, point_value);
*bytes_read = buffer_size;
}
delete [] buffer;
if (result != 0 && result->size() != (*bytes_read / n47::ROW_SIZE)) {
delete result;
result = 0;
}
}
return result;
}
} // namespace n47
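One observation about read_bin in the listing above: it loops while offset < buffer_size, so a decompressed buffer that is one byte longer than a whole number of rows yields one extra tick that is read mostly from beyond the end of the data. That may be where the extra, mostly zero row in the output comes from, although it does not explain why the decoder reports the extra byte in the first place. The sketch below decodes only complete rows and reports how many trailing bytes were left over. It assumes a 20-byte row of five big-endian 4-byte fields, inferred from tickFromBuffer; `RawTick`, `kRowSize`, and `decode_whole_rows` are hypothetical names, not part of the ninety47 library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed row layout (from tickFromBuffer): time offset in ms, ask, bid,
// ask volume, bid volume, each stored as 4 big-endian bytes.
const std::size_t kRowSize = 20;

struct RawTick {
    std::uint32_t ms;
    std::uint32_t ask;
    std::uint32_t bid;
    float ask_volume;
    float bid_volume;
};

// Assembles a big-endian 32-bit value regardless of host byte order.
static std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Reinterprets a big-endian 32-bit pattern as an IEEE-754 float.
static float read_be_float(const unsigned char* p) {
    const std::uint32_t bits = read_be32(p);
    float value;
    static_assert(sizeof(value) == sizeof(bits), "float must be 32 bits");
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Decodes only complete rows; *trailing (if non-null) receives the number
// of leftover bytes that do not form a whole row.
std::vector<RawTick> decode_whole_rows(const unsigned char* buf, std::size_t len,
                                       std::size_t* trailing) {
    std::vector<RawTick> rows;
    const std::size_t whole = len / kRowSize;
    rows.reserve(whole);
    for (std::size_t i = 0; i < whole; ++i) {
        const unsigned char* row = buf + i * kRowSize;
        RawTick t;
        t.ms = read_be32(row);
        t.ask = read_be32(row + 4);
        t.bid = read_be32(row + 8);
        t.ask_volume = read_be_float(row + 12);
        t.bid_volume = read_be_float(row + 16);
        rows.push_back(t);
    }
    if (trailing != nullptr) {
        *trailing = len % kRowSize;
    }
    return rows;
}
```

A non-zero trailing count can be logged as a warning instead of silently producing a bogus last row, which keeps the CSV output clean while the root cause of the extra byte is investigated.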
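Solution
I dug into the details of how lzma works, which turned out to be too heavy for me, so I changed libraries and used the C++ lzma specification file from 7z.
It works.
I think the problem is related to using C code in a C++ program.
The library also states that it was tested on BSD. Thanks for your help. Anyone with the same case: download 7zip and use the cpp files.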