lzma totalread is 1 greater than the header uncompressed size

Problem description

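I am trying to decompress files with lzma using the easylzma library. Some files work fine, but seemingly random files fail to decompress.
After some debugging, I found that the total read count is 1 greater than the header's uncompressedSize, and the header's streamed flag is 0.
The code says there is no footer, but when I subtract 1 from the total read to skip the error, the file decompresses correctly, except that an extra row is appended at the end of the file in which several fields are 0 and a single field has a value.
The files are .bi5 files from dukascopy.
I want to determine whether the error is caused by faulty logic in the libraries I use or by a bad file, and what should be done in that case.
The libraries used are easylzma-master and dukascopy-master from github, and the files were downloaded from the dukascopy server.
Specifically, the files 13h_ticks.bi5 and 21_ticks.bi5 for 30 September 2020 ("8 September") show this problem.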

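Update:
I did not post code because for now I am asking for guidance. The code exists and it shows the problem, but it is library code. So I would like to know whether anyone has had the same problem with this particular kind of dukascopy bi5 file and this lzma library. For now I am only looking for a general rule: "In lzma decompression, when do we get the behavior where the total read is repeatedly 1 greater than the header's uncompressed size? Does it mean there is a footer that is not mentioned in the header bytes?"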
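
For reference, a minimal sketch of reading the 13-byte header of the legacy .lzma (LZMA-alone) container, assuming the .bi5 files use that layout: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size. A size field of all 0xFF bytes means the size is unknown and the stream is expected to end with an end-of-stream marker; as far as I know, a stream with a known size is still allowed to carry such a marker. The names LzmaAloneHeader and parse_lzma_alone_header are hypothetical and are not part of easylzma.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the 13-byte LZMA-alone header (assumed layout of a .bi5 file).
struct LzmaAloneHeader {
    std::uint8_t lc;                  // literal context bits
    std::uint8_t lp;                  // literal position bits
    std::uint8_t pb;                  // position bits
    std::uint32_t dict_size;          // dictionary size in bytes
    std::uint64_t uncompressed_size;  // declared output size
    bool size_known;                  // false when all 8 size bytes are 0xFF
};

// Hypothetical helper: returns false if the buffer is too short or the
// properties byte is out of range; otherwise fills `out`.
bool parse_lzma_alone_header(const unsigned char* buf, std::size_t len,
                             LzmaAloneHeader& out) {
    if (buf == nullptr || len < 13) return false;

    const unsigned props = buf[0];
    if (props >= 9 * 5 * 5) return false;  // props = (pb * 5 + lp) * 9 + lc
    out.lc = static_cast<std::uint8_t>(props % 9);
    out.lp = static_cast<std::uint8_t>((props / 9) % 5);
    out.pb = static_cast<std::uint8_t>(props / 45);

    // Both integer fields are stored little-endian.
    out.dict_size = 0;
    for (int i = 0; i < 4; ++i) {
        out.dict_size |= static_cast<std::uint32_t>(buf[1 + i]) << (8 * i);
    }
    out.uncompressed_size = 0;
    for (int i = 0; i < 8; ++i) {
        out.uncompressed_size |= static_cast<std::uint64_t>(buf[5 + i]) << (8 * i);
    }
    out.size_known = (out.uncompressed_size != UINT64_MAX);
    return true;
}
```

Comparing uncompressed_size from this header with the byte count the decoder reports is one way to check whether the extra byte comes from the file itself or from the decoding loop.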

Update:
This is how I open the file:

int HTTPRequest::read_bi5_main(boost::filesystem::path p, ptime epoch)
{
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock(mBOOST_LOGMutex,boost::defer_lock);
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock2(m_read_bi5_to_binMutex, boost::defer_lock);

    unsigned char *buffer;
    size_t buffer_size;

    int counter;

    size_t raw_size = 0;

    std::string filename_string = p.generic_string();
    path p2 = p;
    p2.replace_extension(".bin");
    std::string filename_string_to_bin =p2.generic_string() ;

    path p3 = p;
    p3.replace_extension(".csv");
    std::string filename_string_to_csv = p3.generic_string();

    const char *filename = filename_string.c_str();
    const char *filename_to_bin = filename_string_to_bin.c_str();
    const char *filename_to_csv = filename_string_to_csv.c_str();

    //22-9-2020 here I open the downloaded file if possible
    if (fs::exists(p) && fs::is_regular(p))
    {
        buffer_size = fs::file_size(p);
        buffer = new unsigned char[buffer_size];
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data file. |"
            << filename << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    //22-9-2020 here I read the downloaded file into filestream
    std::ifstream fin(filename, std::ifstream::binary);
    fin.read(reinterpret_cast<char*>(buffer), buffer_size);
    fin.close();

    //22-9-2020 here I check if file is related to japanese yen so that I determine how to write its value
    /*
    if symbols_xxx has mHTTPRequest_Symbol_str then PV=0.001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.0001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.00001
    */
    //28-9-2020 I will make 3 vectors in utils.h for 3,4,5 point value ,then I find symbol in vector,
    //std::size_t pos = mHTTPRequest_Symbol_str.find("JPY");

    double PV;

    std::vector<std::string>::iterator it3 = std::find(point_value_xxx.begin(), point_value_xxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it4 = std::find(point_value_xxxx.begin(), point_value_xxxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it5 = std::find(point_value_xxxxx.begin(), point_value_xxxxx.end(), mHTTPRequest_Symbol_str);
    if (it3 != point_value_xxx.end())
    {
        PV = 0.001;
    }
    else if (it4 != point_value_xxxx.end())
    {
        PV = 0.0001;
    }
    else if (it5 != point_value_xxxxx.end())
    {
        PV = 0.00001;
    }
    else
    {
        //10-1-2020throw;
        PV = 0.001;

    }
    read_bi5_to_bin_lock2.lock();
    unsigned char *data_bin_buffer = 0 ;
    n47::tick_data *data = n47::read_bi5_to_bin(
            buffer, buffer_size, epoch, PV, &raw_size, &data_bin_buffer);

    //5-11-2020 here i will save binary file
    std::string file_name_path_string=output_compressed_file_2(&data_bin_buffer, raw_size, filename_to_bin);
    read_bi5_to_bin_lock2.unlock();

    path file_name_path_2{ file_name_path_string };
    buffer_size = 0;
    if (fs::exists(file_name_path_2) && fs::is_regular(file_name_path_2))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << boost::this_thread::get_id() <<"\t we can access the data .bin file. |"
            << filename_to_bin << "| with size ="<< fs::file_size(file_name_path_2) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data .bin file. |"
            << filename_to_bin << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    n47::tick_data_iterator iter;

    //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks
    if (data == 0)
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Failed to load the data!" << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //5-15-2020 Note: the else below is required. data is a pointer to a vector of tick pointers;
    //read_bi5 creates it as a null pointer and only assigns a vector when the file has ticks.
    //For an empty file it stays null, so dereferencing it without the else causes an error.
    else if (data->size() != (raw_size / n47::ROW_SIZE))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Loaded " << data->size()
            << " ticks but file size indicates we should have loaded "
            << (raw_size / n47::ROW_SIZE) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //22-9-2020 in last if and if else I checked if file is either empty or has error of data size So now I have good clean file to work with
    //read_bi5_to_bin_lock.lock();
    //BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "time, bid, bid_vol, ask, ask_vol" << std::endl;
    //read_bi5_to_bin_lock.unlock();

    counter = 0;

    std::ofstream out_csv(filename_string_to_csv);
    if (data == 0)
    {

    }
    else if (data != 0)
    {
        for (iter = data->begin(); iter != data->end(); iter++) {
            //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks>>>>>>>here i should open file stream for output and save data to it
            out_csv
            //<< std::setfill('0')<<std::setw(sizeof((*iter)->epoch + (*iter)->td))<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            //<< std::setfill('0')<<std::setw(27)<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<<std::fixed << (*iter)->bid << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<<std::fixed << (*iter)->bidv << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<<std::fixed << (*iter)->ask << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<<std::fixed << (*iter)->askv << std::endl;
            //??5-17-2020 isolate multithreaded error
            /*
            read_bi5_to_bin_lock.lock();
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                boost::this_thread::get_id() << "\t"<<((*iter)->epoch + (*iter)->td) << ", "
                << (*iter)->bid << ", " << (*iter)->bidv << ", "
                << (*iter)->ask << ", " << (*iter)->askv << std::endl;
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                            boost::this_thread::get_id() << "\t"<< std::setfill('0')<< std::setw(sizeof((*iter)->epoch + (*iter)->td))<<((*iter)->epoch + (*iter)->td) << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<< (*iter)->bid << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<< (*iter)->bidv << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<< (*iter)->ask << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<< (*iter)->askv << std::endl;
            read_bi5_to_bin_lock.unlock();
            */
            counter++;
        }
        ////read_bi5_to_bin_lock.unlock();

    }
    out_csv.close();
    //5-13-2020

    //??5-17-2020 isolate multithreaded error
    read_bi5_to_bin_lock.lock();

    BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << ".end." << std::endl << std::endl
        << "From " << raw_size << " bytes we read " << counter
        << " records." << std::endl
        << raw_size << " / " << n47::ROW_SIZE << " = "
        << (raw_size / n47::ROW_SIZE) << std::endl;
    read_bi5_to_bin_lock.unlock();


    delete data;
    delete[] buffer;
    delete [] data_bin_buffer;
    return 0;
}
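
The function above allocates its input buffer with new[] and returns early on some paths without freeing it. A minimal sketch of loading the file into a std::vector instead, so the memory is released on every return path; load_file is a hypothetical helper and is not part of the libraries mentioned in the question.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into `out`.
// Returns false if the file cannot be opened or a read error occurs.
bool load_file(const std::string& path, std::vector<unsigned char>& out) {
    std::ifstream fin(path, std::ios::binary);
    if (!fin) return false;

    // Copy every byte up to end of file; the vector owns the storage.
    out.assign(std::istreambuf_iterator<char>(fin),
               std::istreambuf_iterator<char>());
    return !fin.bad();
}
```

The vector's data() and size() can then be passed wherever the original code passes buffer and buffer_size.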

This is my modified dukascopy file:

//#include "stdafx.h"

/*
Copyright 2013 Michael O'Keeffe (a.k.a. ninety47).

This file is part of ninety47 Dukascopy toolbox.

The "ninety47 Dukascopy toolbox" is free software: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the License,
or any later version.

"ninety47 Dukascopy toolbox" is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along with
"ninety47 Dukascopy toolbox".  If not, see <http://www.gnu.org/licenses/>.
*/

#include "ninety47/dukascopy.h"
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>
#include <vector>
#include "ninety47/dukascopy/defs.h"
#include "ninety47/dukascopy/io.hpp"
#include "ninety47/dukascopy/lzma.h"



namespace n47 {

namespace pt = boost::posix_time;


tick *tickFromBuffer(
        unsigned char *buffer, pt::ptime epoch, float digits, size_t offset) {
    bytesTo<unsigned int, n47::BigEndian> bytesTo_unsigned;
    bytesTo<float, n47::BigEndian> bytesTo_float;

    unsigned int ts = bytesTo_unsigned(buffer + offset);
    pt::time_duration ms = pt::millisec(ts);
    unsigned int ofs = offset + sizeof(ts);
    float ask = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    float bid = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    //28-9-2020 convert volume to million
    float askv = bytesTo_float(buffer + ofs) *1000000;
    ofs += sizeof(ts);
    float bidv = bytesTo_float(buffer + ofs) *1000000;

    return new tick(epoch, ms, ask, bid, askv, bidv);
}


tick_data* read_bin(
        unsigned char *buffer, size_t buffer_size, pt::ptime epoch, float point_value) {
    std::vector<tick*> *data = new std::vector<tick*>();
    std::vector<tick*>::iterator iter;

    std::size_t offset = 0;

    while ( offset < buffer_size ) {
        data->push_back(tickFromBuffer(buffer, epoch, point_value, offset));
        offset += ROW_SIZE;
    }

    return data;
}


tick_data* read_bi5(
        unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
        float point_value, size_t *bytes_read) {
    tick_data *result = 0;

    // decompress
    int status;
    unsigned char *buffer = n47::lzma::decompress(lzma_buffer,
            lzma_buffer_size, &status, bytes_read);

    //5-11-2020 here i will save binary file


    if (status != N47_E_OK) {
        *bytes_read = 0;  // zero the caller's count; assigning to bytes_read only nulls the local pointer
    } else {
        // convert to tick data (with read_bin).
        result = read_bin(buffer, *bytes_read, epoch, point_value);
        delete [] buffer;
    }

    return result;
}

//5-11-2020
tick_data* read_bi5_to_bin(
    unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
    float point_value, size_t *bytes_read, unsigned char** buffer_decompressed) {
    tick_data *result = 0;

    // decompress
    int status;
    *buffer_decompressed = n47::lzma::decompress(lzma_buffer,
        lzma_buffer_size, &status, bytes_read);

    if (status != N47_E_OK) 
    {
        *bytes_read = 0;  // zero the caller's count; assigning to bytes_read only nulls the local pointer
    }
    else {
        // convert to tick data (with read_bin).
        result = read_bin(*buffer_decompressed, *bytes_read, epoch, point_value);
        //delete[] buffer;
    }

    return result;
}


tick_data* read(
        const char *filename, pt::ptime epoch, float point_value, size_t *bytes_read) {
    tick_data *result = 0;
    size_t buffer_size = 0;
    unsigned char *buffer = n47::io::loadToBuffer<unsigned char>(filename, &buffer_size);

    if ( buffer != 0 ) {
        if ( n47::lzma::bufferIsLZMA(buffer, buffer_size) ) {
            result = read_bi5(buffer, buffer_size, epoch, point_value, bytes_read);
            // Reading in as bi5 failed lets double check its not binary
            // data in the buffer.
            if (result == 0) {
                result = read_bin(buffer, buffer_size, epoch, point_value);
            }
        } else {
            result = read_bin(buffer, buffer_size, epoch, point_value);
            *bytes_read = buffer_size;
        }
        delete [] buffer;

        if (result != 0 && result->size() != (*bytes_read / n47::ROW_SIZE)) {
            delete result;
            result = 0;
        }
    }
    return result;
}

}  // namespace n47
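
Note that read_bin above loops while offset < buffer_size, so a buffer that is one byte longer than a whole number of rows yields one extra row read partly past the end of the data. A minimal sketch of a bounds-checked variant follows, assuming each row is 20 bytes made of five 4-byte big-endian fields in the order used by tickFromBuffer (time offset, ask, bid, ask volume, bid volume). RawTick, kRowSize and parse_rows are hypothetical names, not part of the ninety47 toolbox.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kRowSize = 20;  // assumption: five 4-byte fields per row

struct RawTick {
    std::uint32_t ms;   // millisecond offset added to the caller's epoch
    std::uint32_t ask;  // ask price in points (multiply by the point value)
    std::uint32_t bid;  // bid price in points
    float askv;         // ask volume
    float bidv;         // bid volume
};

// Reads a 32-bit big-endian unsigned integer.
static std::uint32_t be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Reads a 32-bit big-endian IEEE-754 float.
static float be_float(const unsigned char* p) {
    const std::uint32_t bits = be32(p);
    float f;
    static_assert(sizeof f == sizeof bits, "float must be 32 bits");
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Parses only complete rows; `leftover` (optional) receives the number of
// trailing bytes that do not form a full row.
std::vector<RawTick> parse_rows(const unsigned char* buf, std::size_t size,
                                std::size_t* leftover) {
    std::vector<RawTick> rows;
    const std::size_t n = size / kRowSize;
    rows.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char* p = buf + i * kRowSize;
        rows.push_back({be32(p), be32(p + 4), be32(p + 8),
                        be_float(p + 12), be_float(p + 16)});
    }
    if (leftover != nullptr) *leftover = size % kRowSize;
    return rows;
}
```

A non-zero leftover is a sign that the decompressed size disagrees with the row layout, which can be logged rather than turned into a spurious final row.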

Tags: c++, lzma

Solution


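I dug into the details of how lzma works, which was heavy going for me, so I switched libraries and used the 7z cpp lzma spec files instead.
It works.
I think the problem was related to using C code in a C++ program.
The library also states that it was tested on BSD. Thanks for your help. Anyone with the same case: download 7zip and use the cpp files.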

