Encoding does not switch when trying to read a JSON file

Problem description

I have a JSON file, file.json, encoded in KOI8-R.

Boost JSON only works with UTF-8, so I try to convert the file from KOI8-R to UTF-8 while reading it:

boost::property_tree::ptree tree;

std::locale loc = boost::locale::generator().generate("ru_RU.UTF-8");
std::ifstream ifs("file.json", std::ios::binary);
ifs.imbue(loc);

boost::property_tree::read_json(ifs, tree);

However, the file still cannot be read. What am I doing wrong?

Update:

I created a JSON file, "test.txt":

{
    "соплодие": "лысеющий",
    "обсчитавший": "перегнавший",
    "кариозный": "отдёргивающийся",
    "суверенен": "носившийся",
    "рецидивизм": "поляризуются"
}

and saved it in KOI8-R.

I have this code:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

int main() {
    boost::property_tree::ptree pt;
    boost::property_tree::read_json("test.txt", pt);
}

I compile and run it, and get the following error:

terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::json_parser::json_parser_error>'
  what():  test.txt(2): invalid code sequence
Aborted (core dumped)

Then I use Boost.Locale:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

#include <boost/locale/generator.hpp>
#include <boost/locale/encoding.hpp>

#include <fstream>
#include <locale>

int main() {
    std::locale loc = boost::locale::generator().generate("ru_RU.utf8");
    std::ifstream ifs("test.txt", std::ios::binary);
    ifs.imbue(loc);
    
    boost::property_tree::ptree pt;
    boost::property_tree::read_json(ifs, pt);
}

I compile it (g++ main.cpp -lboost_locale), run it, and get the following error:

terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::json_parser::json_parser_error>'
  what():  <unspecified file>(2): invalid code sequence
Aborted (core dumped)

Tags: c++, boost, locale, boost-propertytree, boost-locale

Solution


The JSON spec requires UTF-8

8.1. Character Encoding

 JSON text exchanged between systems that are not part of a closed
 ecosystem MUST be encoded using UTF-8 [RFC3629].

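It makes sense for a general-purpose library to support only that. For more context, see: JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?

How to do it anyway

Perhaps use libiconv or libicu; Boost.Locale supports the latter.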

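Using Boost.Locale/ICU

This requires that your library was built with ICU support, and perhaps (?) that you have the required locales available, which is most likely already the case on your system.

It also assumes the source code is encoded in UTF-8, which is likely as well.

Live On Compiler Explorer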

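#include <boost/locale.hpp>
#include <boost/locale/encoding.hpp> // boost::locale::conv::to_utf
#include <boost/json.hpp>
#include <boost/json/src.hpp> // header-only use: pulls the implementation into this translation unit
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

namespace json = boost::json;

int main() {
    // Read the raw KOI8-R bytes. istreambuf_iterator keeps every byte,
    // whereas istream_iterator<char> would silently skip whitespace.
    std::string koi8r = [] {
        std::ifstream ifs("input.txt", std::ios::binary);
        return std::string(std::istreambuf_iterator<char>(ifs), {});
    }();

    // Convert to UTF-8 first, then parse.
    json::value doc =
        json::parse(boost::locale::conv::to_utf<char>(koi8r, "KOI8-R"));

    std::cout << "Serialized back: " << doc << "\n";

    // The key literal is UTF-8 because the source file is UTF-8.
    std::cout << "Extracting a single key: " << doc.as_object()["соплодие"] << "\n";
}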

I made up a random JSON:

{
    "соплодие": "лысеющий",
    "обсчитавший": "перегнавший",
    "кариозный": "отдёргивающийся",
    "суверенен": "носившийся",
    "рецидивизм": "поляризуются"
}

and saved it in KOI8-R as "input.txt":

00000000: 7b0a 2020 2020 22d3 cfd0 cccf c4c9 c522  {.    "........"
00000010: 3a20 22cc d9d3 c5c0 ddc9 ca22 2c0a 2020  : "........",.  
00000020: 2020 22cf c2d3 dec9 d4c1 d7db c9ca 223a    "...........":
00000030: 2022 d0c5 d2c5 c7ce c1d7 dbc9 ca22 2c0a   "...........",.
00000040: 2020 2020 22cb c1d2 c9cf dace d9ca 223a      ".........":
00000050: 2022 cfd4 c4a3 d2c7 c9d7 c1c0 ddc9 cad3   "..............
00000060: d122 2c0a 2020 2020 22d3 d5d7 c5d2 c5ce  .",.    ".......
00000070: c5ce 223a 2022 cecf d3c9 d7db c9ca d3d1  ..": "..........
00000080: 222c 0a20 2020 2022 d2c5 c3c9 c4c9 d7c9  ",.    "........
00000090: dacd 223a 2022 d0cf ccd1 d2c9 dad5 c0d4  ..": "..........
000000a0: d3d1 220a 7d0a                           ..".}.

Running the program now prints:

Serialized back: {"соплодие":"лысеющий","обсчитавший":"перегнавший","кариозный":"отдёргивающийся","суверенен":"носившийся","рецидивизм":"поляризуются"}
Extracting a single key: "лысеющий"
