Loading unicode json/serialized dictionary fails due to unicode error

Question

I'm given some input that I must parse and convert to a Dict. I don't control how the input is generated.

An example input is u'{u\'my_key\': u\'AB\\N\'}'. Notice that this should represent a serialized dictionary.

Parsing this dictionary string fails with a variety of methods. json.loads fails because the nested u prefixes (and the single quotes) make the string invalid JSON. ast.literal_eval fails with the error (unicode error) 'unicodeescape' codec can't decode bytes in position 3-4: malformed \N character escape.
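A minimal reproduction of both failures, written for Python 3 (it should also run on Python 2.7). The helper name try_parse is hypothetical; it just records which exception each parser raises instead of letting it propagate.

```python
import ast
import json

# The exact input from the question: a Python 2 style dict repr whose
# value contains a lone backslash followed by N.
raw = u'{u\'my_key\': u\'AB\\N\'}'


def try_parse(parser, text):
    """Return the parsed value, or the type of the exception the parser raised."""
    try:
        return parser(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc)


# json.loads: u-prefixed, single-quoted strings are not valid JSON (ValueError).
json_result = try_parse(json.loads, raw)
# ast.literal_eval: "\N" is read as an incomplete \N{NAME} escape (SyntaxError).
ast_result = try_parse(ast.literal_eval, raw)
```

On Python 3 the JSON error is json.JSONDecodeError, which is a subclass of ValueError.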

I need to somehow sanitize the input so that \N is not interpreted as an escape sequence when parsed with ast. A simple replace('\\', '\\\\') seems error-prone and probably has many edge cases.
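One way to narrow the blanket replace is to double only the backslashes that do not start a valid Python string escape, and then pass the result to ast.literal_eval. This is a sketch under assumptions: the input is always a Python-literal dict repr, and a stray backslash is meant to be a literal backslash. The names escape_stray_backslashes and parse_python_literal are hypothetical. It still cannot rescue a backslash sitting directly before a closing quote, because \' is itself a valid escape.

```python
import ast
import re

# One match per backslash: the first alternative consumes a backslash together
# with a valid Python string escape (so an escaped "\\" pair is consumed whole);
# anything else falls through to the final bare-backslash alternative.
_ESCAPE = re.compile(
    r"""\\(?:[\\'"abfnrtv\n]"""  # single-character escapes and line continuation
    r"|[0-7]{1,3}"               # octal escapes such as \0 or \101
    r"|x[0-9a-fA-F]{2}"          # \xhh
    r"|u[0-9a-fA-F]{4}"          # \uXXXX
    r"|U[0-9a-fA-F]{8}"          # \UXXXXXXXX
    r"|N\{[^}]+\})"              # \N{CHARACTER NAME}
    r"|\\"                       # stray backslash with no valid escape after it
)


def escape_stray_backslashes(text):
    """Double only the backslashes that do not begin a valid escape sequence."""
    def repl(match):
        token = match.group(0)
        # A one-character match is the bare-backslash branch: escape it.
        return "\\\\" if token == "\\" else token
    return _ESCAPE.sub(repl, text)


def parse_python_literal(text):
    """Sanitize stray backslashes, then evaluate the literal safely."""
    return ast.literal_eval(escape_stray_backslashes(text))


input_data = u'{u\'my_key\': u\'AB\\N\'}'
data = parse_python_literal(input_data)
```

Unlike replace('\\', '\\\\'), this leaves legitimate escapes such as \n or \N{BULLET} with their normal meaning, and an already-escaped \\ pair is not doubled again.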

Alternatively, I need a way to remove the u prefixes from the nested strings so that json.loads would work.

Thanks

Tags: python, json, python-2.7, abstract-syntax-tree

Solution


Handling this kind of input is not easy. In fact, the only solution I could find is this:

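import json

input_data = u'{u\'my_key\': u\'AB\\N\'}'

i = (input_data
     .replace('\'', '"')      # single quotes -> JSON double quotes
     .replace('u', '')        # removes EVERY "u", not just the u'' prefixes
     .replace('\\', '\\\\'))  # double each backslash so \N reaches JSON as \\N

data = json.loads(i)
print(type(data))
# <type 'dict'>  (Python 3 prints <class 'dict'>)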

It may solve your specific example, but I would discourage using it in your project.

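As @snakecharmerb said, I would also suggest enforcing some kind of policy on the input and validating the JSON string before it is sent, for example using something like …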
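A minimal sketch of that policy, assuming whoever produces the input can change it: serialize with json.dumps instead of emitting the dict's Python repr, and run the same check before sending and on receipt. The function names serialize and validate are hypothetical.

```python
import json


def serialize(payload):
    """Producer side: emit real JSON instead of the Python repr (str(dict))."""
    return json.dumps(payload)


def validate(text):
    """Reject anything that is not a JSON object; usable before sending and on receipt."""
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object, got %s" % type(data).__name__)
    return data


original = {u'my_key': u'AB\\N'}
wire = serialize(original)  # the backslash is escaped by json.dumps
```

With this in place the backslash travels as a proper JSON escape, and a Python-repr string like the one in the question is rejected by validate instead of being parsed by guesswork.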

