Strange decoding (e.g. \x00) after using Python's split function

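Problem description

This is a very strange situation: the split function appears to be changing the string format. Please see the code below.

Code: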

import serial

# COM_PortName holds the name of the serial port and is defined elsewhere
COM_Port = serial.Serial(COM_PortName)
with COM_Port as port:
    while True:
        RxedData = port.readline()
        line = RxedData.decode('utf-8')
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)

Output:

Line 1: "* , 0 0 0 0 0 5 7 5 , 2 3 : 0 3 : 4 7 , 1 1 / 0 2 / 2 0 , 1 2 . 3 4 5 , K P A , 0 0 0 0 6 . 8 3 , S L P M , T B ,                 , $ "

Line 2: ['\x000\x000\x000\x000\x000\x006\x002\x001\x00', '\x002\x000\x00:\x004\x006\x00:\x005\x001\x00', '\x001\x002\x00/\x000\x002\x00/\x002\x000\x00', '\x001\x002\x00.\x003\x004\x005\x00', '\x00K\x00P\x00A\x00', '\x000\x000\x000\x000\x000\x00.\x000\x000\x00', '\x00C\x00C\x00P\x00M\x00', '\x00T\x00G\x00', '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']

How did the \x000\x000... get into Line 2? What encoding is this, and how do I convert it to the correct format?

Edit 1:

print([hex(i) for i in RxedData])

Output:

['0x2a', '0x0', '0x2c', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x31', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x33', '0x0', '0x2f', '0x0', '0x30', '0x0', '0x32', '0x0', '0x2f', '0x0', '0x32', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x2e', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2c', '0x0', '0x4b', '0x0', '0x50', '0x0', '0x41', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2e', '0x0', '0x36', '0x0', '0x36', '0x0', '0x2c', '0x0', '0x53', '0x0', '0x4c', '0x0', '0x50', '0x0', '0x48', '0x0', '0x2c', '0x0', '0x0', '0x0', '0x0', '0x0', '0x2c', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2c', '0x0', '0x24', '0x0', '0xa']

Tags: python, python-3.x, encoding, decoding

Solution


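OK, judging from the hexdump of the received bytes, every ASCII character is followed by a NULL byte (\x00). That is simply the UTF-16-LE representation of the characters. Decoding as UTF-8 maps each byte to the code point with the same value, since every byte is below 128, so all the interleaved nulls stay in the resulting string. split changes nothing: the nulls are already in line, but your terminal renders them as blanks when the string is printed directly, whereas printing a list shows the repr of each element, which spells them out as \x00. And you cannot simply decode the byte string as UTF-16 (which is what it actually is), because you got it through readline, which stopped right after the newline byte and did not read the null byte that follows it.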

If you read another line, it would probably start with that null byte, which would make the line look as if it were UTF-16-BE encoded...
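The effect described above can be reproduced without any hardware. The following is a minimal sketch, assuming a made-up sample line and using io.BytesIO as a stand-in for the serial port; only the byte layout is meant to match the hexdump from the question.

```python
import io

# Hypothetical sample: two identical lines as a UTF-16-LE device would send them.
text = "*,00000001,KPA,$\n"
stream = io.BytesIO(text.encode("utf-16-le") * 2)  # stand-in for the serial port

first = stream.readline()   # stops right after the 0x0A byte
second = stream.readline()  # begins with the 0x00 left over from the first newline

# UTF-8 accepts every byte below 128, including 0x00, so the nulls survive.
fields = first.decode("utf-8").split(",")
print(repr(fields[2]))   # '\x00K\x00P\x00A\x00'
print(len(first) % 2)    # 1: odd length, not a whole number of UTF-16 code units

# Shifted by one byte, the second line happens to decode as UTF-16-BE.
print(second.decode("utf-16-be") == text)   # True
```

Because the first line has an odd number of bytes, calling first.decode("utf-16-le") on it raises UnicodeDecodeError, which is why the raw result of readline cannot be decoded as UTF-16 directly.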

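So what can be done?

A simple workaround is to just strip the null bytes. If you can be sure that you will only ever receive plain ASCII characters (no accented letters such as é, no emoji, no Greek or Cyrillic, etc.), this is enough: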

     RxedData = port.readline()
     line = RxedData.replace(b'\x00', b'').decode('ascii')
     print("Line 1: ", line)
     row = line.split(',')[1:-1]
     print("Line 2: ", row)

With the bytes shown in the hexdump from Edit 1, you should get:

Line 1:  *,00000001,11:51:50,13/02/20,12.345,KPA,12345.66,SLPH,,--------,$

Line 2:  ['00000001', '11:51:50', '13/02/20', '12.345', 'KPA', '12345.66', 'SLPH', '', '--------']

This has the advantage of being simple and robust, as long as you only have plain ASCII.
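To make the "plain ASCII only" condition concrete, here is a small sketch of the same workaround wrapped in a function. The name fields_from_raw is hypothetical, and the sample bytes are rebuilt from the hexdump in the question rather than read from a real port.

```python
def fields_from_raw(raw: bytes) -> list:
    """Drop every 0x00 byte, decode as ASCII, and keep the inner fields."""
    line = raw.replace(b"\x00", b"").decode("ascii")
    return line.split(",")[1:-1]

# The line from the hexdump, re-encoded; [:-1] mimics readline() missing the last 0x00.
sample = "*,00000001,11:51:50,13/02/20,12.345,KPA,12345.66,SLPH,\x00\x00,--------,$\n"
raw = sample.encode("utf-16-le")[:-1]
print(fields_from_raw(raw))

# Non-ASCII input breaks the shortcut: 'é' is b'\xe9\x00' in UTF-16-LE,
# and 0xE9 is not a valid ASCII byte once the null is stripped.
try:
    fields_from_raw("*,caf\u00e9,$\n".encode("utf-16-le"))
except UnicodeDecodeError as exc:
    print("not plain ASCII:", exc.reason)
```

Note that the two genuine NUL characters in the empty field are removed as well, which is harmless here but shows that the workaround cannot tell padding bytes from real data.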


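A more encoding-consistent approach would be to wrap the serial port in a TextIOWrapper and specify the UTF-16-LE encoding there. I could not test it (there is no serial port on my machine, and I have no need for one), so I can only guess at what should work.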

import io
import serial

COM_Port = serial.Serial(COM_PortName)
# Wrap the port so that decoding happens on whole UTF-16 code units
with io.TextIOWrapper(io.BufferedRWPair(COM_Port, COM_Port), encoding='utf-16-le') as port:
    while True:
        line = port.readline()
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)

Here, TextIOWrapper takes care of the null byte that follows the newline byte and gives you true Unicode strings directly.
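Since the serial version above is untested, the decoding part can at least be checked in isolation. This is a minimal sketch that substitutes io.BytesIO for the port and uses a made-up two-line payload; it says nothing about how a real port behaves with timeouts or partial reads.

```python
import io

# Hypothetical payload: two lines in UTF-16-LE, as the device appears to send them.
wire = ("*,00000001,KPA,$\n" * 2).encode("utf-16-le")
fake_port = io.BytesIO(wire)  # stand-in for the serial port

rows = []
with io.TextIOWrapper(fake_port, encoding="utf-16-le") as port:
    for line in port:
        # Each line is already a clean str, with no stray \x00 characters.
        rows.append(line.split(",")[1:-1])

print(rows)   # [['00000001', 'KPA'], ['00000001', 'KPA']]
```

The wrapper decodes before it splits on newlines, so line boundaries always fall between complete two-byte code units.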

