How to remove \r, \n, \t from a unicode string in Python 2.7

Question

I have some scraped data that is full of annoying escape characters:

{"website": "http://www.zebrawebworks.com/zebra/bluetavern/day.cfm?&year=2018&month=7&day=10", "headliner": ["\"Roda Vibe\" with the Tallahassee Choro Society"], "data": [" \r\n    ", "\r\n\t\r\n\r\n\t", "\r\n\t\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t", "8:00 PM", "\r\n\t\r\n\tFEE:  $2 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ", "\r\n\tEvery 2nd & 4th Tuesday of the month, the Choro Society returns to Blue Tavern with that subtly infectious Brazilian rhythm and beautiful melodies that will stay with you for days. The perfect antidote to Taylor Swift. $2 for musicians; tips appreciated. ", "\r\n\t", "\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t\r\n\t\r\n\r\n\t\r\n\t", "\r\n\t\r\n\t\t", "\r\n", "\r\n", "\r\n", "\r\n"]},

I am trying to write a function that removes these characters, but neither of my two strategies works:

    # strategy 1
    escapes = ''.join([chr(char) for char in range(1, 32)])
    table = {ord(char): None for char in escapes}
    for item in concert['data']:
        item = item.translate(table)
    # strategy 2
    for item in concert['data']:
        for char in item:
            char = char.replace("\r", "").replace("\t", "").replace("\n", "")

Why is my data still full of the escape characters that I tried to remove in two different ways?

Tags: python, unicode, html-escape-characters

Answer


Consider the following:

lst = ["aaa", "abc", "def"]

for x in lst:
    x = x.replace("a","z")

print(lst)  # ['aaa', 'abc', 'def']

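The list appears unchanged, and it is. (Re)assigning to the variable used in the for loop (x) works inside the loop, but it only rebinds the name x to a new string; the change is never propagated back to lst.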

Instead:

for i, x in enumerate(lst):
    lst[i] = x.replace("a","z")

print(lst)  # ['zzz', 'zbc', 'def']

Or:

for i in range(len(lst)):
    lst[i] = lst[i].replace("a","z")

print(lst)  # ['zzz', 'zbc', 'def']
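
Applied to the data in the question, the index-assignment approach could look like the sketch below. The concert dictionary here is a shortened stand-in for the scraped record, and the table only covers \r, \n and \t (an assumption; the question's own table covered every code point from 1 to 31).

```python
# Shortened, hypothetical stand-in for the scraped record from the question.
concert = {
    "data": [u" \r\n    ", u"\r\n\t", u"8:00 PM", u"\r\n\tFEE:  $2 \u00a0\u00a0 "],
}

# Map the code points of \r, \n and \t to None so translate() deletes them.
table = {ord(ch): None for ch in u"\r\n\t"}

# Assign through the index so the list itself is updated,
# instead of rebinding the loop variable.
for i, item in enumerate(concert["data"]):
    concert["data"][i] = item.translate(table)

print(concert["data"])
```

Note that the non-breaking spaces (\u00a0) and ordinary spaces are left alone; removing those as well would need extra entries in the table or a call to strip().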

Edit

Since you are using assignment (x = ...), you have to use something like lst[i] = ... instead.

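For immutable types (which include strings), this really is your only option. x.replace("a","z") does not change x; it returns a new string with the specified replacements.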
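
A small sketch of that behavior, using a made-up sample string rather than the full scraped data:

```python
s = u"\r\n\tFEE: $2"

# Each replace() call returns a new string; s itself is never modified.
t = s.replace(u"\r", u"").replace(u"\n", u"").replace(u"\t", u"")

print(repr(s))  # still contains \r, \n and \t
print(repr(t))  # the cleaned copy
```

This is also why strategy 2 in the question has no effect: the result of replace() is bound to the loop variable char and then discarded.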

With mutable types (such as lists), you can modify the iterand (?) object in place, that is, the x in for x in lst:

So something like the following will see the changes to x propagated to lst:

lst = [[1],[2],[3]]

for x in lst:
    x.append('added')  # Example of in-place modification

print(lst)  # [[1, 'added'], [2, 'added'], [3, 'added']]

This works because x.append() (unlike str.replace()) does modify the x object.
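
One more variation, offered as a sketch rather than as part of the original answer: since a list is mutable, its contents can also be replaced in one step with slice assignment and a list comprehension. The name alias is only there to show that the same list object is updated.

```python
lst = ["aaa", "abc", "def"]
alias = lst  # a second name for the same list object

# Slice assignment replaces the contents of the existing list object,
# so every reference to that list sees the new items.
lst[:] = [x.replace("a", "z") for x in lst]

print(alias)  # ['zzz', 'zbc', 'def']
```

Writing lst = [...] without the slice would also work for most purposes, but it binds lst to a new list and leaves any other references pointing at the old one.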

