Getting rid of lists of empty strings

Question

I have a dataframe that contains lists of empty strings:

df.Answers.head()
0    ['In next 3 months', 'In next 6 months', 'In n...
1    ["Doctor's availability in hotel", 'Ventilator...
2    ['Buffet breakfast with social distancing', 'B...
3    ['1', '2', '3', '4', '5', '6', '7', '8', '9', ...
4                                                 ['']

I want to get rid of them, so I tried:

def remove_empty_arrays(answers):
    if answers in [[''], ["'"], []]:
        print("Got an empty bracket")
        return None

df.Answers.map(remove_empty_arrays)

But it never worked; it never printed any message confirming a match.

Update

    df.Answers.apply(lambda x: [v for v in x if v not in ('', '"')])
    0       [[, ', I, n,  , n, e, x, t,  , 3,  , m, o, n, ...
    1       [[, D, o, c, t, o, r, ', s,  , a, v, a, i, l, ...
    2       [[, ', B, u, f, f, e, t,  , b, r, e, a, k, f, ...
    3       [[, ', 1, ', ,,  , ', 2, ', ,,  , ', 3, ', ,, ...
    4                                            [[, ', ', ]]

    def remove_empty_arrays(answers):
        if answers in [["''"],['""']]:
            print("Got an empty bracket")
            return None

    df.Answers = df.Answers.map(remove_empty_arrays)
    df.Answers.head()
    0    None
    1    None
    2    None
    3    None
    4    None
    Name: Answers, dtype: object

Tags: arrays, python-3.x, list, data-cleaning

Answer


You can use a list comprehension to filter the list items.

For example:

import pandas as pd
df = pd.DataFrame({'Answers':[ ['"', 'a', 'b', 'c'], ['d', 'e', ''], []]})

        Answers
0  [", a, b, c]
1      [d, e, ]
2            []

Then:

df.Answers = df.Answers.apply(lambda x: [v for v in x if v not in ('', '"')])    
print(df)

Prints:

     Answers
0  [a, b, c]
1     [d, e]
2         []
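
If the goal is to drop the rows whose list ends up empty, rather than only cleaning each list, one possible follow-up is a boolean mask on the list length. This is a minimal sketch, assuming the column already holds real Python lists; the name cleaned and the sample data are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Answers': [['"', 'a', 'b', 'c'], ['d', 'e', ''], [''], []]})

# Strip empty strings and stray double quotes from every list.
cleaned = df.Answers.apply(lambda x: [v for v in x if v not in ('', '"')])

# Keep only the rows whose cleaned list still has at least one item.
df = df.assign(Answers=cleaned)[cleaned.map(len) > 0]
print(df)
```

Rows 2 and 3 are dropped here because nothing is left in their lists after filtering; the original index labels of the surviving rows are kept.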

Edit: If the values are strings that merely look like lists, rather than actual lists of strings, you can parse them with ast.literal_eval first:

import pandas as pd
from ast import literal_eval


df = pd.DataFrame({'Answers':[ '''['"', 'a', 'b', 'c']''', '''['d', 'e', '']''', '''[]''']})

df.Answers = df.Answers.apply(lambda x: [v for v in literal_eval(x) if v not in ('', '"')])
print(df)

Prints:

     Answers
0  [a, b, c]
1     [d, e]
2         []
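
The character-by-character output in the question's update is the typical symptom of this case: iterating over a string yields single characters, and a string never compares equal to a list. One quick way to check what the cells really contain, sketched here with made-up data:

```python
import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'Answers': ["['a', 'b']", "['']"]})

# Each cell is a str, not a list, so comparisons against [''] always fail.
print(df.Answers.map(type).unique())
print(df.Answers[1] == [''])                 # False: str compared with list
print(literal_eval(df.Answers[1]) == [''])   # True once the string is parsed
```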

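Note: It is better to fix this at the source. Rather than putting strings into the dataframe, parse them beforehand (or store them as lists of strings instead of strings in the first place).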
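
If the data happens to come from a CSV file, which is an assumption since the question does not say how the frame was built, pd.read_csv accepts a converters mapping that can parse the column while loading. A minimal sketch, using an in-memory file with made-up contents in place of a real path:

```python
import io
from ast import literal_eval

import pandas as pd

# Stand-in for a real CSV file; the contents are made up for illustration.
csv_text = '''Answers
"['a', 'b', '']"
"['']"
'''

# Parse each cell into a real list as the file is read.
df = pd.read_csv(io.StringIO(csv_text), converters={'Answers': literal_eval})
print(df.Answers.map(type).unique())
```

With the column parsed at load time, the list comprehension from the first example can be applied directly, without calling literal_eval inside it.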

