Extracting substrings without any whitespace from a list of strings

Problem description

Suppose I have the following list:

l1 = ['apples', ' bananas' , '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']

What is the best way to extract every word and discard the extra whitespace?

The result I am after is:

l2 = ['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']

What I have tried so far:

    import re

    clean_l = []

    # Get rid of white spaces
    for item in l1:
        clean = re.sub(r"(?m)^\s+", "", item)
        clean_l.append(clean)

But this returns the same list as l1.

Tags: python, string, list

Solution


Use:

l1 = ['apples', ' bananas' , '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
res = [ei for e in l1 for ei in e.strip().split()]
print(res)

Output

['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']
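
A note on why this works: called with no arguments, `str.split()` splits on runs of whitespace and ignores leading and trailing whitespace, so the `strip()` call above is harmless but not strictly needed. A minimal sketch illustrating this (the helper name `split_words` is made up for this example):

```python
def split_words(items):
    """Flatten a list of strings into a list of whitespace-separated words."""
    # str.split() with no separator skips leading/trailing whitespace
    # and treats consecutive whitespace as a single delimiter.
    return [word for item in items for word in item.split()]


l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
print(split_words(l1))

print('   dates   figs '.split())     # no empty strings in the result
print('   dates   figs '.split(' '))  # an explicit separator keeps empty strings
```

The last line shows the pitfall to avoid: passing `' '` explicitly changes the behaviour and leaves empty strings where the extra spaces were.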

If you insist on using a regular expression, which I would not recommend for this particular problem, use:

import re

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
res = [ei for e in l1 for ei in re.findall(r"\w+", e)]
print(res)

Output

['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']
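
One caveat with `\w+`, worth keeping in mind if the data is less tidy than the sample: it matches runs of word characters, so it also splits on punctuation such as hyphens and apostrophes. If only whitespace should separate items, `\S+` (runs of non-whitespace) is closer to what `split()` does. A small sketch with made-up input strings:

```python
import re

# Hypothetical input, not from the question: items containing punctuation.
samples = ["  don't panic ", 'sun-dried tomatoes  ']

by_word_chars = [m for s in samples for m in re.findall(r"\w+", s)]
by_non_space = [m for s in samples for m in re.findall(r"\S+", s)]
by_split = [w for s in samples for w in s.split()]

print(by_word_chars)  # ['don', 't', 'panic', 'sun', 'dried', 'tomatoes']
print(by_non_space)   # ["don't", 'panic', 'sun-dried', 'tomatoes']
print(by_split == by_non_space)  # True for these samples
```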

A third option (@WiktorStribiżew) is to use:

res = " ".join(l1).split()
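
This works because joining with a single space guarantees at least one whitespace character between neighbouring items, and `split()` then discards all of the whitespace in one pass over a single string. A quick check, as a sketch, that the three approaches agree on the sample list (the variable names are only for illustration):

```python
import re

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
expected = ['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']

via_comprehension = [ei for e in l1 for ei in e.strip().split()]
via_regex = [ei for e in l1 for ei in re.findall(r"\w+", e)]
via_join = " ".join(l1).split()

# All three produce the same flat list of words for this input.
assert via_comprehension == via_regex == via_join == expected
print(via_join)
```

The trade-off is that `join` builds one temporary string holding the whole input, which is usually fine but worth knowing for very large lists.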

Timings

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  '] * 1000
import re
%timeit [ei for e in l1 for ei in e.strip().split()]
1.76 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit " ".join(l1).split()
453 µs ± 3.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [ei for e in l1 for ei in re.findall(r"\w+", e)]
7.77 ms ± 59.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
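
The `%timeit` lines above are IPython magics. Outside IPython, roughly the same comparison can be run with the standard-library `timeit` module. The sketch below is one way to do it; the function names and the repeat count are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  '] * 1000


def comprehension():
    return [ei for e in l1 for ei in e.strip().split()]


def join_split():
    return " ".join(l1).split()


def regex():
    return [ei for e in l1 for ei in re.findall(r"\w+", e)]


runs = 100  # arbitrary; raise it for more stable numbers
for func in (comprehension, join_split, regex):
    # timeit.timeit returns the total seconds for `runs` calls.
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.0f} µs per call")
```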
