Cleaning up an inconsistent list of phrases

Problem description

I have a list that looks like this.

["['brill building pop",'quiet storm','ballad','easy listening',"motown'"," 'disco",'soul jazz',
'smooth jazz','soul','jazz','soft rock',"uk garage'"," 'chill-out",'german pop','salsa','r&b',
'chanson','rock',"pop'"," 'blues-rock",'vocal jazz','funk','oldies','pop rock',"downtempo'",
" 'hip hop",'classic rock','united states','germany',"adult contemporary'"," 'folk rock",'vocal',
'soundtrack','blues','female vocalist',"electronic'"," 'new wave",'urban','reggae','singer-songwriter',
 'swing','60s',"female'"," 'american",'80s','90s',"ambient']"]

It should look like this:

['brill building pop','quiet storm','ballad','easy listening','motown','disco','soul jazz',
'smooth jazz','soul','jazz','soft rock','uk garage','chill-out','german pop','salsa','r&b',
'chanson','rock','pop','blues-rock','vocal jazz','funk','oldies','pop rock','downtempo',
'hip hop','classic rock','united states','germany','adult contemporary','folk rock','vocal',
'soundtrack','blues','female vocalist','electronic','new wave','urban','reggae','singer-songwriter',
'swing','60s','female','american','80s','90s','ambient']

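As you can see, there are stray apostrophes, unmatched square brackets, extra spaces, and so on. The elements are phrases, so I don't want to remove the spaces between words, but I do want to remove any that appear at the start or end of an element. Is there a shortcut?

Tags: python, data-cleaning

Solution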


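The way this is structured, it is already a proper list, just with a lot of extra characters in it, so you can use replace() and strip() on each element (the list is assumed to be named z), like this: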

zmod = [zz.replace('\'', '').replace('[', '').replace(']', '').strip() for zz in z]
zmod
['brill building pop',
 'quiet storm',
 'ballad',
 'easy listening',
...
 'american',
 '80s',
 '90s',
 'ambient']

There is certainly a shorter regex approach, but I find this the most readable.
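For comparison, here is a minimal sketch of two shorter variants: one with a regular expression and one that passes a character set to strip(). The sample list is a shortened copy of the data in the question, and the names EDGE_JUNK, clean_regex and clean_strip are made up for illustration. Note that, unlike the replace() chain, both variants only touch the ends of each string, so an apostrophe in the middle of a phrase would be kept.

```python
import re

# Shortened sample of the list from the question (assumed to be named z).
z = ["['brill building pop", 'quiet storm', "motown'", " 'disco", 'r&b', "ambient']"]

# Runs of whitespace, apostrophes, or square brackets at either end of a string.
EDGE_JUNK = re.compile(r"^[\s'\[\]]+|[\s'\[\]]+$")

def clean_regex(items):
    """Remove the unwanted characters from both ends with a regular expression."""
    return [EDGE_JUNK.sub("", item) for item in items]

def clean_strip(items):
    """Same idea without regex: strip() accepts the set of characters to remove."""
    return [item.strip(" '[]") for item in items]

print(clean_regex(z))
# ['brill building pop', 'quiet storm', 'motown', 'disco', 'r&b', 'ambient']
```

Calling clean_strip(z) gives the same result for this sample. Which one reads better is a matter of taste; the strip() version is probably the closest to the "shortcut" the question asks for.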

