Extract text from "Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray" using Python

Question

I have a string like this:

"Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray"

and I want to split it into separate names like this:

Samuel L. Jackson,
Jessica Biel,
Brian Presley,
50 Cent,
Christina Ricci,
Chad Michael Murray

using Python.

Tags: python, python-3.x, pandas, beautifulsoup

Answers


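In pandas, you can insert a comma wherever a lowercase letter is immediately followed by an uppercase letter or a digit: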

import pandas as pd

# regex=True is required on pandas >= 2.0, where str.replace treats the
# pattern as a literal string by default
a = pd.Series("Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray").str.replace(r'([a-z])([A-Z0-9])', r'\1,\2', regex=True)
a.to_list()[0]

# 'Samuel L. Jackson,Jessica Biel,Brian Presley,50 Cent,Christina Ricci,Chad Michael Murray'
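The same boundary rule also works without pandas. A minimal sketch using only the standard `re` module, assuming Python 3.7+ (needed for `re.split` to split on zero-width matches): a lookbehind and a lookahead mark the position between a lowercase letter and an uppercase letter or digit, so no characters are consumed and you get a list of names directly instead of a comma-joined string.

```python
import re

s = "Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray"

# Split at each empty position that has a lowercase letter on its left and an
# uppercase letter or digit on its right (same rule as the pandas answer).
names = re.split(r"(?<=[a-z])(?=[A-Z0-9])", s)

print(names)
# ['Samuel L. Jackson', 'Jessica Biel', 'Brian Presley', '50 Cent', 'Christina Ricci', 'Chad Michael Murray']

# One name per line, comma-separated, as in the question
print(",\n".join(names))
```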

Or, to put each name on its own line:

a = pd.Series("Samuel L. JacksonJessica BielBrian Presley50 CentChristina RicciChad Michael Murray").str.replace(r'([a-z])([A-Z0-9])', r'\1,\n\2', regex=True)

print(a.to_list()[0])

Output:

Samuel L. Jackson,
Jessica Biel,
Brian Presley,
50 Cent,
Christina Ricci,
Chad Michael Murray
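Because the rule only looks at letter case, it is a heuristic rather than a real name parser. A short sketch of two inputs it gets wrong; the extra names here are illustrative examples, not from the question, and `split_names` is a hypothetical helper wrapping the same regex.

```python
import re


def split_names(s):
    # Hypothetical helper: a new name starts wherever a lowercase letter is
    # immediately followed by an uppercase letter or a digit.
    return re.split(r"(?<=[a-z])(?=[A-Z0-9])", s)


# A capital letter inside a surname is treated as a new name (over-splitting)
print(split_names("Samuel L. JacksonMelissa McCarthy"))
# ['Samuel L. Jackson', 'Melissa Mc', 'Carthy']

# A name ending in an uppercase letter runs into the next one (no split)
print(split_names("Ice-TBrian Presley"))
# ['Ice-TBrian Presley']
```

If the names come from a web page, it is usually more reliable to keep them separate at the source than to split them afterwards.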

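Is this what you mean? If the string comes from the film's Wikipedia page, you can scrape the cast list and print each link separately: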

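import requests
from bs4 import BeautifulSoup

link = 'https://en.wikipedia.org/wiki/Home_of_the_Brave_(2006_film)'

result1 = requests.get(link, timeout=10)
result1.raise_for_status()
# 'lxml' requires the lxml package; 'html.parser' works without it
soup = BeautifulSoup(result1.content, 'lxml')

# The cast list was the 4th <ul> on the page when this was written;
# a positional index like this breaks if the page layout changes
table = soup.find_all('ul')[3]

# Print each link's text on its own line
for item in table.find_all('a'):
    print(item.text)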

Output:

Samuel L. Jackson
Jessica Biel
Brian Presley
50 Cent
Chad Michael Murray
Christina Ricci
Victoria Rowell
Vyto Ruginis
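A likely explanation for the jammed string in the question (an assumption, since the question does not say how it was produced) is that the whole list's text was read in one go: adjacent list items contribute no separator, so the names run together. Reading each link separately, as the loop above does, avoids this. The sketch below illustrates the difference with the standard library's `html.parser` in place of BeautifulSoup so it runs offline; the markup and the `LinkTextCollector` class are hypothetical.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects each <a> element's text separately, plus all text nodes in order."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.links = []     # one entry per <a> element
        self.all_text = []  # every text node, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        self.all_text.append(data)
        if self.in_link:
            self.links[-1] += data


# Hypothetical cast list with no whitespace between the items
html = (
    "<ul>"
    '<li><a href="#">Samuel L. Jackson</a></li>'
    '<li><a href="#">Jessica Biel</a></li>'
    '<li><a href="#">50 Cent</a></li>'
    "</ul>"
)

parser = LinkTextCollector()
parser.feed(html)
parser.close()

print("".join(parser.all_text))  # Samuel L. JacksonJessica Biel50 Cent
print(parser.links)              # ['Samuel L. Jackson', 'Jessica Biel', '50 Cent']
```

With BeautifulSoup, if you do want an element's whole text at once, `get_text(separator=...)` inserts the given separator between text fragments instead of concatenating them.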
