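python - Appending an empty string after the last value of a list inside a for loop
Problem description
I am scraping several web pages and want to extract specific information: the location name, latitude, longitude, and film name. However, when extracting this across multiple pages, I can't tell which film each group of three values (name, latitude, longitude) belongs to.
One way I thought of to get around this is to append an empty string after all of a film's values, so that I can later split the list into one list per film wherever an empty string occurs.
However, I'm having trouble getting the empty strings in the right place. This is what I did: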
import requests
from bs4 import BeautifulSoup

test = ['https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
        'https://www.latlong.net/location/12-angry-men-locations-818',
        'https://www.latlong.net/location/12-monkeys-locations-501']
for i in range(0, len(test), 1):
    r = requests.get(test[i])
    testone = {'location name':[],'film':[]}
    soup = BeautifulSoup(r.content, 'lxml')
    for th in soup.select("td"):
        testone['location name'].append(th.text.strip())
        testone['location name'].append('')
    for h in soup.select_one("h3"):
        testone['film'].append(h)
However, this appends an empty string after every single value:
'location name': ["1117 Broadway (Gil's Music Shop)",
'',
'47.252495',
'',
'-122.439644',
'',
"2715 North Junett St (Kat and Bianca's House)",
'',
'47.272591',
'',
'-122.474480', ....
What I expected:
'location name': ["1117 Broadway (Gil's Music Shop)",
'47.252495',
'-122.439644',
"2715 North Junett St (Kat and Bianca's House)",
'47.272591',
'-122.474480',
'Aurora Bridge',
'47.646713',
'-122.347435',
'Buckaroo Tavern (closed)',
'47.657841',
'-122.350327',
'Century Ballroom',
'47.615028',
'-122.319855',
'Fremont Place Books (closed)',
'47.650452',
'-122.350510',
'Fremont Troll',
'47.651093',
'-122.347435',
'Gas Works Park',
'47.645561',
'-122.334496',
'Kerry Park',
'47.629402',
'-122.360008',
'Kingdome',
'47.595993',
'-122.333649',
'Paramount Theatre',
'47.613235',
'-122.331451',
'Seattle',
'47.601871',
'-122.341248',
'Stadium High School',
'47.265991',
'-122.448570',
'Tacoma',
'47.250828',
'-122.449135',
'',
'New York City',
'40.742298',
'-73.982559',
'New York County Courthouse',
'40.714310',
'-74.001930',
'', ................],
'film': ['10 Things I Hate About You Locations Map','12 Angry Men Locations Map'...]}
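Once the list has the shape shown above, the splitting step described in the question could look something like the sketch below. The helper names `split_on_separator` and `to_triples` are hypothetical, and the sample data is a shortened copy of the values above. It assumes every page contributes at least one cell; two separators in a row would be merged and the films would no longer line up.

```python
from itertools import groupby

def split_on_separator(values, sep=''):
    """Split a flat list into sublists at each occurrence of sep."""
    return [list(group)
            for is_sep, group in groupby(values, key=lambda v: v == sep)
            if not is_sep]

def to_triples(cells):
    """Group [name, lat, lon, name, lat, lon, ...] into (name, lat, lon) tuples."""
    usable = len(cells) - len(cells) % 3  # ignore an incomplete trailing group
    return [tuple(cells[i:i + 3]) for i in range(0, usable, 3)]

# Shortened sample in the expected shape: one '' after each film's values
flat = ['Aurora Bridge', '47.646713', '-122.347435',
        'Kerry Park', '47.629402', '-122.360008',
        '',
        'New York City', '40.742298', '-73.982559',
        '']
films = ['10 Things I Hate About You Locations Map',
         '12 Angry Men Locations Map']

# Pair each film title with its own list of (name, lat, lon) tuples
per_film = {film: to_triples(cells)
            for film, cells in zip(films, split_on_separator(flat))}
```

Here `per_film` maps each film title to its own list of location tuples, so the separator is only needed as an intermediate step.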
Solution
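Switching from append() to extend() does not help here: strip() returns a str, not a list, so extend() would add every character of the cell text as a separate item. append() is the right call for the cell text.
The empty strings end up after every value because the append('') call sits inside the inner for loop. Move it out one level so that it runs once per page, after all of that page's cells have been added. Also create testone before the outer loop; otherwise it is reset on every page and only the last page's values survive.
Try this:
import requests
from bs4 import BeautifulSoup

# Create the dict once so values from every page accumulate
testone = {'location name': [], 'film': []}
for url in test:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    for td in soup.select("td"):
        testone['location name'].append(td.text.strip())
    # One separator per page, added after all of that page's cells
    testone['location name'].append('')
    # select_one returns a single tag; store its text as the film title
    testone['film'].append(soup.select_one("h3").text.strip())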
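As a quick check of how append() and extend() behave when the argument is a string such as the one strip() returns, here is a small sketch. The sample text is taken from the output above and the variable names are only illustrative.

```python
cell = "  Aurora Bridge  ".strip()  # str.strip() returns a str, not a list

appended = []
appended.append(cell)   # adds the whole string as a single element

extended = []
extended.extend(cell)   # iterates over the string, adding one character at a time

print(appended)      # ['Aurora Bridge']
print(extended[:6])  # ['A', 'u', 'r', 'o', 'r', 'a']
```

So for one string per table cell, append() keeps each value intact, while extend() is meant for adding the items of another list or iterable.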