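python - How do I print all Beautiful Soup results at once?
Problem description
I have a list of Twitter usernames and I need to get each account's follower count. I'm using BeautifulSoup (BS) and requests, but I only ever get the result for one account.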
from bs4 import BeautifulSoup
import requests
import pandas as pd
purcsv = pd.read_csv('pureeng.csv', engine= 'python')
followers = purcsv['username']
followers.head(10)
handle = purcsv['username'][0:40]
temp = ("https://twitter.com/"+handle)
temp = temp.tolist()
for url in temp:
    page = requests.get(url)
bs = BeautifulSoup(page.text,'lxml')
follow_box = bs.find('li',{'class':'ProfileNav-item ProfileNav-item--followers'})
followers = follow_box.find('a').find('span',{'class':'ProfileNav-value'})
print("Number of followers: {} ".format(followers.get('data-count')))
Solution
That's because you first loop over the URLs and store the content of every URL in the same variable, page, here:
for url in temp:
    page = requests.get(url)
So once the loop finishes, page only holds the last URL visited. To fix this, process each page right after fetching it, inside the loop:
followers_list = []
for url in temp:
    page = requests.get(url)
    # Parse this page now, before the next iteration overwrites page
    bs = BeautifulSoup(page.text, "html.parser")
    follow_box = bs.find('li', {'class': 'ProfileNav-item ProfileNav-item--followers'})
    followers = follow_box.find('a').find('span', {'class': 'ProfileNav-value'})
    print("Number of followers: {} ".format(followers.get('data-count')))
    followers_list.append(followers.get('data-count'))
# Runs once, after every URL has been processed
print(followers_list)
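The difference between the two loops can be reproduced without any network access. In this minimal sketch, a hypothetical FAKE_PAGES dict and fetch() function stand in for requests.get, and a toy extract_count() stands in for the BeautifulSoup lookup; none of these names come from the original code.

```python
# Hypothetical stand-in for requests.get(url).text: maps URL -> HTML snippet.
FAKE_PAGES = {
    "https://twitter.com/alice": "<span data-count='10'></span>",
    "https://twitter.com/bob": "<span data-count='20'></span>",
}


def fetch(url):
    """Pretend to download a page (assumption: no real HTTP request)."""
    return FAKE_PAGES[url]


def extract_count(html):
    """Toy parser: return the value of the data-count attribute."""
    start = html.index("data-count='") + len("data-count='")
    return html[start:html.index("'", start)]


urls = list(FAKE_PAGES)

# Buggy pattern: the loop only fetches; parsing happens once, afterwards.
for url in urls:
    page = fetch(url)
buggy = [extract_count(page)]  # only the last page survives

# Fixed pattern: parse each page inside the loop and collect the result.
fixed = []
for url in urls:
    page = fetch(url)
    fixed.append(extract_count(page))

print(buggy)  # ['20']
print(fixed)  # ['10', '20']
```

Only the position of the parsing step changes; the fetching code is identical in both loops.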
Here is a complete example to verify it:
from bs4 import BeautifulSoup
import requests
import pandas as pd
purcsv = pd.read_csv('pureeng.csv')
followers = purcsv['username']
handles = purcsv['username'][0:40].tolist()
followers_list = []
for handle in handles:
    url = "https://twitter.com/" + handle
    try:
        page = requests.get(url)
    except Exception as e:
        # Skip handles whose page cannot be fetched and move on to the next one
        print(f"Failed to fetch page for url {url} due to: {e}")
        continue
    # Everything below runs once per handle, inside the loop
    bs = BeautifulSoup(page.text, "html.parser")
    follow_box = bs.find('li', {'class': 'ProfileNav-item ProfileNav-item--followers'})
    followers = follow_box.find('a').find('span', {'class': 'ProfileNav-value'})
    print("Number of followers: {} ".format(followers.get('data-count')))
    followers_list.append(followers.get('data-count'))
print(followers_list)
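One caveat with the complete example: bs.find() returns None when no matching element exists, so if a response does not contain the expected ProfileNav markup (for example an error page or a changed layout), follow_box.find('a') raises AttributeError and the whole loop stops. The sketch below makes the lookup return None instead. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser.HTMLParser; the FollowerCountParser class, the follower_count() helper and the sample HTML are all hypothetical, and the match is slightly looser than the original because it does not require the intermediate <a> tag.

```python
from html.parser import HTMLParser


class FollowerCountParser(HTMLParser):
    """Hypothetical parser: finds data-count on <span class="ProfileNav-value">
    inside <li class="... ProfileNav-item--followers"> (legacy markup assumed)."""

    def __init__(self):
        super().__init__()
        self.in_followers_li = False
        self.count = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "ProfileNav-item--followers" in classes:
            self.in_followers_li = True
        elif (tag == "span" and self.in_followers_li
              and "ProfileNav-value" in classes and self.count is None):
            self.count = attrs.get("data-count")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_followers_li = False


def follower_count(html):
    """Return the follower count string, or None if the markup is missing."""
    parser = FollowerCountParser()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up sample pages: one with the expected markup, one without it.
pages = {
    "alice": '<ul><li class="ProfileNav-item ProfileNav-item--followers">'
             '<a><span class="ProfileNav-value" data-count="1234">1,234</span>'
             '</a></li></ul>',
    "bob": "<html><body>Something went wrong</body></html>",
}
counts = {handle: follower_count(html) for handle, html in pages.items()}
print(counts)  # {'alice': '1234', 'bob': None}
```

With BeautifulSoup in the original loop, the equivalent guard is to check that follow_box (and the span) is not None before using it, and to append None or skip the handle otherwise.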
Output:
Number of followers: 13714085
Number of followers: 4706511
['13714085', '4706511']
If you have many handles, you could consider using async functions to fetch and process the URLs concurrently.
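A minimal asyncio sketch of that idea, assuming Python 3.7+ for asyncio.run. fetch_page() is a hypothetical placeholder that sleeps instead of downloading, and the concurrency limit of 5 is an arbitrary assumption meant to avoid flooding the server.

```python
import asyncio


async def fetch_page(url):
    """Hypothetical stand-in for a real HTTP call; sleeps instead of downloading."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(handles, limit=5):
    # The semaphore caps how many requests are in flight at the same time.
    sem = asyncio.Semaphore(limit)

    async def fetch_one(handle):
        async with sem:
            return await fetch_page("https://twitter.com/" + handle)

    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch_one(h) for h in handles))


pages = asyncio.run(fetch_all(["alice", "bob", "carol"]))
for page in pages:
    print(page)
```

To fetch real pages with requests, one option (Python 3.9+) is to have fetch_page return (await asyncio.to_thread(requests.get, url)).text, which runs the blocking call in a worker thread; an async HTTP client such as aiohttp is another. Each returned page can then be parsed exactly as in the loop above.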