Saving a BeautifulSoup web scrape to JSON

Question

Python noob here. I've managed to scrape a list of companies from Wikipedia. How do I save the output to a JSON file?

import requests
from bs4 import BeautifulSoup
import json

url = "https://en.wikipedia.org/wiki/List_of_companies_traded_on_the_JSE"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
tables = soup.find_all('table', {'class': "wikitable sortable"})

for table in tables:
    print(table.text)

Tags: python, json, python-3.x, beautifulsoup

Answer

Use this:

import requests
from bs4 import BeautifulSoup
import json

url = "https://en.wikipedia.org/wiki/List_of_companies_traded_on_the_JSE"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the text of every matching table, one string per table
tables = soup.find_all('table', {'class': "wikitable sortable"})
table_texts = [table.text for table in tables]

# Write the list of strings to disk as a JSON array
with open('companies.json', 'w') as json_file:
    json.dump(table_texts, json_file)

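That should be enough. I'm not sure how you plan to process it, though, since the result is just a list holding all the text from each table.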
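
If you want something easier to process than one big string per table, a rough sketch like the following might help. It turns each table row into a dict keyed by the column headers. It assumes the first row of each table holds the headers and skips rows whose cell count does not match, which may or may not fit the actual Wikipedia markup. The helper names `table_to_records` and `html_to_records` and the `sample_html` string are made up for illustration; in practice you would pass `response.text` instead of the sample.

```python
import json
from bs4 import BeautifulSoup


def table_to_records(table):
    """Turn one <table> tag into a list of dicts keyed by its header cells."""
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumption: the first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(cells) != len(headers):
            continue  # skip rows with merged or missing cells
        records.append(dict(zip(headers, cells)))
    return records


def html_to_records(html):
    """Collect the rows of every 'wikitable sortable' table in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for table in soup.find_all("table", {"class": "wikitable sortable"}):
        records.extend(table_to_records(table))
    return records


# Hypothetical sample markup standing in for response.text
sample_html = """
<table class="wikitable sortable">
  <tr><th>Company</th><th>Ticker</th></tr>
  <tr><td>Example Ltd</td><td>EXL</td></tr>
  <tr><td>Sample Holdings</td><td>SMP</td></tr>
</table>
"""

records = html_to_records(sample_html)

# indent=2 keeps the file readable; ensure_ascii=False keeps non-ASCII names intact
with open("companies.json", "w", encoding="utf-8") as json_file:
    json.dump(records, json_file, indent=2, ensure_ascii=False)
```

Each entry in `companies.json` is then one row, so reading it back with `json.load` gives a list you can filter or loop over directly.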

