python - Need help scraping WSJ Markets data
Problem description
I'm relatively new to this and am trying to scrape data with Python. Here is my code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
URL = 'https://www.wsj.com/market-data/stocks/asia?mod=md_usstk_view_asia'
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}
page = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find("table", attrs={"class": "WSJTables--table--1QzSOCfq"})
print(table)
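I have already added headers, but the output doesn't show any values. Any help would be appreciated, thanks!
Solution
The data you are looking for is loaded from an external source via Ajax, so it is not part of the HTML that requests downloads. The following example shows how to load it with the requests module: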
import json
import requests
url = "https://www.wsj.com/market-data/stocks/asia"
params = {
    "id": '{"application":"WSJ","instruments":[{"symbol":"INDEX/HK//HSI","name":"Hong Kong: Hang Seng"},{"symbol":"INDEX/JP//NIK","name":"Japan: Nikkei 225"},{"symbol":"INDEX/CN//SHCOMP","name":"China: Shanghai Composite"},{"symbol":"INDEX/IN//1","name":"India: S&P BSE Sensex"},{"symbol":"INDEX/AU//XJO","name":"Australia: S&P/ASX"},{"symbol":"INDEX/KR//SEU","name":"S. Korea: KOSPI"},{"symbol":"INDEX/US//GDOW","name":"Global Dow"},{"symbol":"FUTURE/US//DJIA FUTURES","name":"DJIA Futures"}]}',
    "type": "mdc_quotes",
}
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
data = requests.get(url, params=params, headers=headers).json()
# uncomment to see all data:
# print(json.dumps(data, indent=4))
for instrument in data["data"]["instruments"]:
    print(
        "{:<30} {:<10}".format(
            instrument["formattedName"], instrument["lastPrice"]
        )
    )
Prints:
Hong Kong: Hang Seng 28458.44
Japan: Nikkei 225 28317.83
China: Shanghai Composite 3486.56
India: S&P BSE Sensex 50540.48
Australia: S&P/ASX 7030.3
S. Korea: KOSPI 3156.42
Global Dow 4022.82
DJIA Futures 34208
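The "id" parameter above is a hand-written JSON string, which is easy to break when adding or removing indices. Here is a minimal sketch of building it with json.dumps instead, assuming the endpoint keeps accepting the same structure as in the answer. The helper name build_params is hypothetical, and the instrument list is a subset of the one above:

```python
import json


def build_params(instruments):
    """Build the query parameters for the mdc_quotes request.

    `instruments` is a list of (symbol, name) pairs. The structure mirrors
    the hand-written "id" string from the answer; build_params is a
    hypothetical helper, not part of any WSJ API.
    """
    payload = {
        "application": "WSJ",
        "instruments": [
            {"symbol": symbol, "name": name} for symbol, name in instruments
        ],
    }
    # separators=(",", ":") produces the same compact form as the original string
    return {
        "id": json.dumps(payload, separators=(",", ":")),
        "type": "mdc_quotes",
    }


params = build_params(
    [
        ("INDEX/HK//HSI", "Hong Kong: Hang Seng"),
        ("INDEX/JP//NIK", "Japan: Nikkei 225"),
    ]
)
print(params["id"])
```

The resulting dict can be passed as params= to requests.get() in the same way as the hand-written one.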
To load it as a pandas DataFrame:
import pandas as pd
df = pd.json_normalize(data["data"]["instruments"])
print(df)
Prints:
country dailyHigh dailyLow exchangeIsoCode formattedName lastPrice mantissa name priceChange percentChange requestSymbol ticker timestamp type url bluegrassChannel.channel bluegrassChannel.type
0 HK 28584.34 28286.92 XHKG Hong Kong: Hang Seng 28458.44 2 Hang Seng Index 8.15 0.03 INDEX/HK//HSI HSI 2021-05-21T16:08:32+08:00 Index https://www.wsj.com/market-data/quotes/index/H... /zigman2/quotes/210598030/delayed DelayedChannel
1 JP 28411.56 28193.03 XTKS Japan: Nikkei 225 28317.83 2 NIKKEI 225 Index 219.58 0.78 INDEX/JP//NIK NIK 2021-05-21T15:15:02+09:00 Index https://www.wsj.com/market-data/quotes/index/J... /zigman2/quotes/210597971/delayed DelayedChannel
2 CN 3518.38 3479.67 XSHG China: Shanghai Composite 3486.56 2 Shanghai Composite Index -20.39 -0.58 INDEX/CN//SHCOMP SHCOMP 2021-05-21T15:01:13+08:00 Index https://www.wsj.com/market-data/quotes/index/C... /zigman2/quotes/210598127/delayed DelayedChannel
3 IN 50591.12 49832.72 XBOM India: S&P BSE Sensex 50540.48 2 S&P BSE Sensex Index 975.62 1.97 INDEX/IN//1 1 2021-05-21T15:30:50+05:30 Index https://www.wsj.com/market-data/quotes/index/I... /zigman2/quotes/210597966/delayed DelayedChannel
4 AU 7056.40 6999.60 XASX Australia: S&P/ASX 7030.3 1 S&P/ASX 200 Benchmark Index 10.7 0.15 INDEX/AU//XJO XJO 2021-05-21T17:20:23+10:00 Index https://www.wsj.com/market-data/quotes/index/A... /zigman2/quotes/210598100/delayed DelayedChannel
5 KR 3198.01 3149.46 Korea Exchange S. Korea: KOSPI 3156.42 2 KOSPI Composite Index -5.86 -0.19 INDEX/KR//SEU 180721 2021-05-21T15:33:00+09:00 Index https://www.wsj.com/market-data/quotes/index/K... /zigman2/quotes/210598069/delayed DelayedChannel
6 US 4040.16 4012.70 S&P Dow Jones Global Dow 4022.82 2 Global Dow Realtime USD 7.66 0.19 INDEX/US//GDOW GDOW 2021-05-21T18:43:17-04:00 Index https://www.wsj.com/market-data/quotes/index/U... /zigman2/quotes/210599024/realtime DelayedChannel
7 US 34372.00 34017.00 XCBT DJIA Futures 34208 0 E-Mini Dow Continuous Contract 55 0.16 FUTURE/US//DJIA FUTURES YM00 2021-05-21T15:59:59.595-05:00 Future https://www.wsj.com/market-data/quotes/futures... /zigman2/quotes/210407078/delayed DelayedChannel
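The full frame is wide, so it may help to keep only a few columns. Below is a rough sketch that runs offline on a two-row sample shaped like the response above. The values are copied from the printed output; whether the API returns prices as numbers or as strings is an assumption, which is why pd.to_numeric is applied:

```python
import pandas as pd

# Offline sample shaped like data["data"]["instruments"]. Values are copied
# from the printed output above; the field types (strings) are assumed.
sample = [
    {
        "formattedName": "Hong Kong: Hang Seng",
        "lastPrice": "28458.44",
        "percentChange": "0.03",
        "timestamp": "2021-05-21T16:08:32+08:00",
        "bluegrassChannel": {
            "channel": "/zigman2/quotes/210598030/delayed",
            "type": "DelayedChannel",
        },
    },
    {
        "formattedName": "Japan: Nikkei 225",
        "lastPrice": "28317.83",
        "percentChange": "0.78",
        "timestamp": "2021-05-21T15:15:02+09:00",
        "bluegrassChannel": {
            "channel": "/zigman2/quotes/210597971/delayed",
            "type": "DelayedChannel",
        },
    },
]

# Nested dicts are flattened into dotted column names
df = pd.json_normalize(sample)

# Keep a few columns and make sure the numeric ones really are numeric
columns = ["formattedName", "lastPrice", "percentChange", "timestamp"]
summary = df[columns].copy()
for col in ("lastPrice", "percentChange"):
    summary[col] = pd.to_numeric(summary[col], errors="coerce")

print(summary)
```

With live data, replacing sample with data["data"]["instruments"] should give the same kind of summary, provided those keys are still present in the response.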