Need help scraping WSJ Markets data

Problem description

I'm relatively new to this and am trying to scrape data with Python. Here is my code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

URL = 'https://www.wsj.com/market-data/stocks/asia?mod=md_usstk_view_asia'

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

page = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find("table", attrs={"class": "WSJTables--table--1QzSOCfq"})
print(table)

I have already added request headers, but the output does not show any values. Any help would be much appreciated, thanks!

Tags: python, web-scraping

Solution


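The data you are looking for is not part of the initial HTML; the page loads it from an external source via Ajax, which is why soup.find() finds no table. The following example shows how to load the data directly with the requests module: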

import json
import requests

url = "https://www.wsj.com/market-data/stocks/asia"
params = {
    "id": '{"application":"WSJ","instruments":[{"symbol":"INDEX/HK//HSI","name":"Hong Kong: Hang Seng"},{"symbol":"INDEX/JP//NIK","name":"Japan: Nikkei 225"},{"symbol":"INDEX/CN//SHCOMP","name":"China: Shanghai Composite"},{"symbol":"INDEX/IN//1","name":"India: S&P BSE Sensex"},{"symbol":"INDEX/AU//XJO","name":"Australia: S&P/ASX"},{"symbol":"INDEX/KR//SEU","name":"S. Korea: KOSPI"},{"symbol":"INDEX/US//GDOW","name":"Global Dow"},{"symbol":"FUTURE/US//DJIA FUTURES","name":"DJIA Futures"}]}',
    "type": "mdc_quotes",
}
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

data = requests.get(url, params=params, headers=headers).json()

# uncomment to see all data:
# print(json.dumps(data, indent=4))

for instrument in data["data"]["instruments"]:
    print(
        "{:<30} {:<10}".format(
            instrument["formattedName"], instrument["lastPrice"]
        )
    )

Prints:

Hong Kong: Hang Seng           28458.44  
Japan: Nikkei 225              28317.83  
China: Shanghai Composite      3486.56   
India: S&P BSE Sensex          50540.48  
Australia: S&P/ASX             7030.3    
S. Korea: KOSPI                3156.42   
Global Dow                     4022.82   
DJIA Futures                   34208     
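
The `id` parameter in the request above is a JSON document written out by hand as one long string, which is easy to break when adding or removing instruments. A minimal sketch of building it from ordinary Python objects with `json.dumps` follows. The helper name `build_params` and its keyword arguments are hypothetical, introduced here only for illustration, and the instrument list is trimmed to two entries copied from the request above.

```python
import json

# Two instruments copied from the request above (trimmed for brevity).
instruments = [
    {"symbol": "INDEX/HK//HSI", "name": "Hong Kong: Hang Seng"},
    {"symbol": "INDEX/JP//NIK", "name": "Japan: Nikkei 225"},
]


def build_params(instruments, application="WSJ", query_type="mdc_quotes"):
    """Hypothetical helper: return query parameters with `id` serialized as JSON."""
    payload = {"application": application, "instruments": instruments}
    # Compact separators reproduce the whitespace-free style of the original string.
    return {
        "id": json.dumps(payload, separators=(",", ":")),
        "type": query_type,
    }


params = build_params(instruments)
print(params["id"])
```

The resulting `params` dictionary can be passed to `requests.get(url, params=params, headers=headers)` exactly as in the answer; requests takes care of URL-encoding the JSON string.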

To load it into a pandas DataFrame:

import pandas as pd

# Flatten the list of instrument records (including nested keys) into columns.
df = pd.json_normalize(data["data"]["instruments"])
print(df)

Prints:

  country  dailyHigh  dailyLow exchangeIsoCode              formattedName lastPrice  mantissa                            name priceChange percentChange            requestSymbol  ticker                      timestamp    type                                                url            bluegrassChannel.channel bluegrassChannel.type
0      HK   28584.34  28286.92            XHKG       Hong Kong: Hang Seng  28458.44         2                 Hang Seng Index        8.15          0.03            INDEX/HK//HSI     HSI      2021-05-21T16:08:32+08:00   Index  https://www.wsj.com/market-data/quotes/index/H...   /zigman2/quotes/210598030/delayed        DelayedChannel
1      JP   28411.56  28193.03            XTKS          Japan: Nikkei 225  28317.83         2                NIKKEI 225 Index      219.58          0.78            INDEX/JP//NIK     NIK      2021-05-21T15:15:02+09:00   Index  https://www.wsj.com/market-data/quotes/index/J...   /zigman2/quotes/210597971/delayed        DelayedChannel
2      CN    3518.38   3479.67            XSHG  China: Shanghai Composite   3486.56         2        Shanghai Composite Index      -20.39         -0.58         INDEX/CN//SHCOMP  SHCOMP      2021-05-21T15:01:13+08:00   Index  https://www.wsj.com/market-data/quotes/index/C...   /zigman2/quotes/210598127/delayed        DelayedChannel
3      IN   50591.12  49832.72            XBOM      India: S&P BSE Sensex  50540.48         2            S&P BSE Sensex Index      975.62          1.97              INDEX/IN//1       1      2021-05-21T15:30:50+05:30   Index  https://www.wsj.com/market-data/quotes/index/I...   /zigman2/quotes/210597966/delayed        DelayedChannel
4      AU    7056.40   6999.60            XASX         Australia: S&P/ASX    7030.3         1     S&P/ASX 200 Benchmark Index        10.7          0.15            INDEX/AU//XJO     XJO      2021-05-21T17:20:23+10:00   Index  https://www.wsj.com/market-data/quotes/index/A...   /zigman2/quotes/210598100/delayed        DelayedChannel
5      KR    3198.01   3149.46  Korea Exchange            S. Korea: KOSPI   3156.42         2           KOSPI Composite Index       -5.86         -0.19            INDEX/KR//SEU  180721      2021-05-21T15:33:00+09:00   Index  https://www.wsj.com/market-data/quotes/index/K...   /zigman2/quotes/210598069/delayed        DelayedChannel
6      US    4040.16   4012.70   S&P Dow Jones                 Global Dow   4022.82         2         Global Dow Realtime USD        7.66          0.19           INDEX/US//GDOW    GDOW      2021-05-21T18:43:17-04:00   Index  https://www.wsj.com/market-data/quotes/index/U...  /zigman2/quotes/210599024/realtime        DelayedChannel
7      US   34372.00  34017.00            XCBT               DJIA Futures     34208         0  E-Mini Dow Continuous Contract          55          0.16  FUTURE/US//DJIA FUTURES    YM00  2021-05-21T15:59:59.595-05:00  Future  https://www.wsj.com/market-data/quotes/futures...   /zigman2/quotes/210407078/delayed        DelayedChannel
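
In the DataFrame output above, `lastPrice` is printed with varying numbers of decimals (`7030.3`, `34208`), which suggests that this column, and possibly `priceChange` and `percentChange`, may hold strings rather than floats. That is an assumption based only on the printed output. If it holds, the columns can be converted before doing arithmetic on them. The sketch below uses a small hand-made sample whose values are copied from the output above; the choice of columns to convert is illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring three rows of the output above.
# Storing the values as strings is an assumption about the real payload.
sample = [
    {"formattedName": "Hong Kong: Hang Seng", "lastPrice": "28458.44",
     "priceChange": "8.15", "percentChange": "0.03"},
    {"formattedName": "Australia: S&P/ASX", "lastPrice": "7030.3",
     "priceChange": "10.7", "percentChange": "0.15"},
    {"formattedName": "DJIA Futures", "lastPrice": "34208",
     "priceChange": "55", "percentChange": "0.16"},
]
df = pd.DataFrame(sample)

# Convert the selected columns; values that cannot be parsed become NaN.
numeric_cols = ["lastPrice", "priceChange", "percentChange"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Using `errors="coerce"` turns unparseable values into `NaN` instead of raising, which is convenient for scraped data but can hide bad input, so it is worth checking for `NaN` afterwards.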
