首页 > 解决方案 > Web 抓取表到 Pandas 数据框


在使用 Pandas 时,我是初学者。但我想在这里获取 Nvidia 网站上的 G-Sync 游戏监视器表:https ://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/并将其转换为数据用于 Python 的 Pandas 中的框架。


import pandas as pd
df = pd.read_html('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

但这似乎不起作用。我得到一个ValueError: No tables found


import requests
import lxml.html as lh
page = requests.get('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

但不知何故我得到了ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: wrong header check'))


标签: pythonpandasweb-scraping


数据通过 json 请求动态加载。

此脚本将 json 数据加载到数据框中并打印出来:

import re
import json
import pandas as pd

url = 'https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

html_txt = requests.get(url, headers=headers).text

json_url =  'https://www.nvidia.com' + re.search(r"'url': '(.*?)'", html_txt).group(1)

data = requests.get(json_url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

def fn(x):
    out = []
    for v in x:
        if isinstance(v, dict):
    return out

df = pd.json_normalize(data['data'], max_level=0).apply(fn)


                  type manufacturer      model  hdr     size lcd type        resolution variable refresh rate range variable overdrive variable refresh input    driver needed
0      G-SYNC ULTIMATE         Acer    CP7271K  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
1      G-SYNC ULTIMATE         Acer        X27  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
2      G-SYNC ULTIMATE         Acer        X32  Yes       32      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
3      G-SYNC ULTIMATE         Acer        X35  Yes       35       VA  3440x1440 (WQHD)                     1-200Hz                Yes           Display Port              N/A
4      G-SYNC ULTIMATE         Asus       PG65  Yes       65       VA    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
..                 ...          ...        ...  ...      ...      ...               ...                         ...                ...                    ...              ...
159  G-SYNC Compatible           LG    2020 ZX  Yes   77, 88     OLED    7680x4320 (8K)                    40-120Hz                 No                   HDMI  445.51 or newer
160  G-SYNC Compatible          MSI   MAG251RX  Yes     24.5      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.66 or newer
161  G-SYNC Compatible        Razer  Raptor 27  Yes       27      IPS   2560x1440 (QHD)                    48-144Hz                 No           Display Port  431.60 or newer
162  G-SYNC Compatible      Samsung       CRG5   No       27       VA   1920x1080 (FHD)                    48-240Hz                 No           Display Port  430.86 or newer
163  G-SYNC Compatible    ViewSonic      XG270   No       27      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.41 or newer

[164 rows x 11 columns]
