首页 > 解决方案 > 使用 pandas read_html() 时遇到问题:ValueError

问题描述

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


with urlopen(url) as doc:
    html = BeautifulSoup(res.text, 'lxml') 
    pgrr = html.find('td', class_='pgRR') 
    s = str(pgrr.a['href']).split('=')
    last_page = s[-1]


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'


for page in range(1, int(last_page)+1): 
    page_url = '{}&page={}'.format(sise_url, page)  
    df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0])

df = df.dropna() # 값이 빠진 행을 제거한다.
print(df)

我在 Naver Finance 中抓取每日股票数据时遇到此值错误。我可以毫无问题地获取 url,但如果我使用 read_html() 我会Value Error:Table not found遇到问题df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0])。请给一些建议。

标签: pythonpandas

解决方案


我不读韩文......但是pd.read_html()得到一个错误页面。requests.get()通过标题解决了这个问题。然后传递res.textread_html()

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import pandas as pd

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


with urlopen(url) as doc:
    html = BeautifulSoup(res.text, 'lxml') 
    pgrr = html.find('td', class_='pgRR') 
    s = str(pgrr.a['href']).split('=')
    last_page = s[-1]


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'

for page in range(1, int(last_page)+1): 
    page_url = '{}&page={}'.format(sise_url, page)  
    res = requests.get(page_url, verify=True, headers=headers)
    df = df.append(pd.read_html(res.text, encoding='euc-kr')[0])


推荐阅读