首页 > 解决方案 > 如何获取下载按钮的 url 并在 Python 中读取 CSV 文件?

问题描述

我正在使用 Python Google Colab 并尝试从此链接读取 csv 文件:https ://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history

如果您向下滚动一点,您将能够看到下载按钮。我想通过使用 selenium 或 bs 获取链接并阅读 csv 文件。我正在尝试做这样的事情,

# install packages
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# load packages
import pandas as pd
from selenium import webdriver
import sys

# run selenium and read the csv file
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get('https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history')#put here the adress of your page
btn = driver.find_element_by_tag_name('button')
btn.click()
df = pd.read_csv('##.csv')

它似乎一直有效,btn.click()但之后出现错误,因为它没有告诉我下载按钮的链接或文件名。你能帮忙吗?那将不胜感激。

标签: pythonseleniumcsvselenium-webdriverweb-scraping

解决方案


不需要硒。数据嵌入<script>标签中。

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

t = 'AAPL'
url = 'https://www.macrotrends.net/assets/php/stock_price_history.php?t={}'.format(t)

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

scripts = soup.find_all('script',{'type':'text/javascript'})
for script in scripts:
    if 'var dataDaily' in str(script):
        jsonStr = '[' + str(script).split('[',1)[-1].split('];')[0] + ']'
        jsonData = json.loads(jsonStr)
        
df = pd.DataFrame(jsonData)
df = df.rename(columns={'o':'open','h':'high','l':'low','c':'close','d':'date','v':'volume'})
df.to_csv('MacroTrends_Data_Download_{}.csv'.format(t), index=False)

输出:

print(df)
             date      open      high  ...   volume     ma50    ma200
0      1980-12-12    0.1012    0.1016  ...  469.034      NaN      NaN
1      1980-12-15    0.0964    0.0964  ...  175.885      NaN      NaN
2      1980-12-16    0.0893    0.0893  ...  105.728      NaN      NaN
3      1980-12-17    0.0910    0.0915  ...   86.442      NaN      NaN
4      1980-12-18    0.0937    0.0941  ...   73.450      NaN      NaN
          ...       ...       ...  ...      ...      ...      ...
10135  2021-02-25  124.6800  126.4585  ...  148.200  131.845  112.241
10136  2021-02-26  122.5900  124.8500  ...  164.560  131.838  112.460
10137  2021-03-01  123.7500  127.9300  ...  116.308  131.840  112.716
10138  2021-03-02  128.4100  128.7200  ...  102.261  131.790  112.957
10139  2021-03-03  124.8100  125.7100  ...  111.514  131.661  113.184

[10140 rows x 8 columns]

推荐阅读