Scraping a table with Python

Problem description

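I am trying to scrape this table ( https://futures.tradingcharts.com/marketquotes/ZC.html ) with Python. I have tried a few things based on this post, but when I manually inspect the page source, I don't see the table. How can I scrape this table?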

<div class="mq_page_wrapper">
 <script type="text/javascript">
    $(document).ready(function(){
      generateTCPSLink();
    });

    function generateTCPSLink(){
      var root = location.protocol + '//' + location.host;

      var url_param = {
        action:'tcps_logged_in',
        timestamp: (new Date()).getTime()
      };

      $.getJSON(root+'/widgets/footer_ajax/footer_common_functions.php?'+$.param(url_param),function(data){
        if(data.logged_in){
          $('span#tcps_link').html("Logout:<br>&nbsp;<a href='"+root+"/premium_subscriber/tcps_logout.php"+"'>Premium Subscriber</a><br>");
        }else{
          $('span#tcps_link').html("Login:<br>&nbsp;<a href='"+root+"/premium_subscriber/login_subscribe.php?premium_link"+"'>Premium Subscriber</a><br>");
        }
      });
    }
 </script>
 <div id="members_classic">
   <span id="tcps_link"></span>
 </div>

from selenium import webdriver
import time
from bs4 import BeautifulSoup

chrome_path = r"C:\Users\Desktop\chromedriver.exe"
# Selenium 3 style; Selenium 4 expects webdriver.Chrome(service=Service(chrome_path))
driver = webdriver.Chrome(chrome_path)
driver.get('https://futures.tradingcharts.com/marketquotes/ZC.html')
time.sleep(80)  # wait for the page's JavaScript to finish
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scroll to the bottom
time.sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
print(soup)


Tags: python, selenium, web-scraping, html-table

Solution


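The table is missing from the page source because the page appears to load it with JavaScript from a separate endpoint. Here is one way to request that data directly and get it as JSON: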

import requests
import json

headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'https://futures.tradingcharts.com',
        'Connection': 'keep-alive',
        'Referer': 'https://futures.tradingcharts.com/futures/quotes/ZC.html',
        'Cache-Control': 'max-age=0',
        'TE': 'Trailers',
    }

data = {
  'apikey': '2d8b3b803594b13e02a7dc827f4a63f8',
  'fields': 'settlement,previousClose,previousOpenInterest',
  # ZCY00 plus the symbols ZC*1 through ZC*50
  'symbols': ','.join(['ZCY00'] + ['ZC*%d' % i for i in range(1, 51)])
}

response = requests.post('https://ondemand.websol.barchart.com/getQuote.json', headers=headers, data=data)
response.raise_for_status()  # stop here on an HTTP error

# Parse the JSON body; the quotes are under the 'results' key
quotes = json.loads(response.text)
print(quotes['results'])

You can take it from there.
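
As one possible next step, here is a minimal sketch that turns the `results` list from the response into a CSV table. It assumes `results` is a list of dicts, one per quote, and that each dict has a `symbol` key alongside the three requested fields; I have not confirmed that against the live response. The helper name `results_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io

# Columns to keep. "symbol" is an assumed key; the other three are the
# fields requested in the POST above.
COLUMNS = ["symbol", "settlement", "previousClose", "previousOpenInterest"]


def results_to_csv(results, columns=COLUMNS):
    """Render a list of quote dicts as CSV text, leaving missing values blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for quote in results:
        # Missing keys become empty cells instead of raising KeyError
        writer.writerow({name: quote.get(name, "") for name in columns})
    return buffer.getvalue()


# Made-up sample standing in for the 'results' list from the response;
# the symbols and numbers are placeholders, not real quotes.
sample_results = [
    {"symbol": "ZCH20", "settlement": 380.25, "previousClose": 379.5,
     "previousOpenInterest": 1000},
    {"symbol": "ZCK20", "settlement": 386.0, "previousClose": 385.75},
]

print(results_to_csv(sample_results))
```

With real data you would pass the parsed `results` list in place of `sample_results` and write the returned text to a file.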

