Extracting a specific value with BeautifulSoup

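Question

I am scraping this website (https://www.ivolatility.com/options/RVX/) with the Python requests module. The output below is the result of selecting the first table with BeautifulSoup. Within that first table, I am trying to get one specific value (19.17) out of the soup built from the requests response.

I would like to do this with BeautifulSoup, but I don't know how to select specifically the cell that holds the value.

Do you have any suggestions?

Output of the request: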

<table border="0" bordercolor="red" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="3"><script language="JavaScript">



function submitCalcForm(event) {
event.preventDefault();

            var form = document.getElementById('basicOptionsForm');
var action = form.action;
var regions = ['', 'USA', 'Europe', 'Asia', 'Canada'];
var regionsOptions = form[1];
var selectedRegion = regionsOptions.options[regionsOptions.selectedIndex].value;
var symbol = form[0].value.trim();
var location = (window.location.href.indexOf('.j')>-1) 
    ? (form.action + '?' + form[0].name + '=' + form[0].value + '&' + form[1].name + '=' + selectedRegion)
    : ('/options/'+ ((symbol == '') ? '-' : symbol ) +'/'+regions[selectedRegion]); 

            window.location.href= location;

}


function goToLookup() {
    window.location.href= "/options/-/";
}


</script>
<form action="/options.j" id="basicOptionsForm" method="get" onsubmit="submitCalcForm(event);">
<table bgcolor="#ffffff" border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<table bgcolor="#999999" border="0" cellpadding="0" cellspacing="1">
<tr>
<td bgcolor="#567abb">
<table border="0" cellpadding="1" cellspacing="0" class="table-action">
<tr>
<td><span class="s1w" style="color: #fff;"> Symbol: </span></td><td><input class="s2" name="ticker" size="5" type="text" value="RVX"/></td><td><select class="s2" name="R"><option selected="" value="0">
                ALL
            </option><option value="1">
                USA
            </option><option value="2">
                Europe
            </option><option value="4">
                Canada
            </option></select></td><td><span class="s2"> </span></td><td><button style="background: #0C6EF8;    font-weight: bold;    border: 1px solid black;" type="submit">GO!</button></td><td><span class="s2"> </span></td><td><button onclick="goToLookup();" style="background: #0C6EF8;    font-weight: bold;    border: 1px solid black; color: white; white-space: nowrap;" type="button">
                                                        Symbol Lookup</button></td><td><span class="s2"> </span></td>
</tr>
</table>
</td>
</tr>
</table>
</td><td><img border="0" height="1" src="/design/images/0.gif" width="5"/></td><td nowrap=""><b><span class="s4">Russell 2000 Volatility Index</span></b></td><td width="100%"> </td>
</tr>
</table>
</form>
</td>
</tr>
<tr>
<td colspan="3"><img alt="." border="0" height="10" src="/design/images/0.gif" width="1"/></td>
</tr>
<tr valign="top">
<td width="100%"><script type="text/javascript">
  <!--
    function wr(s) {
      document.write(s);
    }
    var d = new Array(10);
    d[20]='N/A';d[25]='-94.06%';d[30]='32.03%';d[35]='34.74';d[56]='N/A';d[61]='N/A';d[66]='10-Apr';d[71]='84.49%';d[97]='N/A';d[102]='03-Oct';d[107]='29-Mar';d[112]='1.43';d[133]='N/A';d[138]='N/A';d[143]='148.97%';d[148]='98.46%';d[174]='N/A';d[179]='-46.88%';d[184]='198.21%';d[189]='0.27';d[210]='N/A';d[215]='N/A';d[220]='25-May';d[225]='110.30%';d[251]='N/A';d[256]='-68.76%';d[261]='75.38%';d[266]='0';d[287]='N/A';d[292]='N/A';d[297]='39.85%';d[302]='120.02%';d[328]='N/A';d[333]='-67.09%';d[338]='69.94%';d[343]='19.17';d[364]='N/A';d[369]='N/A';d[374]='06-Apr';d[379]='06/14/2018';d[405]='N/A';d[410]='-82.49%';d[415]='74.41%';d[441]='N/A';d[446]='N/A';d[451]='164.16%';d[456]='12.93';d[482]='N/A';d[487]='24-May';d[492]='77.70%';d[518]='N/A';d[523]='03-May';d[528]='21-May';d[533]='12/24/2018';d[559]='N/A';d[564]='59.42%';d[569]='84.78%';

    wr('<table class="table-data" cellpadding=1 cellspacing=1 border=0 width=100%>');
    wr('<tr bgcolor="#cccccc" align=right height=20>');
    wr('<td align="center"><font class=s1>Price</font></td>');
    wr('<td align="center"><font class=s1>Change&nbsp;(%)</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
    wr('<td align="center"><font class=s1>52&nbsp;wk&nbsp;High</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
    wr('<td align="center"><font class=s1>52&nbsp;wk&nbsp;Low</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');


    wr('<td align="center"><font class=s1>Stock volume</font>');



    wr('<a href="javascript:openHelp(14)" alt="Open Help">');
    wr('<img src="/design/images/ico/q_zn.gif" width=8 height=10 border=0 alt="Open Help"/>');
    wr('</a><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');



    wr('</tr>');
    wr('<tr bgcolor="#FFFFFF" align=right height=20>');
    wr('<td align="center"><font class=s1>');
    wr(d[343]);
    wr('</font></td>');
    wr('<td align="center"><font class=s1><nobr>&nbsp;&nbsp;');


      wr('<img src="/design/images/ico/up.gif" alt="+" border=0 align="absmiddle" width=7 height=9/>&nbsp;+');



    wr(d[189]);
    wr('&nbsp;(+');
    wr(d[112]);
    wr('%)</nobr></font></td>');
    wr('<td align="center"><font class=s1><nobr>&nbsp;&nbsp;');
    wr(d[35]);
    wr('&nbsp;');
    wr(d[533]);
    wr('</nobr></font></td><td align="center"><font class=s1><nobr>&nbsp;&nbsp;');
    wr(d[456]);
    wr('&nbsp;');
    wr(d[379]);
    wr('</nobr></font></td>');


    wr('<td align="center"><font size=-2 class=s1>');
    wr(d[266]);
    wr('</font></td>');



    wr('</tr></table>');
  //--> 

  </script><img border="0" height="10" src="/design/images/0.gif" width="1"/><table border="0" cellpadding="0" cellspacing="0" class="table-data" width="100%">
<tr align="center" bgcolor="
        #cccccc
    " height="20">
<td align="center" colspan="2"><font class="s2">Current</font></td><td><font class="s2">1 WK AGO</font></td><td><font class="s2">1 MO AGO</font></td><td><font class="s2">52 wk Hi/Date</font></td><td><font class="s2">52 wk Low/Date</font></td>
</tr>
<tr>
<td align="center" bgcolor="
        #FFFFFF
    " colspan="5" height="20"><font class="s2" color="">  HISTORICAL VOLATILITY <a alt="Open Help" href="javascript:openHelp(4)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">10 days</font></td><td><font class="s2">120.02%</font></td><td><font class="s2">84.49%</font></td><td><font class="s2">74.41%</font></td><td><font class="s2">198.21% - 29-Mar</font></td><td><font class="s2">32.03% - 21-May</font></td>
</tr>
<tr align="center" bgcolor="#eeeeee">
<td align="right"><font class="s2">20 days</font></td><td><font class="s2">110.30%</font></td><td><font class="s2">84.78%</font></td><td><font class="s2">69.94%</font></td><td><font class="s2">164.16% - 06-Apr</font></td><td><font class="s2">39.85% - 25-May</font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">30 days</font></td><td><font class="s2">98.46%</font></td><td><font class="s2">77.70%</font></td><td><font class="s2">75.38%</font></td><td><font class="s2">148.97% - 10-Apr</font></td><td><font class="s2">59.42% - 24-May</font></td>
</tr>
<tr>
<td align="center" bgcolor="
        #FFFFFF
    " colspan="5" height="20"><font class="s2" color="">  IMPLIED VOLATILITY <a href="javascript:openHelp(12)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">IV Index call <a href="javascript:openHelp(9)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr align="center" bgcolor="#eeeeee">
<td align="right"><font class="s2">IV Index put <a href="javascript:openHelp(10)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">IV Index mean <a href="javascript:openHelp('ivxmean')"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr>
<td align="center" bgcolor="
        #FFFFFF
    " colspan="5" height="20"><font class="s2" color="">HISTORICAL 30-DAYS CORRELATION AGAINST S&amp;P 500 Index (SPX)<a href="javascript:openHelp(30)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">30 days</font></td><td><font class="s2">-82.49%</font></td><td><font class="s2">-67.09%</font></td><td><font class="s2">-68.76%</font></td><td><font class="s2">-46.88% - 03-Oct</font></td><td><font class="s2">-94.06% - 03-May</font></td>
</tr>
</table>
</td>
</tr>
</table>

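Tags: python, beautifulsoup

Solution


The page is dynamic, so you first need to render it with something like Selenium.

After that, you can parse the HTML with BeautifulSoup or even with Selenium itself. But I noticed that the value sits inside a <table> tag. Whenever I see a <table> tag, I usually reach for pandas' .read_html(), since it does the hard work for you.

.read_html() returns a list of dataframes; then it is just a matter of finding the one with the data you want, or manipulating the tables as needed. The data you want is in the dataframe at index position 4 (it is also in position 0, but I chose 4 because the value sits right there, in row 2, column 1). Then just slice that dataframe to get that specific cell: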

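from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the local chromedriver binary; adjust for your machine.
# (Selenium 4 style; Selenium 3 took the path as the first positional argument.)
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
url = 'https://www.ivolatility.com/options/RVX/'
driver.get(url)  # the browser runs the page's JavaScript

# One dataframe per <table> found in the rendered HTML.
tables = pd.read_html(StringIO(driver.page_source))

# Column label 0, then row label 1, of the dataframe at position 4.
price = tables[4][0][1]

driver.quit()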

Output:

print(price)
19.17
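
To make the tables[4][0][1] indexing less opaque: on a dataframe, the first bracket selects a column by label and the second selects a row by label within that column. The sketch below uses a small hand-built dataframe as a stand-in for the parsed price table. The values are copied from the output in the question, but the frame itself is an assumption, not the real result of .read_html().

```python
import pandas as pd

# Hand-built stand-in for the price table: no header row is set,
# so both columns and rows carry default integer labels 0, 1, ...
table = pd.DataFrame([
    ["Price", "Change (%)"],
    ["19.17", "+0.27 (+1.43%)"],
])

# Chained indexing: column label 0 first, then row label 1.
price_chained = table[0][1]

# Equivalent positional lookup: row position 1, column position 0.
price_iloc = table.iloc[1, 0]

print(price_chained, price_iloc)
```

The .iloc form is usually easier to read, because it names the row and the column in one step and does not depend on what the labels happen to be.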

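
Since the question asked for BeautifulSoup specifically: in the request output shown in the question, the value is already present in the raw HTML, inside the inline script that fills the array d (d[343]='19.17') and writes d[343] as the first cell under the Price header. A minimal sketch under that assumption reads the value straight from the script text without rendering the page. The helper name extract_price and the embedded SAMPLE_HTML are illustrative only, and the index (343 in the sample) is looked up rather than hard-coded, since it may not be stable between page loads.

```python
import re

from bs4 import BeautifulSoup

# Trimmed-down sample shaped like the inline script in the question's output.
SAMPLE_HTML = """
<table><tr><td><script type="text/javascript">
  <!--
    function wr(s) { document.write(s); }
    var d = new Array(10);
    d[189]='0.27';d[343]='19.17';d[112]='1.43';
    wr('<td align="center"><font class=s1>Price</font></td>');
    wr('</tr>');
    wr('<tr bgcolor="#FFFFFF" align=right height=20>');
    wr('<td align="center"><font class=s1>');
    wr(d[343]);
    wr('</font></td>');
    wr(d[189]);
  //-->
</script></td></tr></table>
"""


def extract_price(html):
    """Return the value behind the first wr(d[N]) call, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        # First wr(d[N]) call: in the sample, this fills the Price cell.
        call = re.search(r"wr\(d\[(\d+)\]\)", text)
        if call is None:
            continue
        index = call.group(1)
        # Matching assignment d[N]='value' in the same script.
        assignment = re.search(r"\bd\[" + index + r"\]='([^']*)'", text)
        if assignment is not None:
            return assignment.group(1)
    return None


price = extract_price(SAMPLE_HTML)
print(price)
```

With the live page, the HTML passed in would be the .text of the requests response. Whether the site still serves a script of this shape is something the sample cannot confirm, so the function returns None instead of raising when the pattern is missing.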