Pandas: unable to assign an index to a table exported from a website

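Problem description

In the code below, which exports a table from a website, I am trying to load the table into a pandas DataFrame for analysis. However, the index is not unique per row or per column, and the values repeat across the Contract, Change, Last, and Settle columns.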

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
import requests
import numpy as np

res = requests.get('https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0')

soup = BeautifulSoup(res.text, 'lxml')
soup.prettify()
Header = soup.findAll('tr', limit=2)[1].findAll('th')

column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]

print(column_headers)

data_rows = soup.findAll('tr')[2:]
i = range(len(data_rows))
# for cell in data_rows
for td in data_rows:
    # Note: the text of the whole row is collapsed into a single string here
    row = td.get_text().replace('\n', '').strip()
    # ... and that one string is passed as the data for every cell of the frame
    df = pd.DataFrame(columns=column_headers, index=range(0, len(data_rows)), data=row)
    print(df)

The output looks like this:

['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
   Cash (NGY00)    2.910s    +0.010    0.000    2.910    2.910    0    2.900    06/04/18    Q / C / O
    000    2.528    2.528    0    2.539    06/04/18    Q / C / O
    May '21 (NGK21)    2.503s    -0.011    2.500    2.503    2.500    1    2.514    06/04/18    Q / C / O
    Jun '21 (NGM21)    2.529s    -0.011    0.000    2.529    2.529    0    2.540    06/04/18    Q / C / O
    Jul '21 (NGN21)    2.557s    -0.011    0.000    2.557    2.557    0    2.568    06/04/18    Q / C / O
    Aug '21 (NGQ21)    2.567s    -0.011    0.000    2.567    2.567    0    2.578    06/04/18    Q / C / O
    Sep '21 (NGU21)    2.565s    -0.011    0.000    2.565    2.565    0    2.576    06/04/18    Q / C / O
    Oct '21 (NGV21)    2.580    -0.013    2.580    2.580    2.580    30    2.593    13:42    Q / C / O
    Nov '21 (NGX21)    2.653s    -0.011    0.000    2.653    2.653    0    2.664    06/04/18    Q / C / O
    Dec '21 (NGZ21)    2.797s    -0.011    2.805    2.805    2.797    3    2.808    06/04/18    Q / C / O
    Jan '22 (NGF22)    2.902s    -0.011    2.900    2.902    2.900    1    2.913    06/04/18    Q / C / O
    Feb '22 (NGG22)    2.872s    -0.011    2.885    2.885    2.872    3    2.883    06/04/18    Q / C / O
    Mar \'22 (NGH22)    2.799s    -0.011    0.000    2.799    2.799    0    2.810    

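The pandas output should not look like this. Every row and every column should have its own index so that the data can be analyzed. For example, I want to compare the difference between specific columns, such as the Change price and the Last price.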

Tags: pandas

Solution
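One possible approach, sketched under assumptions rather than taken from an accepted answer: the question's loop passes the text of an entire row to the DataFrame as a single string, so the cells never get split into columns. Reading each td cell separately and building the frame once, after the loop, gives every value its own column and a unique row index. The inline SAMPLE_HTML below is a hypothetical stand-in for res.text (title row, header row, then data rows), and the helper names table_to_frame and to_number are illustrative. The trailing "s" on some prices is assumed to be a settlement marker that can be stripped before converting to numbers.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.text. Assumed layout: a title row,
# a header row made of <th> cells, then one <tr> per contract.
SAMPLE_HTML = """
<table>
  <tr><td colspan="4">Natural Gas</td></tr>
  <tr><th>Contract</th><th>Last</th><th>Change</th><th>Prev. Stl.</th></tr>
  <tr><td>Cash (NGY00)</td><td>2.910s</td><td>+0.010</td><td>2.900</td></tr>
  <tr><td>May '21 (NGK21)</td><td>2.503s</td><td>-0.011</td><td>2.514</td></tr>
  <tr><td>Oct '21 (NGV21)</td><td>2.580</td><td>-0.013</td><td>2.593</td></tr>
</table>
"""


def table_to_frame(html):
    """Build one DataFrame from the table, indexed by contract name."""
    soup = BeautifulSoup(html, "html.parser")
    table_rows = soup.find_all("tr")
    # The second <tr> holds the column headers.
    headers = [th.get_text(strip=True) for th in table_rows[1].find_all("th")]
    records = []
    for tr in table_rows[2:]:
        # One list entry per <td>, so each value lands in its own column.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == len(headers):  # skip rows that do not match the header
            records.append(cells)
    # Create the frame once, after all rows have been collected.
    return pd.DataFrame(records, columns=headers).set_index("Contract")


def to_number(series):
    """Strip the trailing 's' (assumed settlement marker) and parse as float."""
    return pd.to_numeric(series.str.rstrip("s"), errors="coerce")


df = table_to_frame(SAMPLE_HTML)
for column in ["Last", "Change", "Prev. Stl."]:
    df[column] = to_number(df[column])

# Example analysis: difference between the last and previous settlement price.
df["Last - Prev. Stl."] = (df["Last"] - df["Prev. Stl."]).round(3)
print(df)
```

With the live page, table_to_frame(res.text) would be the equivalent call, provided the page keeps the assumed row layout. Once the columns are numeric, ordinary column arithmetic such as df["Last"] - df["Prev. Stl."] works row by row.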

