How to capture a table from a website into a pandas DataFrame

Problem description

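I want to capture the data from a table on a website and store it in a pandas DataFrame with predefined columns. I tried to do this, but I can't split the data into separate columns. Here is my attempt: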

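import requests
import pandas as pd
import lxml.html as lh

site = 'https://gadgets.ndtv.com/mobiles/guide/phone-under-10000-best-mobile-india-price-realme-redmi-samsung-vivo-camera-battery-2240177'
docc = lh.fromstring(requests.get(site).content)
tr_ = docc.xpath('//tr')  # every table row on the page
rows = []
for row in tr_:
    # iterdescendants() yields every nested element in the row (cells, links, ...),
    # so the cell texts come out as one flat sequence without column boundaries
    for value in row.iterdescendants():
        phone = value.text
        # every value goes into 'Phones'; the other two columns stay empty
        rows.append({'Phones': str(phone)})
# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first
df = pd.DataFrame(rows, columns=['Phones', 'rating (out of 10)', 'Price in India'])
print(df)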

With this approach, however, I can't get the data into the separate columns 'Phones', 'rating (out of 10)' and 'Price in India'.

Tags: python, pandas, dataframe, beautifulsoup, string-formatting
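For comparison, the lxml attempt in the question can be made to split columns by iterating over each row's own `td` cells instead of `iterdescendants()`, which flattens everything inside a row. The following is a minimal sketch under stated assumptions: the sample HTML is hand-written from the first rows of the output shown further down, and the helper name `table_to_dataframe` and the assumption that the header row uses `th` while data rows use `td` are illustrative, not taken from the real page.

```python
import lxml.html as lh
import pandas as pd

# Hand-written stand-in for the page's table (assumption: header cells are <th>,
# data cells are <td>, and the phone name may be wrapped in a link).
SAMPLE_HTML = """
<table>
  <tr><th>Phones under Rs. 10,000</th><th>Gadgets 360 rating (out of 10)</th>
      <th>Price in India (as recommended)</th></tr>
  <tr><td><a href="#">Realme C3</a></td><td>8</td><td>Rs. 7,999</td></tr>
  <tr><td><a href="#">Redmi 8</a></td><td>7</td><td>Rs. 9,499</td></tr>
</table>
"""


def table_to_dataframe(html, columns):
    """Hypothetical helper: one DataFrame row per <tr> that has one <td> per column."""
    doc = lh.fromstring(html)
    records = []
    for tr in doc.xpath('//tr'):
        # Only the row's direct <td> cells; the all-<th> header row yields none.
        cells = tr.xpath('./td')
        if len(cells) != len(columns):
            continue
        # text_content() includes text inside nested tags such as <a>, unlike .text
        records.append([cell.text_content().strip() for cell in cells])
    return pd.DataFrame(records, columns=columns)


df = table_to_dataframe(SAMPLE_HTML, ['Phones', 'rating (out of 10)', 'Price in India'])
print(df)
```

Against the live page you would pass `requests.get(site).content` instead of `SAMPLE_HTML`, provided its table really follows the assumed `th`/`td` layout.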

Solution


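import io

import pandas as pd
import requests


def main(url):
    r = requests.get(url)
    r.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    # read_html (which needs lxml, or bs4 + html5lib, installed) parses every <table>
    # on the page into its own DataFrame and returns them as a list; the phone list is [0].
    # Wrapping the HTML in StringIO avoids the literal-string deprecation in pandas 2.1+.
    df = pd.read_html(io.StringIO(r.text))[0]
    print(df)
    df.to_csv("data.csv")  # also save the table as CSV


main("https://gadgets.ndtv.com/mobiles/guide/phone-under-10000-best-mobile-india-price-realme-redmi-samsung-vivo-camera-battery-2240177")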

Output:

  Phones under Rs. 10,000  Gadgets 360 rating (out of 10) Price in India (as recommended)
0               Realme C3                               8                       Rs. 7,999    
1        Realme Narzo 10A                               8                       Rs. 8,499    
2                 Redmi 8                               7                       Rs. 9,499    
3                Realme 5                               8                       Rs. 9,999    
4                Vivo U10                               7                       Rs. 9,990    
5               Realme U1                               8                       Rs. 8,499    
6      Samsung Galaxy M30                               8                      Rs. 10,035 
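The question asked for the predefined column names `Phones`, `rating (out of 10)` and `Price in India`, while `read_html` keeps the page's own headers. A minimal sketch of renaming the columns and turning the price strings into integers follows. The DataFrame is hand-built from rows of the output above, so the column types (a numeric rating and prices stored as strings like `Rs. 7,999`) are an assumption about what `read_html` returns for the real page.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by read_html (assumption: rating is
# numeric, price is a string such as "Rs. 7,999").
df = pd.DataFrame({
    'Phones under Rs. 10,000': ['Realme C3', 'Realme Narzo 10A', 'Samsung Galaxy M30'],
    'Gadgets 360 rating (out of 10)': [8, 8, 8],
    'Price in India (as recommended)': ['Rs. 7,999', 'Rs. 8,499', 'Rs. 10,035'],
})

# Rename to the predefined column names from the question, in table order.
df.columns = ['Phones', 'rating (out of 10)', 'Price in India']

# Drop the "Rs." prefix and thousands separators so prices can be compared numerically.
df['Price in India'] = (
    df['Price in India']
    .str.replace('Rs.', '', regex=False)
    .str.replace(',', '', regex=False)
    .str.strip()
    .astype(int)
)
print(df)
```

Assigning to `df.columns` relies on the column order; if the page's table might change, renaming with `df.rename(columns={...})` keyed by the original header text is the safer choice.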
