Scraping two columns that share exactly the same class with Python

Problem description

I'm using BeautifulSoup to scrape information from this website: https://www.gurufocus.com/insider/summary

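There are two price columns that hold different values, but their cells use exactly the same element and attributes. Here is the markup of one cell: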

<td data-v-575fbbfb="" class="right-align number-field" data-column="Price" row-idx="0">
<span style="color: ">$2.12</span></td>

Here is part of my code:

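from bs4 import BeautifulSoup
import requests
import pandas as pd

r = requests.get("https://www.gurufocus.com/insider/summary")
soup = BeautifulSoup(r.text, "html.parser")

price = []
# Both price columns match this filter, so their values end up mixed in one list
for pr in soup.find_all('td', {'class': 'right-align number-field', 'data-column': 'Price'}):
    price.append(pr.text)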

Does anyone know how to tell the two prices apart and scrape them into two separate lists?

Tags: python, web-scraping, beautifulsoup

Solution


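You can also read the whole table directly with pandas and give the columns your own names, so the two "Price" columns become distinguishable: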

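from io import StringIO

import pandas as pd
import requests

r = requests.get("https://www.gurufocus.com/insider/summary")

# read_html returns a list of DataFrames; keep the first table with class "data-table".
# Wrapping the HTML in StringIO avoids the deprecation warning newer pandas
# versions emit when a literal HTML string is passed.
data = pd.read_html(StringIO(r.text), attrs={'class': 'data-table'})[0]

# Both headers on the page read "Price", so rename every column explicitly
# (the list must have exactly as many names as the table has columns).
data.columns = [
    'Ticker', 'Links', 'Company', 'Price1', 'Insider Name', 'Insider Position',
    'Date', 'Buy/Sell', 'Insider Trading Shares', 'Shares Change', 'Price2',
    'Cost(000)', 'Final Share', 'Price Change Since Insider Trade (%)',
    'Dividend Yield %', 'PE Ratio', 'Market Cap ($M)', 'None'
]

print(data[["Price1", "Price2"]])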

Output:

     Price1   Price2
0     $2.05    $2.12
1    $15.42   $14.79
2     $0.02    $0.02
3     $0.64    $0.63
4    $73.13   $76.89
5   $298.75  $308.05
6   $512.74  $517.77
7   $341.27     $357
8   $300.99  $311.13
9    $38.34   $39.02
10   $20.79   $21.72
11    $5.65    $5.37
12   $14.30   $14.43
13   $37.93   $36.24
14  $174.90  $177.79
15   $79.58   $83.49
16   $79.58   $83.49
17   $63.91   $66.56
18   $25.31   $25.90
19   $93.04   $95.37
20   $67.73   $72.59
21   $67.73   $71.59
22   $67.71   $71.55
23   $11.31   $10.93
24   $58.67   $60.62
25   $22.64   $25.21
26    $3.98    $4.01
27    $6.47    $6.25
28    $9.08    $8.84
29   $23.69   $23.79
30  $174.23  $178.10
31  $100.07   $99.75
32   $11.89   $12.01
33    $0.83    $0.83
34   $41.15      $25
35   $41.15      $25
36   $41.15      $25
37    $7.23    $4.73
38   $23.04   $21.27
39   $37.97   $35.57
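If you would rather stay with BeautifulSoup, one option is to walk the table row by row: inside each `<tr>`, `find_all` returns the matching cells in document order, so the first one can go into one list and the second into another. This is a minimal sketch that runs on a small hard-coded HTML fragment modeled on the cell shown in the question; the `<tr>` layout, the filler cell and the use of two rows from the output above are assumptions for illustration, not the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the gurufocus table (assumed layout):
# each row has two <td> cells with identical class and data-column attributes.
html = """
<table class="data-table">
  <tr>
    <td class="right-align number-field" data-column="Price" row-idx="0"><span>$2.05</span></td>
    <td>other column</td>
    <td class="right-align number-field" data-column="Price" row-idx="0"><span>$2.12</span></td>
  </tr>
  <tr>
    <td class="right-align number-field" data-column="Price" row-idx="1"><span>$15.42</span></td>
    <td>other column</td>
    <td class="right-align number-field" data-column="Price" row-idx="1"><span>$14.79</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

price1, price2 = [], []
for row in soup.find_all("tr"):
    # Matching cells come back in document order, so position tells them apart.
    cells = row.find_all("td", {"class": "right-align number-field", "data-column": "Price"})
    if len(cells) >= 2:  # skip header rows or rows missing one of the price cells
        price1.append(cells[0].get_text(strip=True))
        price2.append(cells[1].get_text(strip=True))

print(price1)  # ['$2.05', '$15.42']
print(price2)  # ['$2.12', '$14.79']
```

If every row is guaranteed to contain exactly two matching cells, slicing the flat `price` list from the question (`price[0::2]` and `price[1::2]`) gives the same split; the per-row loop is more robust when a row is missing a cell.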
