How to extract table data with Selenium in Python and parse it into a pandas DataFrame

Problem description

I have a script that extracts all the odds from a table, but the result comes back as an array.

Here is the code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
from concurrent.futures import ThreadPoolExecutor

options = Options()
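# The `headless` attribute was removed in recent Selenium 4 releases; pass the Chrome flag instead
options.add_argument("--headless=new")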
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
driver.get('https://www.coteur.com/match/cotes-fc-noah-fc-van-rid1159745.html')

odds = [my_elem.text for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//button[contains(@class, "btn btn-default btn-xs btncote")]')))]

odds = [float(i) for i in odds]
odds = np.array(odds)
print(odds, '\n')
        
# quit() closes every window and ends the WebDriver session, so a separate close() is not needed
driver.quit()

This is the output:

[1.68 3.2  3.95 1.65 3.25 4.   1.62 3.2  4.   1.65 3.1  3.8  1.58 3.2
 4.   1.58 3.2  3.95 1.57 3.15 3.95] 

What I want instead is all the odds directly in a DataFrame: extract the data with selenium and put it straight into a 3-column DataFrame (1, N, 2).

Tags: python-3.x, pandas, dataframe, selenium-webdriver

Solution

You really don't need heavy artillery like numpy or selenium here. The whole table comes from a single endpoint.

Here's an example:

from datetime import datetime

import pandas as pd
import requests
from tabulate import tabulate


mapper = {"0": "Null", "1": "Fc Noah", "2": "Fc Van"}

end_point = "https://www.coteur.com/includes/ajax/getTendance.php?renc_id=1159745&defaultcote=1n2"
data = requests.get(end_point).json()

re_mapped = {
    datetime.fromtimestamp(int(date_point)): {
        mapper[key]: value for key, value in data_values.items()
    } for date_point, data_values in data.items()
}

df = pd.DataFrame(re_mapped)
print(tabulate(df))
# Keep the index when saving: it holds the row labels (Fc Noah, Null, Fc Van)
df.to_csv("table.csv")

Output:

-------  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----
Fc Noah  1.69  1.7   1.68  1.65  1.65  1.63  1.62  1.62  1.62  1.62  1.62  1.62  1.62
Null     3.17  3.17  3.18  3.17  3.18  3.19  3.19  3.2   3.19  3.19  3.19  3.19  3.18
Fc Van   3.84  3.83  3.85  3.86  3.87  3.91  3.94  3.96  3.94  3.95  3.96  3.95  3.95
-------  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----
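The table above has one row per outcome and one column per timestamp, while the question asks for three columns (1, N, 2). A minimal sketch of that reshaping follows, assuming the endpoint returns the same structure as the payload quoted in this answer (only its first three entries are used here) and that the column labels "1", "N", "2" are the ones wanted. The names `rows` and `wide` are illustrative.

```python
from datetime import datetime

import pandas as pd

mapper = {"0": "Null", "1": "Fc Noah", "2": "Fc Van"}

# First three entries of the payload quoted in this answer (sample data, not a live request)
data = {
    "1613624300": {"1": "1.69", "0": "3.17", "2": "3.84"},
    "1613625148": {"1": "1.70", "0": "3.17", "2": "3.83"},
    "1613627005": {"1": "1.68", "0": "3.18", "2": "3.85"},
}

# Convert timestamps to datetimes and odds to floats
rows = {
    datetime.fromtimestamp(int(ts)): {mapper[k]: float(v) for k, v in values.items()}
    for ts, values in data.items()
}

# Transpose so that each timestamp is a row, then rename and order the columns as 1, N, 2
wide = (
    pd.DataFrame(rows)
    .T
    .rename(columns={"Fc Noah": "1", "Null": "N", "Fc Van": "2"})[["1", "N", "2"]]
)
print(wide)
```

Each row is then one snapshot of the odds, which is the layout the question describes.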


Edit: Once you have the data, you can plot it.

from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd

mapper = {"0": "Null", "1": "Fc Noah", "2": "Fc Van"}
data = {'1613624300': {'1': '1.69', '0': '3.17', '2': '3.84'}, '1613625148': {'1': '1.70', '0': '3.17', '2': '3.83'}, '1613627005': {'1': '1.68', '0': '3.18', '2': '3.85'}, '1613633148': {'1': '1.65', '0': '3.17', '2': '3.86'}, '1613635106': {'1': '1.65', '0': '3.18', '2': '3.87'}, '1613638877': {'1': '1.63', '0': '3.19', '2': '3.91'}, '1613640164': {'1': '1.62', '0': '3.19', '2': '3.94'}, '1613642647': {'1': '1.62', '0': '3.20', '2': '3.96'}, '1613643799': {'1': '1.62', '0': '3.19', '2': '3.94'}, '1613644813': {'1': '1.62', '0': '3.19', '2': '3.95'}, '1613653368': {'1': '1.62', '0': '3.19', '2': '3.96'}, '1613666242': {'1': '1.62', '0': '3.19', '2': '3.95'}, '1613667102': {'1': '1.62', '0': '3.18', '2': '3.95'}}

re_mapped = {
    datetime.fromtimestamp(int(date_point)): {
        mapper[key]: float(value) for key, value in data_values.items()
    } for date_point, data_values in data.items()
}

df = pd.DataFrame(re_mapped)
df.T.plot(kind="line")
plt.show()

Output: a line chart with one line per outcome (Fc Noah, Null, Fc Van) over time.
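If you would rather keep the Selenium script from the question, the flat array it prints can be reshaped into three columns. This is only a sketch: it assumes the buttons are returned row by row, three per bookmaker in the order 1, N, 2, which the question's output suggests but does not confirm. The values are copied from that output, and `odds_df` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Odds in page order, copied from the output shown in the question
odds = np.array([
    1.68, 3.2, 3.95, 1.65, 3.25, 4.0, 1.62, 3.2, 4.0,
    1.65, 3.1, 3.8, 1.58, 3.2, 4.0, 1.58, 3.2, 3.95,
    1.57, 3.15, 3.95,
])

# Assumption: every consecutive group of three values is one table row (1, N, 2).
# reshape(-1, 3) raises ValueError if the number of odds is not a multiple of 3.
odds_df = pd.DataFrame(odds.reshape(-1, 3), columns=["1", "N", "2"])
print(odds_df)
```

With the 21 values above this gives 7 rows, one per line of the odds table.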

