How to concatenate three pandas DataFrames without row mismatches

Problem description

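I'm having some trouble concatenating three DataFrames with pandas. The rows of one of my DataFrames do not line up with those of the other two (see the code and output below):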

import requests
import pandas as pd
from bs4 import BeautifulSoup

List = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR', 'LU1076093779:EUR', 'LU1116896363:EUR']
df = pd.DataFrame(List, columns=['List'])
urls = 'https://markets.ft.com/data/funds/tearsheet/summary?s='+ df['List']

dfs =[]
results = pd.DataFrame()
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    elemList = soup.find('title')
    df0 = pd.DataFrame(elemList, columns = ['Fund Name'])
    df0["Fund Name"] = df0["Fund Name"].str.replace("summary - FT.com", "", regex=True)
    table1 = soup.find_all('table')[0]
    table2 = soup.find_all('table')[1]
    df1 = pd.read_html(str(table1), index_col=0)[0].T
    df2 = pd.read_html(str(table2), index_col=0)[0].T
    df = pd.concat([df0, df1, df2], axis=1)
    dfs.append(df)

pd.concat(dfs).to_csv(r'/Users/Test.csv', index=False)    

My output looks like this:

[Image: screenshot of the resulting output]

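It looks like the rows of my df0 DataFrame (column: 'Fund Name') are not aligned with the rows of my other DataFrames. If anyone could let me know why this happens, it would be much appreciated. Thanks!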

Tags: python, pandas, beautifulsoup, concatenation

Solution


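pd.concat with axis=1 aligns rows by index label, not by position. df0 has the default index (0), while df1 and df2 take their index from the transposed tables, so the labels do not match and the fund name lands on a different row from the table values. The idea is to concatenate df1 and df2 first and then add the Fund Name column as the first column with DataFrame.insert: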

# imports and `urls` are the same as in the question
dfs = []
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    elemList = soup.find('title')

    table1 = soup.find_all('table')[0]
    table2 = soup.find_all('table')[1]
    df1 = pd.read_html(str(table1), index_col=0)[0].T
    df2 = pd.read_html(str(table2), index_col=0)[0].T
    # join the two tables side by side first
    df = pd.concat([df1, df2], axis=1)
    # then add the fund name as the first column of the combined row
    df.insert(0, 'Fund Name', elemList)
    df["Fund Name"] = df["Fund Name"].str.replace("summary - FT.com", "", regex=True)
    dfs.append(df)

# stack the per-fund rows and write them out, as in the question
pd.concat(dfs).to_csv(r'/Users/Test.csv', index=False)
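
To see the effect without hitting the website, here is a minimal offline sketch of the same misalignment and of the fix from the answer above. The frames and their values ("Example Fund", "Price (EUR)", "Fund type") are invented placeholders, and the index label 1 on the stand-in tables is an assumption about what the transposed read_html output looks like; the only point is that the index labels differ from df0's default 0.

```python
import pandas as pd

# Stand-in for df0: one fund name with the default index label 0.
df0 = pd.DataFrame(["Example Fund"], columns=["Fund Name"])

# Stand-ins for the transposed tables (made-up values). Their single row
# is labelled 1, an assumed label that simply differs from df0's label 0.
df1 = pd.DataFrame({"Price (EUR)": ["10.00"]}, index=[1])
df2 = pd.DataFrame({"Fund type": ["SICAV"]}, index=[1])

# concat with axis=1 aligns on index labels, so labels 0 and 1 end up
# as two separate rows padded with NaN.
misaligned = pd.concat([df0, df1, df2], axis=1)

# Fix from the answer: join the tables first, then insert the name column.
fixed = pd.concat([df1, df2], axis=1)
fixed.insert(0, "Fund Name", "Example Fund")

# Alternative: discard the index labels so every frame is labelled 0..n-1.
frames = [d.reset_index(drop=True) for d in (df0, df1, df2)]
fixed_alt = pd.concat(frames, axis=1)

print(misaligned)
print(fixed)
print(fixed_alt)
```

Both variants give a single row per fund. The insert approach keeps the index of the scraped tables, while reset_index(drop=True) throws it away, which is fine here because the result is written with index=False.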
