首页 > 解决方案 > 演示 pandas 中的简单连接

问题描述

我正在制定一个教学大纲,用于教授一些同事 pandas 和 SQL,并希望演示 SQL 连接,然后演示如何使用 pandas 进行类似的连接。

我的 SQL 查询:

SELECT    
town.id,
town.name,
county.name AS county,
nation.name AS nation
FROM town
LEFT JOIN county ON county.id = town.county_id
LEFT JOIN nation ON nation.id = county.nation_id;

df.join() 和 df.merge() 的文档让我有些困惑。有人可以解释我如何使用来自以下查询的数据在 pandas 中执行相同的连接。我想在 town.county_id 上离开加入县。

镇:

SELECT    
town.id,
town.name,
town.county_id
FROM town

县:

SELECT    
county.id,
county.name,
county.nation_id
FROM county

国家:

SELECT    
nation.id,
nation.name
FROM nation

我尝试了几种方法,但得到了奇怪的结果。这个看起来最有希望,但我得到了一个奇怪的输出。

output = pd.merge(
  town, 
  county, 
  left_on='id', 
  right_on='county_id'
)

输出:

    name_x nuts_region_x  county_id_x          name_y nuts_region_y  county_id_y
0     Aaron's Hill    South East            1    Aaron's Hill    South East            1
1     Aaron's Hill    South East            1      Abbey Mead    South East            1
...            ...           ...          ...             ...           ...          ...
1795      Aberdeen      Scotland           39          Ashton      Scotland           96
1796      Aberdeen      Scotland           39      Auchenback      Scotland           96

非常感谢任何帮助。

安德鲁

标签: mysqlpandas

解决方案


我找到了以下解决方案:

town_query = """
    SELECT    
    town.id,
    town.name,
    town.nuts_region,
    town.county_id
    FROM town
"""

county_query = """
    SELECT    
    county.id,
    county.name AS county_name,
    county.nation_id 
    FROM county
"""

nation_query = """
    SELECT    
    nation.id,
    nation.name AS nation_name
    FROM nation
"""

我们需要将索引设置为 id 列。然后 Pandas 会自动将其用于连接。

town = pd.read_sql_query(town_query, database_connection).set_index("id")
county = pd.read_sql_query(county_query, database_connection).set_index("id")
nation = pd.read_sql_query(nation_query, database_connection).set_index("id")

这种双重合并成功了,给了我一个很好的输出。

output = town.merge(
  county,
  left_on='county_id',
  right_index=True,
  suffixes=('_town', '_county')
).merge(
  nation,
  left_on='nation_id',
  right_index=True,
  suffixes=('_county', '_nation')
)

输出:

                name  nuts_region  county_id  county_name  nation_id  nation_name
id                                                                               
1       Aaron's Hill   South East          1       Surrey          1      England
25        Abbey Mead   South East          1       Surrey          1      England
54        Abbotswood   South East          1       Surrey          1      England
160   Abinger Bottom   South East          1       Surrey          1      England
161   Abinger Common   South East          1       Surrey          1      England
...              ...          ...        ...          ...        ...          ...
2677         Andreas  Isle of Man         81  Isle of Man          5  Isle of Man
3991         Agneash  Isle of Man         81  Isle of Man          5  Isle of Man
4474         Andreas  Isle of Man         81  Isle of Man          5  Isle of Man
5788         Agneash  Isle of Man         81  Isle of Man          5  Isle of Man
6271         Andreas  Isle of Man         81  Isle of Man          5  Isle of Man

推荐阅读