How to use pandas.to_sql but only add row if row doesn't exist yet

Question

I have some experience with Python but am very new to SQL. I'm trying to use pandas.to_sql to add table data to my database, but when I add it I want it to check whether each row already exists before appending.

These are my two DataFrames:

>>> df0.to_markdown()
|    |   Col1 |   Col2 |
|---:|-------:|-------:|
|  0 |      0 |     00 |
|  1 |      1 |     11 |

>>> df1.to_markdown()
|    |   Col1 |   Col2 |
|---:|-------:|-------:|
|  0 |      0 |     00 |
|  1 |      1 |     11 |
|  2 |      2 |     22 |

Here I use pandas' to_sql:

>>> df0.to_sql(con=con, name='test_db', if_exists='append', index=False)
>>> df1.to_sql(con=con, name='test_db', if_exists='append', index=False)

Then I check the data in the database file:

>>> df_out = pd.read_sql("""SELECT * FROM test_db""", con)
>>> df_out.to_markdown()
|    |   Col1 |   Col2 |
|---:|-------:|-------:|
|  0 |      0 |      0 |
|  1 |      1 |     11 |
|  2 |      0 |      0 | # Duplicate
|  3 |      1 |     11 | # Duplicate
|  4 |      2 |     22 | 
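The behaviour above can be reproduced with a minimal sketch, assuming an in-memory SQLite database through Python's built-in sqlite3 module (the question doesn't say what `con` is) and integer values in Col2:

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the asker's `con`.
con = sqlite3.connect(":memory:")

df0 = pd.DataFrame({"Col1": [0, 1], "Col2": [0, 11]})
df1 = pd.DataFrame({"Col1": [0, 1, 2], "Col2": [0, 11, 22]})

# if_exists="append" only decides what to do when the *table* already exists;
# every row of each DataFrame is inserted, with no check for existing rows.
df0.to_sql(con=con, name="test_db", if_exists="append", index=False)
df1.to_sql(con=con, name="test_db", if_exists="append", index=False)

df_out = pd.read_sql("SELECT * FROM test_db", con)
print(len(df_out))  # 5: the rows (0, 0) and (1, 11) are stored twice
```

In other words, `if_exists` is a table-level option, not a row-level one, so to_sql itself has no setting that skips rows already present.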

But I want my database to look like this, without the duplicate rows:

|    |   Col1 |   Col2 |
|---:|-------:|-------:|
|  0 |      0 |      0 |
|  1 |      1 |     11 |
|  3 |      2 |     22 | 

Is there an option I can set, or a line of code I can add, to make this happen?

Thank you!

Edit: There is SQL that pulls only the unique rows, but what I want is to not add the duplicates to the database in the first place.
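One way to filter on the pandas side before anything is written is an anti-join: read the rows already in the table, merge the new DataFrame against them, and append only the rows that have no match. A minimal sketch, assuming an in-memory SQLite database via sqlite3 and a table that already exists with the same columns; `append_new_rows` is a hypothetical helper name, not a pandas API:

```python
import sqlite3

import pandas as pd


def append_new_rows(df: pd.DataFrame, con: sqlite3.Connection, name: str) -> int:
    """Hypothetical helper: append only rows of `df` not already in table `name`.

    Assumes the table already exists and has the same columns as `df`.
    """
    existing = pd.read_sql(f'SELECT * FROM "{name}"', con)
    # Left merge on all shared columns; indicator=True adds a "_merge" column
    # that is "left_only" for rows found in df but not in the table.
    merged = df.drop_duplicates().merge(
        existing.drop_duplicates(), how="left", indicator=True
    )
    new_rows = merged.loc[merged["_merge"] == "left_only", list(df.columns)]
    new_rows.to_sql(con=con, name=name, if_exists="append", index=False)
    return len(new_rows)


con = sqlite3.connect(":memory:")
df0 = pd.DataFrame({"Col1": [0, 1], "Col2": [0, 11]})
df1 = pd.DataFrame({"Col1": [0, 1, 2], "Col2": [0, 11, 22]})

df0.to_sql(con=con, name="test_db", if_exists="append", index=False)  # creates the table
added = append_new_rows(df1, con, "test_db")

df_out = pd.read_sql("SELECT * FROM test_db", con)
print(added, len(df_out))  # 1 3
```

Because the check happens in pandas, it reads the whole table on every call and is not atomic: two writers running at the same time could still insert the same row. A primary key or unique constraint in the database is the only hard guarantee.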

Tags: python, mysql, pandas, pandas-to-sql

Solution


Don't use to_sql; a simple query works:

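from sqlalchemy import text

query = text('INSERT INTO test_db ("Col1", "Col2") VALUES (:Col1, :Col2) '
             'ON CONFLICT ON CONSTRAINT test_db_pkey DO NOTHING')  # PostgreSQL only
with engine.begin() as conn:  # engine: a SQLAlchemy Engine; begin() commits on exit
    conn.execute(query, df0.to_dict(orient="records"))  # one bound row per dict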

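Repeat this for each DataFrame, replacing df0 with df1. Note that ON CONFLICT ... DO NOTHING is PostgreSQL syntax, and it only works if test_db has a primary key constraint named test_db_pkey (PostgreSQL's default name for the primary key of test_db). to_sql does not create a primary key, so add one to the table first. The question is tagged mysql, where the closest equivalent is INSERT IGNORE.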
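If you would rather keep calling to_sql, pandas accepts a callable for its `method` parameter, called as `method(pd_table, conn, keys, data_iter)`, and the same "insert, do nothing on conflict" statement can be issued from there. A minimal sketch, assuming SQLite through the built-in sqlite3 module (SQLite 3.24 or later for `ON CONFLICT DO NOTHING`; on MySQL the analogous statement would be `INSERT IGNORE`); `insert_do_nothing` is a hypothetical name. As with the answer's query, the table needs a primary key, so it is created up front:

```python
import sqlite3

import pandas as pd


def insert_do_nothing(pd_table, conn, keys, data_iter):
    """Hypothetical to_sql(method=...) callable that skips conflicting rows.

    With a sqlite3 connection, pandas passes a DB-API cursor as `conn`.
    """
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = (f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders}) '
           "ON CONFLICT DO NOTHING")
    cur = conn.executemany(sql, list(data_iter))
    return cur.rowcount  # rows actually inserted; ignored conflicts don't count


con = sqlite3.connect(":memory:")
# to_sql never adds a primary key, so create the table with one first.
con.execute(
    'CREATE TABLE test_db ("Col1" INTEGER, "Col2" INTEGER, PRIMARY KEY ("Col1", "Col2"))'
)

df0 = pd.DataFrame({"Col1": [0, 1], "Col2": [0, 11]})
df1 = pd.DataFrame({"Col1": [0, 1, 2], "Col2": [0, 11, 22]})

for df in (df0, df1):
    df.to_sql(con=con, name="test_db", if_exists="append", index=False,
              method=insert_do_nothing)

df_out = pd.read_sql('SELECT * FROM test_db ORDER BY "Col1"', con)
print(df_out.values.tolist())  # [[0, 0], [1, 11], [2, 22]]
```

Here the database, not pandas, decides what counts as a duplicate, so the check stays correct even with concurrent writers.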


