How to find the values in one DataFrame that are missing from another DataFrame

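Problem description

I want to find the values in one DataFrame that are not present in another DataFrame. For example, I have a DataFrame like this: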

|---------------------|
|      Col1           |
|---------------------|
|         1111        |
|---------------------|
|         2222        |
|---------------------|
|         3333        |
|---------------------|
|         4444        |
|---------------------|
|         5555        |
|---------------------|
|         6666        |
|---------------------|
|         7777        |
|---------------------|

And another DataFrame like this:

|---------------------|
|      Col2           |
|---------------------|
|         1111        |
|---------------------|
|         2222        |
|---------------------|
|         3333        |
|---------------------|
|         4444        |
|---------------------|

I want a result like this:

|---------------------|
|      missing values |
|---------------------|
|         5555        |
|---------------------|
|         6666        |
|---------------------|
|         7777        |
|---------------------|

Edit:

esn_datafeed_df = pd.read_sql('SELECT * FROM [myDB].[dbo].[esn_datafeed]', engine)
esn_inter_intra_merge_df = pd.read_sql('SELECT * FROM [myDB].[dbo].[esn_inter_intra_merge]', engine)


merged = esn_datafeed_df.merge(esn_inter_intra_merge_df, how='left', indicator=True)
merged.query("_merge == 'left_only'")[["st_umts_df_relation_key"]]

Error:

Traceback (most recent call last):
  File "C:/Users/haroo501/PycharmProjects/tool_check_nbr/my_missing_result.py", line 16, in <module>
    merged = esn_datafeed_df.merge(esn_inter_intra_merge_df, how='left', indicator=True)
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\frame.py", line 7336, in merge
    return merge(
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 83, in merge
    return op.get_result()
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 642, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 859, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 837, in _get_join_indexers
    return _get_join_indexers(
  File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 1320, in _get_join_indexers
    return join_func(lkey, rkey, count, **kwargs)
  File "pandas/_libs/join.pyx", line 97, in pandas._libs.join.left_outer_join
MemoryError: Unable to allocate 298. MiB for an array with shape (39003243,) and data type int64

How can I fix this? I hope someone can help me with this problem.

Tags: python, pandas

Solution


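You are looking for a LEFT JOIN. You can do this with the merge function in pandas. Passing indicator=True adds a _merge column that shows which DataFrame each row came from; the rows you are looking for, those present only in the left DataFrame, are marked 'left_only'.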

>>> import pandas as pd
>>> df1 = pd.DataFrame([1, 2, 3, 4, 5, 6], columns=["my_column"])
>>> df2 = pd.DataFrame([1, 2, 3], columns=["my_column"])
>>> merged = df1.merge(df2, how="left", indicator=True)
>>> merged.query("_merge == 'left_only'")[["my_column"]]
   my_column
3          4
4          5
5          6
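
The answer above does not address the MemoryError from the edit. When merge is called without on=, pandas joins on every column the two frames share, and duplicate keys can multiply the number of result rows, which may be what exhausts memory here. The sketch below is not part of the original answer. It uses the sample values from the question (Col1, Col2) and shows two lighter variants: a boolean mask built with isin, which avoids the join entirely, and a left join restricted to the de-duplicated key column. For the real tables, the key column would presumably be st_umts_df_relation_key in both frames; that is an assumption based on the question's code.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Col1": [1111, 2222, 3333, 4444, 5555, 6666, 7777]})
df2 = pd.DataFrame({"Col2": [1111, 2222, 3333, 4444]})

# Variant 1: no join at all. Keep the rows of df1 whose key
# does not appear anywhere in df2's key column.
missing_isin = df1.loc[~df1["Col1"].isin(df2["Col2"]), "Col1"].tolist()

# Variant 2: still a left join, but only on the key column and only
# after dropping duplicate keys, so the result cannot have more rows
# than there are distinct keys in df1.
left_keys = df1[["Col1"]].drop_duplicates()
right_keys = df2[["Col2"]].drop_duplicates()
merged = left_keys.merge(
    right_keys, left_on="Col1", right_on="Col2", how="left", indicator=True
)
missing_merge = merged.loc[merged["_merge"] == "left_only", "Col1"].tolist()

print(missing_isin)   # [5555, 6666, 7777]
print(missing_merge)  # [5555, 6666, 7777]
```

Selecting only the key column in the SQL query (instead of SELECT *) would reduce memory further, since the other columns are never used in the comparison.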
