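python - Removing values in one column based on the value of another column with pandas
Problem description
Suppose I have a data frame like this: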
full_path name created modified
0 C:\T1\1.txt 1.txt 14:04:30 NaN
1 C:\T1\1.txt 1.txt NaN 14:04:30
2 C:\T1\T2\1.txt 1.txt 14:10:30 NaN
3 C:\T1\T2\1.txt 1.txt NaN 14:10:30
4 C:\T1\T2\T3\1.txt 1.txt 14:15:30 NaN
5 C:\T1\T2\T3\1.txt 1.txt NaN 14:15:30
6 C:\T1\T2\T3\T4\1.txt 1.txt 14:20:30 NaN
I use this code to build the data frame:
from pathlib import PurePath

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'full_path': {0: 'C:\\T1\\1.txt', 1: 'C:\\T1\\1.txt',
                  2: 'C:\\T1\\T2\\1.txt', 3: 'C:\\T1\\T2\\1.txt',
                  4: 'C:\\T1\\T2\\T3\\1.txt',
                  5: 'C:\\T1\\T2\\T3\\1.txt',
                  6: 'C:\\T1\\T2\\T3\\T4\\1.txt'},
    'name': {0: '1.txt', 1: '1.txt', 2: '1.txt', 3: '1.txt',
             4: '1.txt', 5: '1.txt', 6: '1.txt'},
    'created': {0: '14:04:30', 1: np.nan, 2: '14:10:30', 3: np.nan,
                4: '14:15:30', 5: np.nan, 6: '14:20:30'},
    'modified': {0: np.nan, 1: '14:04:30', 2: np.nan, 3: '14:10:30',
                 4: np.nan, 5: '14:15:30', 6: np.nan}
})

df['folder'] = df['full_path'].apply(lambda x: PurePath(x).parent.name)
g = df.groupby('name')
df['full_path'] = g['full_path'].transform('last')
df['c_m'] = df['created'].combine_first(df['modified'])

index_cols = ['full_path', 'name']
df = df.pivot_table(index=index_cols,
                    columns='folder',
                    values='c_m',
                    aggfunc='first')

summary_cols = ['created', 'modified']
df = df.reset_index() \
    .merge(g[summary_cols].agg({'created': 'first', 'modified': 'last'}),
           on='name')
df = df[[*index_cols,
         *summary_cols,
         *df.columns.difference(summary_cols + index_cols)]] \
    .rename_axis(None, axis=1)
print(df)
This is the output data frame:
full_path name created modified T1 T2 T3 T4
C:\T1\T2\T3\T4\1.txt 1.txt 14:04:30 14:20:30 14:04:30 14:10:30 14:15:30 14:20:30
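What I want is this: if, for example, the file 1.txt moves back to folder T3, the timestamp in column T4 should be removed. So, if I have a data frame like this: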
full_path name created modified
0 C:\T1\1.txt 1.txt 14:04:30 NaN
1 C:\T1\1.txt 1.txt NaN 14:04:30
2 C:\T1\T2\1.txt 1.txt 14:10:30 NaN
3 C:\T1\T2\1.txt 1.txt NaN 14:10:30
4 C:\T1\T2\T3\1.txt 1.txt 14:15:30 NaN
5 C:\T1\T2\T3\1.txt 1.txt NaN 14:15:30
6 C:\T1\T2\T3\T4\1.txt 1.txt 14:20:30 NaN
7 C:\T1\T2\T3\1.txt 1.txt 14:30:30 NaN
I would like the output data frame to look like this:
full_path name created modified T1 T2 T3 T4
C:\T1\T2\T3\T4\1.txt 1.txt 14:04:30 14:20:30 14:04:30 14:10:30 14:30:30 NaN
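How can I modify the code to get this result? In other words, the file was in folder T4 and I recorded a timestamp there, but it then moved back to T3, so I also want to remove the timestamp in T4 because the file is no longer there.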
Solution
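No answer text survives on this page, so what follows is only one possible approach, sketched under a few assumptions rather than taken from the original thread. It assumes that folder names are unique within the tree (so a column such as T3 identifies exactly one folder), that rows are in chronological order, and that the last row for a file gives its current location. The names `events`, `wide`, `last_path`, `still_valid` and `stale` are made up for this illustration. It also swaps `PurePath` for `PureWindowsPath` so the backslash paths parse the same way on any operating system. The idea is to pivot with `aggfunc='last'` to keep the latest timestamp per folder, then blank every folder column that is not part of the file's final path.

```python
from pathlib import PureWindowsPath

import numpy as np
import pandas as pd

# Event log from the question, including the move back to T3 (row 7).
events = pd.DataFrame({
    'full_path': [r'C:\T1\1.txt', r'C:\T1\1.txt',
                  r'C:\T1\T2\1.txt', r'C:\T1\T2\1.txt',
                  r'C:\T1\T2\T3\1.txt', r'C:\T1\T2\T3\1.txt',
                  r'C:\T1\T2\T3\T4\1.txt', r'C:\T1\T2\T3\1.txt'],
    'name': ['1.txt'] * 8,
    'created': ['14:04:30', np.nan, '14:10:30', np.nan,
                '14:15:30', np.nan, '14:20:30', '14:30:30'],
    'modified': [np.nan, '14:04:30', np.nan, '14:10:30',
                 np.nan, '14:15:30', np.nan, np.nan],
})

# Folder that directly contains the file in each event.
events['folder'] = events['full_path'].map(
    lambda p: PureWindowsPath(p).parent.name)
# One timestamp per event: 'created' if present, otherwise 'modified'.
events['c_m'] = events['created'].combine_first(events['modified'])

# Latest timestamp per (file, folder); 'last' keeps the most recent visit.
wide = (events.pivot_table(index='name', columns='folder',
                           values='c_m', aggfunc='last')
              .rename_axis(None, axis=1))

# Last known path per file (assumes rows are in chronological order).
last_path = events.groupby('name')['full_path'].last()

for name, path in last_path.items():
    # Folders on the file's current path remain valid.
    still_valid = set(PureWindowsPath(path).parent.parts)
    # Every other folder column is stale for this file and gets cleared.
    stale = [col for col in wide.columns if col not in still_valid]
    if stale:
        wide.loc[name, stale] = np.nan

print(wide)
```

With the sample data, T3 ends up holding 14:30:30 and T4 becomes NaN, while T1 and T2 keep their timestamps. The `full_path`, `created` and `modified` summary columns from the question can then be merged back onto `wide` in the same way as in the original code.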