python - Python: Converting a DataFrame from a list of dates to a Date From & Date To format
Problem description
I have a DataFrame that looks like this:
+------------+------+-------+
| Date | Item | Value |
+------------+------+-------+
| 2020-01-01 | A | 100 |
+------------+------+-------+
| 2020-01-01 | B | 80 |
+------------+------+-------+
| 2020-01-01 | C | 70 |
+------------+------+-------+
| 2020-01-02 | A | 102 |
+------------+------+-------+
| 2020-01-02 | B | 82 |
+------------+------+-------+
| 2020-01-02 | C | 65 |
+------------+------+-------+
| 2020-01-05 | B | 81 |
+------------+------+-------+
| 2020-01-05 | C | 70 |
+------------+------+-------+
| 2020-01-05 | D | 20 |
+------------+------+-------+
I want to convert it to the following format:
+------+------------+------------+------------+----------+
| Item | Date From | Date To | Value From | Value To |
+------+------------+------------+------------+----------+
| A | 2020-01-01 | 2020-01-02 | 100 | 102 |
+------+------------+------------+------------+----------+
| B | 2020-01-01 | 2020-01-02 | 80 | 82 |
+------+------------+------------+------------+----------+
| C | 2020-01-01 | 2020-01-02 | 70 | 65 |
+------+------------+------------+------------+----------+
| A | 2020-01-02 | 2020-01-05 | 102 | NAN |
+------+------------+------------+------------+----------+
| B | 2020-01-02 | 2020-01-05 | 82 | 81 |
+------+------------+------------+------------+----------+
| C | 2020-01-02 | 2020-01-05 | 65 | 70 |
+------+------------+------------+------------+----------+
| D | 2020-01-02 | 2020-01-05 | NAN | 20 |
+------+------------+------------+------------+----------+
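So I want to convert a "series" of values into a range format, but for the life of me I can't figure out how to do it. I have tried using the shift operator, but can't get it quite right. A few things to note:
- Items can come and go over a period of time. I don't care what value they get, as long as there is a row representing them.
- There will be a large number of value fields, all of which must be pivoted.
- The primary key of the group will include additional columns (i.e. not just Date and Item).
Some help with this would be much appreciated.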
Solution
Start with the following setup:
import numpy as np
import pandas as pd

data = [
    ['2020-01-01', 'A', 100],
    ['2020-01-01', 'B', 80],
    ['2020-01-01', 'C', 70],
    ['2020-01-02', 'A', 102],
    ['2020-01-02', 'B', 82],
    ['2020-01-02', 'C', 65],
    ['2020-01-05', 'B', 81],
    ['2020-01-05', 'C', 70],
    ['2020-01-05', 'D', 20],
]
df = pd.DataFrame(data, columns=['date', 'Item', 'Value'])
df['date'] = pd.to_datetime(df['date'])
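Fill in the missing previous date for items that appear only once:
# Distinct dates in chronological order
dates = sorted(set(pd.to_datetime(df['date'].values)))
value_counts = df['Item'].value_counts()
single_items = value_counts[value_counts == 1].index
for item in single_items:
    last_date = df[df['Item'] == item]['date'].iloc[0]
    # Note: for an item whose only date is the earliest one, index -1 wraps to the last date
    previous_date = dates[dates.index(last_date) - 1]
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    filler = pd.DataFrame([[previous_date, item, np.nan]], columns=['date', 'Item', 'Value'])
    df = pd.concat([df, filler], ignore_index=True)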
Join the data with itself and drop/rename the unneeded columns:
dates = sorted(set(pd.to_datetime(df['date'].values)))
# Next distinct date for each row; None (stored as NaT) for rows on the last date
df['next_date'] = df.apply(
    lambda row: dates[dates.index(row['date']) + 1]
    if dates.index(row['date']) != len(dates) - 1 else None,
    axis=1
)
df2 = df.copy()
# Left join keeps items that have no row on the next date (their 'Value To' becomes NaN)
result = df.merge(df2, left_on=['next_date', 'Item'], right_on=['date', 'Item'], how='left')
result.drop(columns=['date_y', 'next_date_y'], inplace=True)
result.rename(columns={
    'date_x': 'Date From',
    'next_date_x': 'Date To',
    'Item_x': 'Item',
    'Value_x': 'Value From',
    'Value_y': 'Value To'
}, inplace=True)
result = result[['Item', 'Date From', 'Date To', 'Value From', 'Value To']]
result = result.dropna(subset=['Date To'])
# sort_values returns a new DataFrame, so assign it back
result = result.sort_values(['Date From', 'Item'])
Result:
Item Date From Date To Value From Value To
0 A 2020-01-01 2020-01-02 100.0 102.0
1 B 2020-01-01 2020-01-02 80.0 82.0
2 C 2020-01-01 2020-01-02 70.0 65.0
3 A 2020-01-02 2020-01-05 102.0 NaN
4 B 2020-01-02 2020-01-05 82.0 81.0
5 C 2020-01-02 2020-01-05 65.0 70.0
9 D 2020-01-02 2020-01-05 NaN 20.0
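The question also mentions many value fields and a primary key with more columns than just Item. The answer above does not cover that, so here is a rough sketch of one way it might be generalized, not the answerer's method: pair each date with the next observed date, then outer-join the "from" and "to" snapshots. The helper name to_ranges and its parameters (date_col, key_cols, value_cols) are made up for illustration.

```python
import pandas as pd

def to_ranges(df, date_col, key_cols, value_cols):
    """Hypothetical helper: turn per-date rows into Date From / Date To rows."""
    # Distinct dates in order, paired with their successor (the last date has none).
    dates = df[date_col].drop_duplicates().sort_values().to_numpy()
    periods = pd.DataFrame({"Date From": dates[:-1], "Date To": dates[1:]})

    # Two views of the same data: one as period starts, one as period ends.
    start = df.rename(columns={date_col: "Date From",
                               **{c: f"{c} From" for c in value_cols}})
    end = df.rename(columns={date_col: "Date To",
                             **{c: f"{c} To" for c in value_cols}})

    # Inner joins attach the other end of each period and drop rows that
    # cannot start a period (last date) or end one (first date).
    start = start.merge(periods, on="Date From")
    end = end.merge(periods, on="Date To")

    # The outer join keeps items that enter or leave; missing sides become NaN.
    keys = key_cols + ["Date From", "Date To"]
    out = start.merge(end, on=keys, how="outer")

    value_out = [f"{c} {side}" for c in value_cols for side in ("From", "To")]
    out = out[keys + value_out]
    return out.sort_values(["Date From"] + key_cols).reset_index(drop=True)

# Sample data from the question
data = [
    ["2020-01-01", "A", 100], ["2020-01-01", "B", 80], ["2020-01-01", "C", 70],
    ["2020-01-02", "A", 102], ["2020-01-02", "B", 82], ["2020-01-02", "C", 65],
    ["2020-01-05", "B", 81], ["2020-01-05", "C", 70], ["2020-01-05", "D", 20],
]
df = pd.DataFrame(data, columns=["date", "Item", "Value"])
df["date"] = pd.to_datetime(df["date"])

out = to_ranges(df, "date", ["Item"], ["Value"])
print(out)
```

On the sample data this should give the same seven rows as the result above. Extra key columns or value columns would go into key_cols and value_cols respectively. An item that skips a date in the middle gets one row with a missing "To" value and another with a missing "From" value, which seems to fit the "as long as there is a row representing them" requirement.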