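python - How to merge and group data into conditional blocks in pandas
Problem description
I have two dataframes that I am trying to merge in a particular way. They are as follows: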
df_A
time expression_w expression_h
0 neutral neutral
1 neutral neutral
2 neutral neutral
3 neutral neutral
4 neutral neutral
5 neutral neutral
6 neutral neutral
7 neutral neutral
8 neutral neutral
9 neutral neutral
10 neutral neutral
11 neutral neutral
12 neutral neutral
13 neutral neutral
14 neutral neutral
15 neutral neutral
16 neutral neutral
17 neutral neutral
18 neutral neutral
19 neutral neutral
and df_B:
start stop
12.12 12.47
13.44 20.82
I only want to merge df_A.expression_w and df_A.expression_h into df_B, based on df_A.time falling between df_B.start and df_B.stop. The aggregate statistic should be the mode.
It should look like this:
start stop expression_w expression_h
12.12 12.47 neutral neutral
13.44 20.82 neutral neutral
If I extend the dataframes to
df_A
time expression_w expression_h
0 neutral neutral
1 neutral neutral
2 neutral neutral
3 neutral neutral
4 neutral neutral
5 neutral neutral
6 neutral neutral
7 neutral neutral
8 neutral neutral
9 neutral neutral
10 neutral neutral
11 neutral neutral
12 neutral neutral
13 neutral neutral
14 neutral neutral
15 neutral neutral
16 neutral neutral
17 neutral neutral
18 neutral neutral
19 neutral neutral
20 neutral neutral
21 neutral neutral
22 neutral neutral
23 neutral neutral
24 neutral neutral
25 neutral neutral
26 neutral neutral
27 neutral neutral
28 neutral neutral
29 neutral neutral
30 neutral neutral
31 neutral neutral
32 neutral neutral
33 neutral neutral
34 neutral neutral
35 neutral neutral
36 neutral neutral
37 neutral neutral
38 neutral neutral
39 neutral neutral
30 neutral neutral
41 neutral neutral
42 neutral neutral
43 neutral neutral
44 neutral neutral
45 neutral neutral
46 neutral neutral
47 neutral neutral
48 neutral neutral
49 neutral neutral
50 neutral neutral
and df_B:
start stop
0 12.12 12.47
1 13.44 20.82
2 20.88 29.63
3 31.61 33.33
4 33.44 42.21
the code recommended below breaks and gives this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-117-47594fba3999> in <module>
1 # df_B= d_s[['start', 'stop']].head(3)
2 # df_A = df.head(500)
----> 3 y = (df_B.assign(dummy=1, idx=np.arange(df_B.shape[1]))
4 .merge(df_A.assign(dummy=1), on='dummy')
5 .query('start<=time<=stop')
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/frame.py in assign(self, **kwargs)
3687
3688 for k, v in kwargs.items():
-> 3689 data[k] = com.apply_if_callable(v, data)
3690 return data
3691
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
3035 else:
3036 # set column
-> 3037 self._set_item(key, value)
3038
3039 def _setitem_slice(self, key: slice, value):
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/frame.py in _set_item(self, key, value)
3111 """
3112 self._ensure_valid_index(value)
-> 3113 value = self._sanitize_column(key, value)
3114 NDFrame._set_item(self, key, value)
3115
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
3756
3757 # turn me into an ndarray
-> 3758 value = sanitize_index(value, self.index)
3759 if not isinstance(value, (np.ndarray, Index)):
3760 if isinstance(value, list) and len(value) > 0:
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index)
746 if len(data) != len(index):
747 raise ValueError(
--> 748 "Length of values "
749 f"({len(data)}) "
750 "does not match length of index "
ValueError: Length of values (2) does not match length of index (5)
Any ideas on how to fix this?
Solution
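Let's try a cross merge followed by a query. The ValueError in the traceback comes from np.arange(df_B.shape[1]): shape[1] is the number of columns (2), while the new idx column needs one value per row (5), so len(df_B) is used instead: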
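import numpy as np
import pandas as pd

(df_B.assign(dummy=1, idx=np.arange(len(df_B)))   # idx labels each interval (row of df_B)
 .merge(df_A.assign(dummy=1), on='dummy')          # cross join on a constant key
 .query('start<=time<=stop')                       # keep times inside the interval
 .drop('dummy', axis=1)
 .groupby('idx', as_index=False)
 .agg(pd.Series.mode)                              # most frequent value per column
)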
Output (note that for the first row of df_B there is no time value between start and stop, so that row is dropped):
idx start stop expression_w expression_h
0 1 13.44 20.82 neutral neutral
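For reference, here is a self-contained sketch of the same cross-merge approach applied to the extended data. It rests on a few assumptions that are not in the original answer: df_A is rebuilt with time running 0..50 with no repeated values, only the two expression columns are aggregated (matching the desired output in the question), and first_mode is a hypothetical helper that picks the first value when the mode is tied.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the extended example with time 0..50, all "neutral".
df_A = pd.DataFrame({
    "time": np.arange(51),
    "expression_w": "neutral",
    "expression_h": "neutral",
})
df_B = pd.DataFrame({
    "start": [12.12, 13.44, 20.88, 31.61, 33.44],
    "stop": [12.47, 20.82, 29.63, 33.33, 42.21],
})


def first_mode(s):
    """Hypothetical helper: most frequent value, first one if there is a tie."""
    return s.mode().iloc[0]


result = (
    df_B.assign(dummy=1, idx=np.arange(len(df_B)))   # one idx per row of df_B
        .merge(df_A.assign(dummy=1), on="dummy")     # cross join via a constant key
        .query("start <= time <= stop")              # keep times inside each interval
        .groupby(["idx", "start", "stop"], as_index=False)[
            ["expression_w", "expression_h"]
        ]
        .agg(first_mode)                             # one mode per interval and column
)
print(result)
```

Intervals that contain no time value (here the first one, 12.12 to 12.47) drop out of the result. If they should be kept, merging result back onto df_B with how='left' would retain them with missing values. Note also that the cross join builds len(df_A) * len(df_B) rows before filtering, so this may become memory-heavy for large inputs.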