
Problem description

I am trying to use pandas to process a ~500 MB tab-delimited data file in the following format:

+-------+---------+-------+---------+-------+---------+
| Time1 | Sensor1 | Time2 | Sensor2 | Time3 | Sensor3 |
+-------+---------+-------+---------+-------+---------+
|     0 | x       |     0 | y       | 0     | z       |
|     1 | x       |     2 | y       | 0.5   | z       |
|     2 | x       |     4 | y       | 1     | z       |
|     3 | x       |       |         | 1.5   | z       |
|     4 | x       |       |         | 2     | z       |
|     5 | x       |       |         | 2.5   | z       |
|       |         |       |         | 3     | z       |
|       |         |       |         | 3.5   | z       |
|       |         |       |         | 4     | z       |
|       |         |       |         | 4.5   | z       |
|       |         |       |         | 5     | z       |
+-------+---------+-------+---------+-------+---------+

I would like to get all the sensor values along a single time axis, like this:

+------+---------+---------+---------+
| Time | Sensor1 | Sensor2 | Sensor3 |
+------+---------+---------+---------+
| 0    | x       | y       | z       |
| 0.5  | NaN     | NaN     | z       |
| 1    | x       | NaN     | z       |
| 1.5  | NaN     | NaN     | z       |
| 2    | x       | y       | z       |
| 2.5  | NaN     | NaN     | z       |
| 3    | x       | NaN     | z       |
| 3.5  | NaN     | NaN     | z       |
| 4    | x       | y       | z       |
| 4.5  | NaN     | NaN     | z       |
| 5    | x       | NaN     | z       |
+------+---------+---------+---------+

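I started with the following code. The loop part works fine (although it takes a long time). However, the concat step produces many duplicate time indices and does not combine the sensor values for the same time into a single row.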

import pandas as pd
dfList = []
numberOfChannels = 3
for x in range(0,numberOfChannels):
    columns = [numberOfChannels]
    frame = pd.read_table('testinput.csv', 
                          usecols = [x*2, x*2+1],
                          index_col = 0)
    frame.index.name = 'time'
    frame.index = pd.to_timedelta(frame.index, unit = 'ms')

    dfList.append(frame)
df = pd.concat(dfList)

Is there a better way to achieve this?

Tags: python, pandas, dataframe

Solution


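You can create a list of Series, one per sensor and each indexed by its own time column, and then combine them into a DataFrame with pandas.concat along axis=1 so that they are aligned on a shared time index.

This solution is functionally the same as @DyZ's, but laid out differently.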

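# One Series per sensor, indexed by that sensor's own time column
series_list = [df.set_index(f'Time{i}')[f'Sensor{i}'].dropna()
               for i in range(1, len(df.columns) // 2 + 1)]

# axis=1 aligns the Series on the union of their time indices
res = (pd.concat(series_list, axis=1)
         .rename_axis('Time').reset_index())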

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time1': [0, 1, 2, 3, 4, 5, np.nan, np.nan, np.nan, np.nan, np.nan],
                   'Sensor1': ['x', 'x', 'x', 'x', 'x', 'x', np.nan, np.nan, np.nan, np.nan, np.nan],
                   'Time2': [0, 2, 4, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
                   'Sensor2': ['y', 'y', 'y', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
                   'Time3': [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5],
                   'Sensor3': ['z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z']})

Result

print(res)

    Time Sensor1 Sensor2 Sensor3
0    0.0       x       y       z
1    0.5     NaN     NaN       z
2    1.0       x     NaN       z
3    1.5     NaN     NaN       z
4    2.0       x       y       z
5    2.5     NaN     NaN       z
6    3.0       x     NaN       z
7    3.5     NaN     NaN       z
8    4.0       x       y       z
9    4.5     NaN     NaN       z
10   5.0       x     NaN       z
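
To connect this back to the file-based workflow in the question, here is a minimal sketch that reads the tab-separated data once (instead of once per channel) and applies the same axis=1 concat. The inline sample string and the helper name merge_on_time are hypothetical stand-ins for the real 'testinput.csv'. The sketch assumes the columns come in (time, sensor) pairs, that the times are in milliseconds as in the question's code, and that no sensor repeats a timestamp, since duplicate index labels within one Series would make the axis=1 alignment fail.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for 'testinput.csv'.
sample = (
    "Time1\tSensor1\tTime2\tSensor2\tTime3\tSensor3\n"
    "0\tx\t0\ty\t0\tz\n"
    "1\tx\t2\ty\t0.5\tz\n"
    "2\tx\t\t\t1\tz\n"
    "\t\t\t\t1.5\tz\n"
    "\t\t\t\t2\tz\n"
)


def merge_on_time(raw: pd.DataFrame) -> pd.DataFrame:
    """Align every (time, sensor) column pair on one shared time index."""
    series_list = []
    for i in range(0, raw.shape[1] - 1, 2):
        # Take one (time, sensor) pair and drop the padding rows.
        pair = raw.iloc[:, [i, i + 1]].dropna()
        series_list.append(pair.set_index(pair.columns[0])[pair.columns[1]])
    # axis=1 performs an outer join on the time index; sort it explicitly.
    merged = pd.concat(series_list, axis=1).sort_index()
    # Assumption: times are milliseconds, as in the question's code.
    merged.index = pd.to_timedelta(merged.index, unit="ms")
    return merged.rename_axis("time")


# Read the whole file once rather than once per channel.
raw = pd.read_csv(io.StringIO(sample), sep="\t")
merged = merge_on_time(raw)
print(merged)
```

For the real file, replacing io.StringIO(sample) with the file path should be enough. Whether a 500 MB file fits comfortably in memory in a single read depends on the machine.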
