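python - How can I run a list of DataFrames through an aggregation loop?
Problem description
The code block below works perfectly on a single DataFrame: it takes a datetime time-series DataFrame and builds a 1-hour mean lag window for four sensor columns. I have a whole list of DataFrames to process this way. Is there a way to iterate over the list, or to write a function, so that I don't end up with repeated code blocks?
List of DataFrames: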
df_list = [df_t6,
df_t7,
df_t8,
df_t11,
df_t14,
df_t15,
df_t17,
df_t19]
The working code block is:
import pandas as pd

# df_t6 telemetry means lag window
# create an empty list 'temp'
temp = []
# define the feature columns to be iterated
features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
# loop
for column in features:
    # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
    temp.append(pd.pivot_table(df_t6, index = 'datetime', columns = 'Tool', values = column)
                .resample('1H', closed = 'left', label = 'right').mean().unstack())
# create a dataframe to hold the information and concat the 'temp' list
sensorData1H_mean = pd.concat(temp, axis = 1)
# name the columns using the list 'features' + '_1H_mean'
sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
# reset the index values
sensorData1H_mean.reset_index(inplace = True)
I know I can define a function for this and call it for each DataFrame, but I'm wondering whether there is a faster or better way:
def oneHmean(d):
    # create an empty list 'temp'
    temp = []
    # define the feature columns to be iterated
    features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
    # loop
    for column in features:
        # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
        temp.append(pd.pivot_table(d, index = 'datetime', columns = 'Tool', values = column)
                    .resample('1H', closed = 'left', label = 'right').mean().unstack())
    # create a dataframe to hold the information and concat the 'temp' list
    sensorData1H_mean = pd.concat(temp, axis = 1)
    # name the columns using the list 'features' + '_1H_mean'
    sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
    # reset the index values
    sensorData1H_mean.reset_index(inplace = True)
    return sensorData1H_mean

df_t6_m = oneHmean(df_t6)
df_t7_m = oneHmean(df_t7)
# etc.
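One way to avoid the repeated calls is to keep the frames in a dict and apply the function in a comprehension. The following is only a sketch under assumptions: `hourly_mean` is a compact restatement of `oneHmean`, `make_frame` builds hypothetical stand-in data (not the real telemetry), and `'1h'` is the lowercase spelling of the `'1H'` alias, which newer pandas versions prefer.

```python
import pandas as pd

FEATURES = ["HP", "Coolant1", "AccumulatedWork", "CuttingHP"]


def hourly_mean(d):
    """Compact restatement of oneHmean: hourly mean per Tool for each feature."""
    parts = [
        pd.pivot_table(d, index="datetime", columns="Tool", values=col)
        .resample("1h", closed="left", label="right")
        .mean()
        .unstack()
        for col in FEATURES
    ]
    out = pd.concat(parts, axis=1)
    out.columns = [f"{col}_1H_mean" for col in FEATURES]
    return out.reset_index()


def make_frame(tool, base):
    """Hypothetical stand-in data: four readings, 30 minutes apart."""
    return pd.DataFrame({
        "HP": [base, base + 10.0, base + 20.0, base + 30.0],
        "Coolant1": [1.0, 1.0, 3.0, 3.0],
        "AccumulatedWork": [5.0, 5.0, 5.0, 5.0],
        "CuttingHP": [0.0, 2.0, 4.0, 6.0],
        "Tool": tool,
        "datetime": pd.date_range("2019-02-22 11:00", periods=4, freq="30min"),
    })


# Name each frame once, then apply the function to all of them in one pass.
frames = {"t6": make_frame("T6", 10.0), "t7": make_frame("T7", 100.0)}
means = {name: hourly_mean(frame) for name, frame in frames.items()}
```

With the real data, `frames` would map names such as `'t6'` to `df_t6`, and `means['t6']` would take the place of `df_t6_m`.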
Sample data:
df_t6:
Unnamed: 0 IDData HP Coolant1 AccumulatedWork CuttingHP Tool datetime
0 0 0 0 388 30452 -1775 T6 2019-02-22 11:50:21
1 1 1 1812 388 30452 37 T6 2019-02-22 11:50:21
2 2 2 1775 388 30452 0 T6 2019-02-22 11:50:21
3 3 3 1797 382 30452 22 T6 2019-02-22 11:50:21
4 4 4 1797 382 30452 22 T6 2019-02-22 11:50:21
df_t7:
Unnamed: 0 IDData HP Coolant1 AccumulatedWork CuttingHP Tool datetime
0 0 0 1646 14 3291 -1912 T7 2019-02-22 11:50:42
1 1 1 1680 14 3291 -1878 T7 2019-02-22 11:50:42
2 2 2 1719 14 3291 -1839 T7 2019-02-22 11:50:42
3 3 3 1673 14 3291 -1885 T7 2019-02-22 11:50:42
4 4 4 1648 14 3291 -1910 T7 2019-02-22 11:50:42
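The `Unnamed: 0` column suggests the frames were loaded from CSV, in which case `datetime` may still hold strings, and `resample` needs real datetimes. A minimal sketch, assuming that is the situation: the five `df_t6` rows shown above are re-typed as CSV text (with the `Unnamed: 0` column left out), and `sample_t6` is a hypothetical name.

```python
import io

import pandas as pd

# The five df_t6 rows shown above, re-typed as CSV for illustration.
csv_text = """IDData,HP,Coolant1,AccumulatedWork,CuttingHP,Tool,datetime
0,0,388,30452,-1775,T6,2019-02-22 11:50:21
1,1812,388,30452,37,T6,2019-02-22 11:50:21
2,1775,388,30452,0,T6,2019-02-22 11:50:21
3,1797,382,30452,22,T6,2019-02-22 11:50:21
4,1797,382,30452,22,T6,2019-02-22 11:50:21
"""

# Without parse_dates, read_csv leaves 'datetime' as text.
sample_t6 = pd.read_csv(io.StringIO(csv_text))

# Convert to a real datetime column so pivot_table yields a DatetimeIndex.
sample_t6["datetime"] = pd.to_datetime(sample_t6["datetime"])
```

If the column is already a datetime dtype, the `pd.to_datetime` call leaves the values unchanged, so it is safe to run defensively.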
Solution
I think you may want to concat the DataFrames, groupby a key, and then apply your oneHmean function.
# concat the dfs into one, add a key for each to separate them
df = pd.concat([df_t6, df_t7], keys=['t6', 't7'])
# your function
def oneHmean(d):
    # create an empty list 'temp'
    temp = []
    # define the feature columns to be iterated
    features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
    # loop
    for column in features:
        # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
        temp.append(pd.pivot_table(d, index = 'datetime', columns = 'Tool', values = column)
                    .resample('1H', closed = 'left', label = 'right').mean().unstack())
    # create a dataframe to hold the information and concat the 'temp' list
    sensorData1H_mean = pd.concat(temp, axis = 1)
    # name the columns using the list 'features' + '_1H_mean'
    sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
    # reset the index values
    sensorData1H_mean.reset_index(inplace = True)
    return sensorData1H_mean
# group on the keys and apply your function
df.groupby(level=0).apply(oneHmean)
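To see what the grouped call returns, here is a self-contained sketch of the same concat / groupby / apply pattern. It rests on assumptions: `hourly_mean` restates `oneHmean` compactly, `make_frame` produces hypothetical stand-in data, `'1h'` is the lowercase spelling of `'1H'`, and the `source` column name is made up for illustration.

```python
import pandas as pd

FEATURES = ["HP", "Coolant1", "AccumulatedWork", "CuttingHP"]


def hourly_mean(d):
    """Compact restatement of oneHmean: hourly mean per Tool for each feature."""
    parts = [
        pd.pivot_table(d, index="datetime", columns="Tool", values=col)
        .resample("1h", closed="left", label="right")
        .mean()
        .unstack()
        for col in FEATURES
    ]
    out = pd.concat(parts, axis=1)
    out.columns = [f"{col}_1H_mean" for col in FEATURES]
    return out.reset_index()


def make_frame(tool, base):
    """Hypothetical stand-in data: four readings, 30 minutes apart."""
    return pd.DataFrame({
        "HP": [base, base + 10.0, base + 20.0, base + 30.0],
        "Coolant1": [1.0, 1.0, 3.0, 3.0],
        "AccumulatedWork": [5.0, 5.0, 5.0, 5.0],
        "CuttingHP": [0.0, 2.0, 4.0, 6.0],
        "Tool": tool,
        "datetime": pd.date_range("2019-02-22 11:00", periods=4, freq="30min"),
    })


# The keys become the outer index level, so each source frame stays identifiable.
combined = pd.concat(
    [make_frame("T6", 10.0), make_frame("T7", 100.0)],
    keys=["t6", "t7"],
)

# One call processes every frame; the result is indexed by (key, row number).
result = combined.groupby(level=0).apply(hourly_mean)

# Optional tidy-up: drop the per-group row number and expose the key as a column.
tidy = result.droplevel(1).rename_axis("source").reset_index()
```

A single frame's result can then be pulled out with `result.loc['t6']`, or the combined `tidy` table can be used directly.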