Pandas DataFrame: is there a way to fill missing values in cycles within groups?

Problem description

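I am trying to fill in missing numeric values in a DataFrame. Each var group has dates running from 1 to 100, and once the date reaches 100, some groups start a second cycle of dates from 1. Within a group, a date can repeat. I need every cycle filled in from 1 to 100. For example, the dates for A are 1, 2, 3, 3, 4, 5, 6, 10 and then 1, 2, 3, 3, 4. I need them to be 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ..., 100 and 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ..., 100. For every date I fill in, I want the remaining columns to be NaN.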

import pandas as pd

df = pd.DataFrame({"date": [1,2,3,3,4,5,6,10,1,2,3,3,4,1,1,1,4,4,4,1,1,1,2,2,3,3,3,4,4],
                   "var": ["A","A","A", "A", "A", "A","A","A","A", "A", "A","A","A", "B", "B", "B","B","B","B" ,"C", "C", "C","C", "D","D","D","D","D","D"],
                   "no": [ 1.5, 1.5,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,9,1.2, 1.3, 1.1, 2, 3,9],
                   "value": [ -1.135632, 1.212112,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                             0.119209, -1.044236, -0.861849, None,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                             0.119209, -1.044236, -0.861849, None,0.87,1.2, 1.3, 1.1, 2, 3,9]})
 date  var  no      value
0   1   A   1.5    -1.135632
1   2   A   1.5     1.212112
2   3   A   1.0     0.469112
3   3   A   2.2    -0.282863
4   4   A   3.5    -1.509059
5   5   A   1.5    -1.135632
6   6   A   1.5     1.212112
7   10  A   1.2    -0.173215
8   1   A   1.3     0.119209
9   2   A   1.1    -1.044236
10  3   A   2.0    -0.861849
11  3   A   3.0    NaN
12  4   A   1.0    0.469112
13  1   B   2.2    -0.282863
14  1   B   3.5    -1.509059
15  1   B   1.5    -1.135632
16  4   B   1.5    1.212112
17  4   B   1.2    -0.173215
18  4   B   1.3    0.119209
19  1   C   1.1    -1.044236
20  1   C   2.0    -0.861849
21  1   C   3.0    NaN
22  2   C   9.0    0.870000
23  2   D   1.2    1.200000
24  3   D   1.3    1.300000
25  3   D   1.1    1.100000
26  3   D   2.0    2.000000
27  4   D   3.0    3.000000
28  4   D   9.0    9.000000

The expected output is:

date   var  no      value
1       A   1.5    -1.135632
2       A   1.5     1.212112
3       A   1.0     0.469112
3       A   2.2    -0.282863
4       A   3.5    -1.509059
5       A   1.5    -1.135632
6       A   1.5     1.212112
7       A       NaN        NaN
8       A       NaN        NaN 
9       A       NaN        NaN  
.       .       ....       ..........
.       .       ....       ..........
.       .       ....       ..........
100 A   1.2    -0.173215

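This is just an example for one group. I have at least 300 such groups in the DataFrame, with 100,000 rows in total. Here date 3 is repeated, and I need to keep it that way. Please help!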

Tags: python, pandas, dataframe

Solution


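It seems you only need a column that puts the dates in order, regardless of what the actual date column says. Here is a solution that creates a new column called "Date_New" to do that. Date_New numbers the rows of each group by position (1, 2, 3, ..., 100) and starts over at 1 after 100. It does not insert rows for dates that are missing.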

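Also, the example you provided already shows missing values as NaN. If your actual data is different, you can use the first line of my answer to replace any placeholder string with NaN, e.g. df.replace("Nothing", np.nan) or df.replace("Nada", np.nan).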
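If the placeholder strings are not known in advance, one alternative to listing each of them in replace is pd.to_numeric with errors="coerce". The sketch below is only an illustration: the raw frame and its "Nothing"/"Nada" entries are made up, and it assumes the column should end up purely numeric.

```python
import pandas as pd

# Hypothetical raw column mixing numbers with placeholder strings.
raw = pd.DataFrame({"value": [0.5, "Nothing", 1.2, "Nada"]})

# errors="coerce" turns every entry that cannot be parsed as a number into NaN.
raw["value"] = pd.to_numeric(raw["value"], errors="coerce")
```

Unlike replace, this also discards any string you did not anticipate, so it only suits columns where every non-numeric entry really means "missing".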

import numpy as np

# Replace whatever placeholder strings you have with NaN
# (np.NaN was removed in NumPy 2.0, so use np.nan)
df = df.replace("None", np.nan)

# Number the rows of each 'var' group by position (1, 2, 3, ...),
# starting over at 1 after every 100 rows.
# cumcount() restarts at 0 for each group and keeps the original row order.
df['Date_New'] = df.groupby('var').cumcount() % 100 + 1
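The question also asks for NaN rows to be inserted for the dates that are missing from each cycle, which numbering the rows does not do. The following is a minimal sketch of one way to approach that, not a definitive implementation. It assumes that a new cycle starts whenever date decreases inside a var group, and that every date lies between 1 and 100 (dates outside that range would be dropped). The function name fill_cycles, its parameters, and the helper column "cycle" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [1,2,3,3,4,5,6,10,1,2,3,3,4,1,1,1,4,4,4,1,1,1,2,2,3,3,3,4,4],
    "var": ["A"]*13 + ["B"]*6 + ["C"]*4 + ["D"]*6,
    "no": [1.5,1.5,1,2.2,3.5,1.5,1.5,1.2,1.3,1.1,2,3,1,2.2,3.5,1.5,1.5,1.2,1.3,
           1.1,2,3,9,1.2,1.3,1.1,2,3,9],
    "value": [-1.135632,1.212112,0.469112,-0.282863,-1.509059,-1.135632,1.212112,
              -0.173215,0.119209,-1.044236,-0.861849,None,0.469112,-0.282863,
              -1.509059,-1.135632,1.212112,-0.173215,0.119209,-1.044236,-0.861849,
              None,0.87,1.2,1.3,1.1,2,3,9],
})

def fill_cycles(df, group_col="var", date_col="date", last_date=100):
    """Insert NaN rows for the dates missing from each 1..last_date cycle."""
    out = df.copy()
    # Assumption: a cycle ends where the date is smaller than the previous
    # date of the same group. The first row of a group compares against NaN,
    # which is False, so it never starts a new cycle.
    prev = out.groupby(group_col, sort=False)[date_col].shift()
    new_cycle = (out[date_col] < prev).astype(int)
    out["cycle"] = new_cycle.groupby(out[group_col], sort=False).cumsum()

    full_dates = pd.DataFrame({date_col: range(1, last_date + 1)})
    pieces = []
    for (name, cycle), part in out.groupby([group_col, "cycle"], sort=False):
        # A left merge onto the full range keeps duplicated dates and
        # leaves NaN in the other columns for dates that were absent.
        filled = full_dates.merge(part, on=date_col, how="left")
        filled[group_col] = name   # restore the group label on inserted rows
        filled["cycle"] = cycle    # restore the cycle number on inserted rows
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)

result = fill_cycles(df)
```

The loop runs once per group and cycle, so with a few hundred groups it should stay cheap. Drop the "cycle" column afterwards if it is not needed.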
