How to insert rows when a condition on the previous row is met

Problem description

I have a simplified file (foo.csv).

Contents of foo:

['MyNum', 'Cycle', 'Line', 'V1', 'V2', 'T1']
['1', 'C', '1', '6.7', '25.6', '90']
['3', 'A', '1', '5.8', '22.5', '89.9']
['3', 'A', '2', '5.8', '24.2', '90']
['3', 'A', '3', '5.8', '25.4', '90']
['5', 'B', '1', '6', '25.3', '89.9']
['5', 'B', '2', '6.3', '23.8', '89.9']
['7', 'C', '1', '7.1', '24', '89.9']
['7', 'C', '2', '9999', '9111', '9333']
['7', 'C', '3', '9999', '9111', '9333']

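What I want is three rows for each value of the first item (MyNum), with the third item (Line) running from 1 to 3. So if only one or two rows have a given MyNum value, I need to insert one or two rows, each identical to the row above it except for the Line item, which should be incremented.

Expected output: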

['MyNum', 'Cycle', 'Line', 'V1', 'V2', 'T1']
['1', 'C', '1', '6.7', '25.6', '90']
['1', 'C', '2', '6.7', '25.6', '90']
['1', 'C', '3', '6.7', '25.6', '90']
['3', 'A', '1', '5.8', '22.5', '89.9']
['3', 'A', '2', '5.8', '24.2', '90']
['3', 'A', '3', '5.8', '25.4', '90']
['5', 'B', '1', '6', '25.3', '89.9']
['5', 'B', '2', '6.3', '23.8', '89.9']
['5', 'B', '3', '6.3', '23.8', '89.9']
['7', 'C', '1', '7.1', '24', '89.9']
['7', 'C', '2', '9999', '9111', '9333']
['7', 'C', '3', '9999', '9111', '9333']

Code

import csv
import pandas as pd
data = pd.read_csv('foo.csv')
df = pd.DataFrame(data)
print('\n'*5)

print(df["MyNum"])
"""
for i in df["MyNum"]:
    if i+1 = i
    print(i)
"""
with open('foo.csv', 'r') as f_in, open('__fooOut.csv', 'w') as f_out:  # this creates a new output file in write mode
    reader = csv.reader(f_in, delimiter=',') # modify for your file
    writer = csv.writer(f_out, delimiter=',') # modify for your file
    num = 0
    num_count = 3
    while num_count > 0:
        for row in reader:
            print(row)


"""
Manual method

The first item in the first row is 1 and a third item 1.
The following three rows have a first item of 3 and their third items go from 
1, 2 to 3.
The following two rows have a first item of 5 and their third items go from 
1 to 2.
The following 3 rows have a first item of 7 and their third items go from 
1, 2 to 3.

What should happen is there should be three rows each having the same first 
item, and when they do their third item (row[2] or "Line") should increment and
 be either a 1, 2 or 3.
When there is not two rows with the same first item as the row above a new row
 should be inserted immediately below the row with the same details as the 
 row above except for the third item.
"""

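I don't know how to do this, whether a DataFrame is the right approach, or how to check whether the first item of the next row equals that of the row being tested.

Tags: python, pandas, dataframe, numpy, csv

Solution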


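If the values in the Line column are integers, use a custom lambda function with DataFrame.reindex and method='ffill' to forward-fill the Line values that are missing from the range 1 to 3, i.e. range(1, 4).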
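To see what the reindex step does on its own, here is a minimal sketch on a single group. The small frame below is a hand-built stand-in for the MyNum 5 rows (an assumption for illustration, not read from foo.csv); it only has Line 1 and 2, so Line 3 should be created as a copy of Line 2.

```python
import pandas as pd

# Hypothetical single group: the MyNum 5 rows, which only have Line 1 and 2.
group = pd.DataFrame(
    {"Line": [1, 2], "V1": [6.0, 6.3], "V2": [25.3, 23.8], "T1": [89.9, 89.9]}
).set_index("Line")

# Reindex to Line 1..3; the missing Line 3 takes the values of the
# closest earlier row (Line 2) because of method="ffill".
filled = group.reindex(range(1, 4), method="ffill")
print(filled)
```

Forward filling via reindex needs the existing index to be sorted, which holds here because Line increases within each group.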

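import pandas as pd

df = pd.read_csv('foo.csv')

# Reindex each group to Line 1..3, copying the previous row into any gap
f = lambda x: x.reindex(range(1, 4), method='ffill')
df = (df.set_index('Line')
        .groupby(['MyNum','Cycle'])
        .apply(f)
        # drop the group keys if apply kept them as columns
        .drop(columns=['MyNum','Cycle'], errors='ignore')
        .reset_index())

print(df)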
    MyNum Cycle  Line      V1      V2      T1
0       1     C     1     6.7    25.6    90.0
1       1     C     2     6.7    25.6    90.0
2       1     C     3     6.7    25.6    90.0
3       3     A     1     5.8    22.5    89.9
4       3     A     2     5.8    24.2    90.0
5       3     A     3     5.8    25.4    90.0
6       5     B     1     6.0    25.3    89.9
7       5     B     2     6.3    23.8    89.9
8       5     B     3     6.3    23.8    89.9
9       7     C     1     7.1    24.0    89.9
10      7     C     2  9999.0  9111.0  9333.0
11      7     C     3  9999.0  9111.0  9333.0
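Since the question also asks how to compare a row with the next one without a DataFrame, here is a rough sketch using only the csv module and itertools.groupby. The helper names pad_groups and pad_csv are made up for this example. It assumes the rows are already sorted so that equal MyNum values are adjacent, and that only trailing Line values are missing (as in foo.csv); unlike the pandas answer, it groups on MyNum alone.

```python
import csv
from itertools import groupby


def pad_groups(rows, key_col=0, line_col=2, size=3):
    """Yield rows so that each run sharing rows[key_col] ends at Line == size."""
    for _, group in groupby(rows, key=lambda r: r[key_col]):
        last = None
        for last in group:  # pass the existing rows through unchanged
            yield last
        # Repeat the last row of the run, incrementing only the Line item.
        for line in range(int(last[line_col]) + 1, size + 1):
            padded = list(last)
            padded[line_col] = str(line)
            yield padded


def pad_csv(src, dst):
    """Copy src to dst, padding every MyNum group to three lines."""
    with open(src, newline="") as f_in, open(dst, "w", newline="") as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        writer.writerow(next(reader))  # copy the header row as-is
        writer.writerows(pad_groups(reader))
```

Calling pad_csv('foo.csv', '__fooOut.csv') would write the padded rows to the output file named in the question. Because groupby only merges consecutive rows with the same key, there is no need to look ahead at the next row by hand.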
