Remove certain rows without iterating the whole file line by line in Python

Problem description

I have a dataset like the one below:

Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4  ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4   ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6      ----- needs to be removed

Between the data rows there are some total (summary) rows that I need to remove from the file. I'm looking for a better way to remove these rows without iterating over the file line by line in Python, since I have a few thousand rows and removing the summary lines that way is time-consuming. Any suggestions?

Tags: python, file, for-loop, line

Solution


You can use a regular expression to perform the substitution:

import re
t = """Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4  ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4   ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6      ----- needs to be removed"""

# Delete each "Total ..." line together with the newline that precedes it
print(re.sub(r'\nTotal.*', '', t))

re.sub finds every part of the text that matches the pattern r'\nTotal.*' (a newline followed by the word "Total", followed by any characters up to the end of that line, since . does not match a newline by default) and replaces each match with '', which deletes those summary lines.
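The snippet above works on an in-memory string. A minimal sketch of applying the same idea to a file on disk, assuming the whole file fits in memory (a few thousand rows easily does) and that every summary row begins with "Total". The helper names strip_summary_rows and clean_file and any file paths are hypothetical. It uses re.MULTILINE so that ^ matches at the start of every line; unlike r'\nTotal.*', this also catches a summary row that happens to be the very first line.

```python
import re

# ^ with re.MULTILINE anchors at the start of each line, so a "Total" row is
# matched wherever it appears, including line 1. The optional \n also removes
# the line break that ends the summary row.
# Assumption: summary rows always start with "Total" (as in the sample data).
SUMMARY_ROW = re.compile(r"^Total.*\n?", re.MULTILINE)


def strip_summary_rows(text: str) -> str:
    """Return text with every line that starts with 'Total' removed."""
    return SUMMARY_ROW.sub("", text)


def clean_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src once, write the cleaned text to dst."""
    # Text mode normalizes \r\n to \n on read, so the pattern's \n matches.
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_summary_rows(text))
```

For example, clean_file("input.csv", "cleaned.csv") would write a copy of input.csv without the summary rows. The regex engine still scans the text internally, but it does so in C in a single pass rather than through an explicit Python loop.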

