python - Remove certain rows without iterating the whole file line by line in Python
Problem description
I have a dataset like the one below:
Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4 ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4 ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6 ----- needs to be removed
Between the data rows there are summary ("Total ...") rows that I need to remove from the file. I'm looking for a better way to remove these rows without iterating over the file line by line in Python. The file has a few thousand rows, and removing the summary lines that way is time-consuming. Any suggestions?
Solution
You can use a regular expression to perform a string substitution:
import re
t = """Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4 ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4 ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6 ----- needs to be removed"""
print(re.sub(r'\nTotal.*','', t))
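re.sub finds every part of the text that matches the pattern r'\nTotal.*' (a newline followed by the word "Total", followed by any characters up to the end of the line) and replaces it with ''. Because the pattern starts with a newline, a "Total" row on the very first line of the text would not be matched; in this data the first line is the header, so that case does not arise.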
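The snippet above works on a string literal, while the question is about a file. A minimal sketch of the same substitution applied to a file follows. It assumes the file fits in memory, and the names strip_total_rows and clean_file, the path arguments, and the shortened sample data are hypothetical, not from the original answer. It also swaps in a slightly different pattern, ^Total.*\n? with re.MULTILINE, so that a "Total" row on the first line would be removed as well.

```python
import re

# With re.MULTILINE, ^ matches at the start of every line, so this pattern
# matches a whole line beginning with "Total" plus its trailing newline.
TOTAL_LINE = re.compile(r'^Total.*\n?', flags=re.MULTILINE)


def strip_total_rows(text):
    """Return text with every line that starts with 'Total' removed."""
    return TOTAL_LINE.sub('', text)


def clean_file(src_path, dst_path):
    # Hypothetical helper: reads the whole file at once (assumes it fits
    # in memory), removes the summary rows, and writes the result out.
    with open(src_path, encoding='utf-8') as src:
        cleaned = strip_total_rows(src.read())
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(cleaned)


# Shortened, made-up sample in the same shape as the question's data.
sample = (
    "Category,Date,Id,Amount\n"
    "Risk A,11/12/2020,1,-10\n"
    "Total Risk A : 1\n"
    "Risk C,11/9/2020,5,43\n"
    "Total Risk C : 1"
)
print(strip_total_rows(sample))
```

Calling clean_file with an input path and an output path would write a copy of the file without the summary rows. The regex engine still scans the whole text internally, but the Python code contains no explicit per-line loop.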