Delete rows based on whether a cell holds a string or an int

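Problem description

I have 500k rows of data whose formatting is somewhat inconsistent. I am using Spyder and pandas to clean it.

One column contains either numbers or strings. If a cell in that column holds a string, I want to delete the entire row.

My code is shown below, slightly adjusted because of confidential information.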

import csv

import pandas as pd

# on_bad_lines="skip" replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
mydataset = pd.read_csv('test.txt', on_bad_lines="skip",
                    engine='python',
                    index_col=False, header=None, quoting=csv.QUOTE_NONE,
                    sep=r"[\s|,|/]", names=["1","2","3","4","a","b","c",
                    "h","i","j","k","l","m","n","o","p","f","g",
                    "q","r","s","t","u","v","w","x","y","z",
                    "5","6","7","8","9","10","11","12","13","14"])

print(mydataset.shape)

columns = ['3','4','h','a','b','c','i','j','k','l','m','n','f','g']
mydataset.drop(columns, inplace=True, axis=1)
print(mydataset.shape)

# A column named "2" is not a valid attribute name, so use bracket indexing
mydataset = mydataset[(mydataset["q"].notnull()) & (mydataset["r"].notnull()) &
                      (mydataset["s"].notnull()) & (mydataset["2"].notnull()) &
                      (mydataset["2"] != "@")]

Please excuse the naming convention of the column headers.

example of data:
1    2    3    4   <--header
abc  123  123  bcd <--Data
123  123  123  bcd <--Data

I want to detect "abc" and delete the entire row.

Please advise!

Tags: python, pandas

Solution


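Use map on the column to build a True/False mask and keep only the rows that pass. It could look like the following (column "1" is taken from the example data; adjust it to the column you need to check):

def is_number(value):
    # True when the cell can be parsed as an integer, e.g. "123" but not "abc"
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False
mydataset = mydataset[mydataset["1"].map(is_number)]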
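
A vectorized alternative that may suit 500k rows better is pd.to_numeric with errors="coerce", which turns unparseable cells into NaN so they can be filtered out. The sketch below is only an illustration: the small DataFrame is a made-up stand-in for the example data in the question, and the names df, as_number and cleaned are hypothetical. Note that this treats decimals such as "1.5" as numbers too, and it also drops rows where the cell was already empty.

```python
import pandas as pd

# Hypothetical sample mirroring the example data in the question.
df = pd.DataFrame({
    "1": ["abc", "123"],
    "2": ["123", "123"],
    "3": ["123", "123"],
    "4": ["bcd", "bcd"],
})

# Cells that cannot be parsed as numbers become NaN instead of raising.
as_number = pd.to_numeric(df["1"], errors="coerce")

# Keep only the rows where column "1" parsed successfully.
cleaned = df[as_number.notna()].reset_index(drop=True)

print(cleaned)
```

With the real data, the same two lines would be applied to mydataset and whichever column needs checking. The filter leaves the surviving cells unchanged; to store the converted numbers as well, assign as_number back to the column before filtering.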
