Delete all rows after the second occurrence of a column value

Problem description

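I want to delete all the data that comes after the second instance of a column value in a .txt file that has been converted to a data frame. In this case, the value is the delimiter "---".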

The data frame is structured as follows:

15 Leading Causes of Death  15                          Code        Deaths      Population          Crude Rate  Crude Rate Lower 95% Confidence Interval    Crude Rate Upper 95% Confidence Interval
#Accidents (unintentional injuries) (V01-X59,Y85-Y86)   GR113-112   21          152430              13.8        8.5                                         21.1
#Intentional self-harm (suicide) (*U03,X60-X84,Y87.0)   GR113-124   15          152430              Unreliable  5.5                                         16.2
---                     
Dataset: Underlying Cause of Death, 1999-2019                       
Query Parameters:                       
States: Marin County, CA (06041)                        
Ten-Year Age Groups: 25-34 years                        
Year/Month: 1999; 2000; 2001; 2002; 2003                        
Group By: 15 Leading Causes of Death                        
Show Totals: Disabled                       
Show Zero Values: Disabled                      
Show Suppressed: Disabled                       
Calculate Rates Per: 100,000                        
Rate Options: Default intercensal populations for years 2001-2009 (except Infant Age Groups)                        
---                     
Help: See http://wonder.cdc.gov/wonder/help/ucd.html for more information.                      
---                     
Query Date: Sep 23, 2021 6:51:59 PM

I have seen plenty of solutions that cut after the first instance of a column value, NaN, and so on, but none for the second or nth instance...

This is the simple code I have so far to read in the file.

import pandas as pd

dl = pd.read_csv('Underlying Cause of Death, 1999-2019(3).txt', sep = '\t')
dl.to_csv('test.csv', index = False)

Tags: python, pandas, dataframe, csv, delimiter

Solution


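Find the rows whose first column starts with '---' and take the cumulative sum of those matches. Then find the index of the first row where that sum equals 2 and slice your data frame up to that index. This relies on the default integer index, so the label returned by idxmax() is also the row position that iloc expects.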

>>> df.iloc[:df.iloc[:, 0].str.startswith('---').cumsum().eq(2).idxmax()]

0   #Accidents (unintentional injuries) (V01-X59,Y...  GR113-112    21.0    152430.0        13.8                                       8.5                                      21.1
1   #Intentional self-harm (suicide) (*U03,X60-X84...  GR113-124    15.0    152430.0  Unreliable                                       5.5                                      16.2
2                                                 ---        NaN     NaN         NaN         NaN                                       NaN                                       NaN
3       Dataset: Underlying Cause of Death, 1999-2019        NaN     NaN         NaN         NaN                                       NaN                                       NaN
4                                   Query Parameters:        NaN     NaN         NaN         NaN                                       NaN                                       NaN
5                    States: Marin County, CA (06041)        NaN     NaN         NaN         NaN                                       NaN                                       NaN
6                    Ten-Year Age Groups: 25-34 years        NaN     NaN         NaN         NaN                                       NaN                                       NaN
7                                    Year/Month: 1999       2000  2001.0      2002.0        2003                                       NaN                                       NaN
8                Group By: 15 Leading Causes of Death        NaN     NaN         NaN         NaN                                       NaN                                       NaN
9                               Show Totals: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
10                         Show Zero Values: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
11                          Show Suppressed: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
12                       Calculate Rates Per: 100,000        NaN     NaN         NaN         NaN                                       NaN                                       NaN
13  Rate Options: Default intercensal populations ...        NaN     NaN         NaN         NaN                                       NaN                                       NaN
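
The question also asks about the nth occurrence. The one-liner above can be generalized along the following lines. This is only a sketch, not part of the original answer: the function name rows_before_nth_marker and the demo frame are hypothetical, and it assumes the marker always sits in the first column. It also returns the frame unchanged when fewer than n marker rows exist, a case in which idxmax() would return the first index label and the slice would come back empty.

```python
import pandas as pd


def rows_before_nth_marker(df, n, marker="---"):
    """Return the rows located before the n-th row whose first column starts with `marker`.

    Hypothetical helper for illustration; assumes the marker is in the first column.
    """
    # Cast to str so NaN or numeric cells cannot break the string test.
    is_marker = df.iloc[:, 0].astype(str).str.startswith(marker)
    # True from the n-th marker row onward.
    reached = is_marker.cumsum().ge(n).to_numpy()
    if not reached.any():
        # Fewer than n markers: nothing to cut off.
        return df
    # argmax gives the position of the first True, so this works with any index.
    return df.iloc[:reached.argmax()]


# Small made-up frame standing in for the real file.
demo = pd.DataFrame({
    "text": ["a", "b", "---", "c", "---", "d", "---", "e"],
    "value": range(8),
})
result = rows_before_nth_marker(demo, 2)
print(result["text"].tolist())  # ['a', 'b', '---', 'c']
```

Like the one-liner, this keeps the first marker row in the result and drops the n-th marker row together with everything after it.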
