Problem description

I'm new to Pandas, and while working through a Pluralsight course I got stuck on how to rewrite the following to use loc instead of chained indexing.

every_6th_row = pd.Series(range(5, len(df),6))

# can you rewrite this line to use df.loc
df['MIN_TEMP_GROUND'].drop(every_6th_row).isnull().all()

In the DataFrame, only every sixth row has a value in the MIN_TEMP_GROUND column, so the statement checks that all the other rows are null.

I have tried a number of combinations, such as:

df.drop(df.loc[every_6th_row, 'MIN_TEMP_GROUND'])

without success. Any pointers to where I'm going wrong would be appreciated.

Tags: pandas, dataframe

Solution


Assuming the DataFrame has a default 0-based index, take the index modulo 6 and keep every row except each 6th one:

df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()  # True

Or, more generally, if the index is not already a 0-based range, build the mask from row positions using the DataFrame's shape and np.arange:

df.loc[(np.arange(df.shape[0]) % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()
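
To illustrate why the positional version is the safer of the two, here is a minimal sketch on a hypothetical frame whose index starts at 100 (the index values and the variable names `not_sixth`, `others_all_null`, `label_based` are assumptions made for this example, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: same pattern as in the question (a value in every 6th row),
# but with an index that does not start at 0.
df = pd.DataFrame(
    {'MIN_TEMP_GROUND': [np.nan] * 5 + [5.0] + [np.nan] * 5 + [7.0]},
    index=range(100, 112),
)

# Mask built from row positions, so it does not depend on the index labels.
not_sixth = (np.arange(len(df)) % 6) != 5
others_all_null = df.loc[not_sixth, 'MIN_TEMP_GROUND'].isnull().all()

# Mask built from the labels: here it picks the wrong rows,
# because label % 6 no longer lines up with the row position.
label_based = df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()
```

With this index the positional check passes, while the label-based check fails because it keeps the row at position 5 (label 105), which holds a value.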

Sample Data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'MIN_TEMP_GROUND': [np.nan, np.nan, np.nan,
                                       np.nan, np.nan, 5] * 2})
    MIN_TEMP_GROUND
0               NaN
1               NaN
2               NaN
3               NaN
4               NaN
5               5.0
6               NaN
7               NaN
8               NaN
9               NaN
10              NaN
11              5.0
df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND']
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
Name: MIN_TEMP_GROUND, dtype: float64
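
A closer translation of the original drop-based statement may be to keep the `every_6th_row` Series from the question and turn it into a boolean mask with `Index.isin`. The attempt in the question passed the selected column values to `drop`, which expects index labels rather than values. A minimal sketch, assuming the default 0-based index and the same sample data (the names `keep` and `result` are made up for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'MIN_TEMP_GROUND': [np.nan, np.nan, np.nan,
                                       np.nan, np.nan, 5] * 2})

# Labels of every 6th row, as defined in the question.
every_6th_row = pd.Series(range(5, len(df), 6))

# True for every row whose label is NOT one of the 6th rows.
keep = ~df.index.isin(every_6th_row)

# A single .loc call selects rows and column together, with no chained indexing.
result = df.loc[keep, 'MIN_TEMP_GROUND'].isnull().all()
```

This selects the same rows as the modulus approach on this data, and it may be easier to follow if the list of rows to exclude already exists.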
