Finding the maximum value in a pandas DataFrame column of lists

Problem description

I have a DataFrame (df):

df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

I can extract the numbers from it:

df['B'] = df.A.replace(regex={r'[^\w]':'',r'^\D+':'',r'\D+':' '}).str.split(r'\s')

                   A           B
0              54321         NaN
1        it is 54322     [54322]
2  is it 54323 or 4?  [54323, 4]
3                NaN         NaN

But when I try to find the largest number in each row:

df['C'] = df['B'].apply(lambda x : max(x))

I get:

TypeError: 'float' object is not iterable

Tags: python, pandas, max

Solution


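Use a lambda function with if-else, and convert the values to integers so that max compares them as numbers. The TypeError occurs because the NaN cells are floats, which max cannot iterate over: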

f = lambda x : max(int(y) for y in x) if isinstance(x, list) else np.nan
df['C'] = df['B'].apply(f)
print (df)
                   A           B        C
0              54321         NaN      NaN
1        it is 54322     [54322]  54322.0
2  is it 54323 or 4?  [54323, 4]  54323.0
3                NaN         NaN      NaN
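
The same idea can be written as a named helper, which is easier to test and also guards against empty lists. This is only a minimal sketch: the name `max_int_or_nan` is hypothetical (not part of pandas), and column B is rebuilt by hand here rather than derived from column A.

```python
import numpy as np
import pandas as pd


def max_int_or_nan(value):
    """Return the largest integer in a list of digit strings, else NaN."""
    # Non-list cells (such as NaN, which is a float) and empty lists
    # have no maximum, so they map to NaN.
    if not isinstance(value, list) or not value:
        return np.nan
    # Convert before comparing so the comparison is numeric, not textual.
    return max(int(item) for item in value)


# Column B as produced in the question: lists of digit strings, or NaN.
df = pd.DataFrame({'B': [np.nan, ['54322'], ['54323', '4'], np.nan]})
df['C'] = df['B'].apply(max_int_or_nan)
print(df)
```

The conversion to int matters because max on strings compares character by character: `max(['54323', '9'])` returns `'9'`, whereas the numeric maximum is 54323.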

Alternatively, use Series.str.extractall, which returns one row per match under a MultiIndex, convert the matches to int, and take the max per first index level:

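df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
# groupby(level=0).max() is the per-row maximum; the older max(level=0) form was removed in pandas 2.0
df['C'] = df.A.astype(str).str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).max()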
print (df)
                   A        C
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
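
To see why taking the maximum per first index level works, it can help to look at the intermediate steps one at a time. The sketch below assumes a single capture group (so the matches land in column `0`) and uses `groupby(level=0).max()` for the grouping step; the variable names `matches` and `row_max` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

# One row per match, indexed by (original row label, match number).
matches = df['A'].astype(str).str.extractall(r'(\d+)')[0].astype(int)

# Largest match for each original row; rows without digits are absent here.
row_max = matches.groupby(level=0).max()

# Assignment aligns on the index, so the missing row 3 becomes NaN
# and the column is stored as float.
df['C'] = row_max
print(df)
```

Row 3 becomes the string `'nan'` after `astype(str)`, which contains no digits, so it produces no matches and ends up as NaN after alignment.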
