Split numbers and words in pandas while keeping the index

Problem description

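I have a column in my DataFrame df, for example col1. From col1 I need to create two columns, one holding the numbers and one holding the words: df['col1'] should be split into df['code'] and df['name'].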

(index)                                  col1

94                                      520XX
111                                     316aa
114                                     Entry
144                                 325 Sport
146                                 xColor35d
166                               420 M Sport
167                                        XX
199                                        XX
225                                    645 Ai

Here is what I tried:

import pandas as pd
import numpy as np

# Split on "digits immediately followed by letters"; the capture groups are kept as columns
result = df['col1'].str.split(r'(\d+)([A-Za-z]+)', expand=True)
result = result.loc[:, [0, 1, 2, 3]]
result.rename(columns={0: 'split_0', 1: 'split_1', 2: 'split_2', 3: 'split_3'}, inplace=True)
result['split_0'] = result['split_0'].fillna(value=np.nan)
# Turn empty or whitespace-only strings into NaN
result['split_0'] = result['split_0'].astype(str).replace(r'^\s*$', np.nan, regex=True)
result

The result is:

    split_0       split_1   split_2  split_3  

94  520            XX       None    None    
111 NaN            316      aa  
114 Entry          None     None    None
144 325 Sport      None     None    None
146 xColor         35       d   
166 420 M Sport    None     None    None
167 XX             None     None    None
199 XX             None     None    None
225 645 Ai         None     None    
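
The layout of this result follows from how splitting on a pattern with capture groups behaves: the text before the match goes into the first piece, each captured group becomes its own piece, and the remainder follows. Below is a minimal sketch using Python's built-in re.split, on the assumption that str.split applies the same rule to each row. A value with no "digits directly followed by letters" is returned unsplit.

```python
import re

# Same pattern as in the attempt above: digits immediately followed by letters
pattern = re.compile(r'(\d+)([A-Za-z]+)')

print(pattern.split('316aa'))      # ['', '316', 'aa', '']
print(pattern.split('xColor35d'))  # ['xColor', '35', 'd', '']
print(pattern.split('325 Sport'))  # ['325 Sport'] (the space prevents a match)
```

This is why rows such as "325 Sport" and "420 M Sport" stay whole in split_0: the digits there are followed by a space, not by a letter.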

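My problem arises when I try to split the 'split_0' column into numbers and words and then concatenate everything, so that in the end there are only two columns holding the numbers and the words from all of the 'split_*' columns, with the index preserved, like this: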

    code           name

94  520            XX           
111 316            aa           
114 NaN            Entry    
144 325            Sport    
146 35             xColor d 
166 420            M Sport  
167 NaN            XX       
199 NaN            XX       
225 645            Ai           

Tags: python, string, pandas, dataframe, numbers

Solution


Try str.extractall together with pd.concat:

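# Collect every run of digits per row and join the matches with spaces;
# groupby(level=0) aggregates by the original index label
code = (df.col1.str.extractall(r'(\d+)')[0] + ' ') \
                   .groupby(level=0).sum().str.strip().rename('code')
# Same for every run of letters
name = (df.col1.str.extractall(r'([a-zA-Z]+)')[0] + ' ') \
                   .groupby(level=0).sum().str.strip().rename('name')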

df_out = pd.concat([code, name], axis=1)


Out[139]:
    code      name
94   520        XX
111  316        aa
114  NaN     Entry
144  325     Sport
146   35  xColor d
166  420   M Sport
167  NaN        XX
199  NaN        XX
225  645        Ai
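
For anyone who wants to run this end to end, here is a self-contained sketch under a few assumptions. The sample frame is rebuilt from the table in the question, join_matches is a hypothetical helper name, and the matches are joined with ' '.join rather than by adding a trailing space and stripping it, which is a small variation on the approach above and not the answer's exact code. The final reindex is also an addition: it guarantees that a row matching neither pattern would still appear in the output, filled with NaN.

```python
import pandas as pd

# Sample data rebuilt from the question's table; index labels kept as shown there
df = pd.DataFrame(
    {'col1': ['520XX', '316aa', 'Entry', '325 Sport', 'xColor35d',
              '420 M Sport', 'XX', 'XX', '645 Ai']},
    index=[94, 111, 114, 144, 146, 166, 167, 199, 225],
)

def join_matches(series, pattern):
    """Join all matches of `pattern` in each row with a space (hypothetical helper)."""
    # extractall returns one row per match, indexed by (original label, match number)
    matches = series.str.extractall(pattern)[0]
    # Grouping on level 0 collapses the matches back onto the original labels;
    # rows without any match are simply absent from the result
    return matches.groupby(level=0).agg(' '.join)

code = join_matches(df['col1'], r'(\d+)').rename('code')
name = join_matches(df['col1'], r'([A-Za-z]+)').rename('name')

# concat aligns on the index; reindex restores every original label in order
df_out = pd.concat([code, name], axis=1).reindex(df.index)
print(df_out)
```

Note that code holds strings, not integers. Converting it with pd.to_numeric(df_out['code']) should work for this sample, but a row containing several separate numbers would produce a space-joined string that cannot be converted directly.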
