How to extract a word from rows in a Pandas DataFrame

Question

If I have a column named Category whose rows contain values like Plane Travel|Train Travel|Bus Travel, how can I extract Plane Travel in a pandas DataFrame?

Tags: pandas

Solution


Use the .str accessor to .split() each string, then put the resulting pieces into separate columns.

First, build a sample DataFrame:

import pandas as pd

df = pd.DataFrame({"Category": ["Plane France", "Train Russia", "Spacecraft Moon"],
                   "other_variable": [1, 2, 3]})
print(df)

          Category  other_variable
0     Plane France               1
1     Train Russia               2
2  Spacecraft Moon               3

You can now reach the string methods through the .str accessor (see the pandas documentation) and split each value:

# " " can be replaced with any other delimiter
df["category_list"] = df["Category"].str.split(" ")

Then assign each element of the list to its own new column:

df[["transportation", "destination"]] = pd.DataFrame(df.category_list.values.tolist(), 
                                                     index = df.index)

This gives:

          Category  other_variable       category_list transportation  \
0     Plane France               1     [Plane, France]          Plane   
1     Train Russia               2     [Train, Russia]          Train   
2  Spacecraft Moon               3  [Spacecraft, Moon]     Spacecraft   

  destination  
0      France  
1      Russia  
2        Moon  
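
As a shorter alternative to building the intermediate list column, str.split also accepts expand=True, which returns the pieces as a DataFrame directly. The following is a minimal sketch rather than the original answer's method, and it assumes every row splits into exactly two parts:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["Plane France", "Train Russia", "Spacecraft Moon"],
                   "other_variable": [1, 2, 3]})

# expand=True returns one column per split piece instead of one list per row
parts = df["Category"].str.split(" ", expand=True)

# Assumption: each value contains exactly one space, so parts has two columns
df[["transportation", "destination"]] = parts

print(df)
```

If some rows could contain more than one space, the number of resulting columns would vary with the data, so the two-column assignment above would need adjusting.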

You now have your transportation and destination columns.
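
The same approach can be applied to the pipe-delimited values from the question. The sketch below is only illustrative: the second row and the first_category column name are hypothetical, and it assumes the first item before the "|" is the one wanted:

```python
import pandas as pd

# Sample data modeled on the question; the second row is made up for illustration
df = pd.DataFrame({"Category": ["Plane Travel|Train Travel|Bus Travel",
                                "Car Travel|Boat Travel"]})

# Split on the literal "|" character, then keep the first piece of each row
df["first_category"] = df["Category"].str.split("|").str[0]

print(df["first_category"].tolist())  # ['Plane Travel', 'Car Travel']
```

Using .str[0] after the split picks the first element of each list, so rows with different numbers of items are handled without creating extra columns.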

