How to search human names in a dataframe?

Problem description

I am trying to find human names in a dataframe. I have a very large dataset where I need to tokenize every word, but human names should stay together as full names. The data below is just an example.

Date                             Text
09.05.2019                       His name is Detlef Schubert. 
04.09.2019                       Mr. Klaus Gerd is a good person.

So I want output like this:

Date                             Text
09.05.2019                       His
09.05.2019                       name
09.05.2019                       is
09.05.2019                       Detlef Schubert
04.09.2019                       Mr. Klaus Gerd
04.09.2019                       is
04.09.2019                       a
04.09.2019                       good 
04.09.2019                       person.

So far I am doing this:

df = df.set_index('Date')['Text'].str.split().explode()

But this splits names into separate tokens, and I want each full name kept as a single token.

Tags: python, pandas, nltk, tokenize

Solution
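
One possible approach, offered as a rough sketch rather than a definitive answer: replace the plain whitespace split with a tokenizer that keeps name-like runs together. The heuristic below is an assumption, not something given in the question. It treats two or more consecutive capitalized words, or a title such as "Mr." followed by capitalized words, as one name. The function name tokenize_keep_names and the list of titles are made up for illustration. A heuristic like this will misfire on other capitalized phrases (for example a capitalized sentence opener directly followed by a name), so a proper named-entity recognizer would likely be more robust on real data.

```python
import re

# A capitalized word, optionally hyphenated (e.g. "Hans-Peter").
_CAP = r"[A-ZÄÖÜ][a-zäöüß]+(?:-[A-ZÄÖÜ][a-zäöüß]+)*"
# Hypothetical list of titles; extend it to fit the data.
_TITLE = r"(?:Mr|Mrs|Ms|Dr|Prof)\."

# Group 1: a name (title + one or more capitalized words, or two or more
# capitalized words). Trailing punctuation after a name is consumed but
# not kept. Group 2: any other whitespace-delimited token.
_PATTERN = re.compile(
    rf"((?:{_TITLE}\s+{_CAP}(?:\s+{_CAP})*)|(?:{_CAP}(?:\s+{_CAP})+))(?!\w)[.,;:!?]*"
    rf"|(\S+)"
)


def tokenize_keep_names(text):
    """Split text on whitespace, but keep name-like runs as single tokens."""
    return [m.group(1) or m.group(2) for m in _PATTERN.finditer(text)]


rows = [
    ("09.05.2019", "His name is Detlef Schubert."),
    ("04.09.2019", "Mr. Klaus Gerd is a good person."),
]

for date, text in rows:
    for token in tokenize_keep_names(text):
        print(date, token)
```

With the dataframe from the question, the same function could stand in for .str.split(), for example df.set_index('Date')['Text'].apply(tokenize_keep_names).explode(), since explode() expands the per-row lists into one row per token.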

