Unnest grab keywords/nextwords/beforewords function

Problem description

Background

I have the following code to create a df:

import pandas as pd
word_list = ['crayons', 'cars', 'camels']
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'
    'the middle east has many camels to ride and have fun',
    'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])

The df looks like this:

    Text
0   there are many different crayons in the bright blue box and crayons of all different colors
1   i like a lot of sports cars because they go really fastthe middle east has many camels to ride and have fun
2   all camels are fun

The following code works. It defines a function that grabs each trigger word, along with the words that come before it (BeforeWords) and after it (NextWords):

def find_words(row, word_list):

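    # select the text by label; row[0] relies on a positional fallback that newer pandas deprecates
    sentence = row['Text']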

    #make empty lists
    trigger = []
    next_words = []
    before_words = []

    for keyword in word_list:
        #split words
        words = str(sentence).split()

        for index in range(0, len(words) - 1):

            # get keyword we want
            if words[index] == keyword:

                # get words after keyword and add to empty list
                next_words.append(words[index + 1:index + 3])

                # get words before keyword and add to empty list
                before_words.append(words[max(index - 3, 0):max(index - 1, 0)])

                # append
                trigger.append(keyword)

    return pd.Series([trigger, before_words, next_words], index=['Trigger', 'BeforeWords', 'NextWords'])

# glue together
df = df.join(df.apply(lambda x: find_words(x, word_list), axis=1))

Output

    Text           Trigger             BeforeWords                 NextWords
0   there ...      [crayons, crayons]  [[are, many], [blue, box]]  [[in, the], [of, all]]
1   i like ...     [cars, camels]      [[lot, of], [east, has]]    [[because, they], [to, ride]]
2   all camels...  [camels]            [[]]                        [[are, fun]]
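
The empty BeforeWords entry in row 2, and the fact that the word directly before each trigger is skipped, both follow from the slice bounds used in find_words. A minimal sketch of that arithmetic on single sentences (the `window` helper is a hypothetical name introduced only for illustration; it is not part of the original code):

```python
def window(words, index):
    """Return (before, after) using the same slice bounds as find_words."""
    # two words ending one position short of the trigger, clamped at 0
    before = words[max(index - 3, 0):max(index - 1, 0)]
    # the two words immediately after the trigger
    after = words[index + 1:index + 3]
    return before, after

words = 'all camels are fun'.split()
before, after = window(words, words.index('camels'))
print(before, after)  # [] ['are', 'fun']

words = 'there are many different crayons in the bright blue box'.split()
before, after = window(words, words.index('crayons'))
print(before, after)  # ['are', 'many'] ['in', 'the']
```

With the trigger at position 1, both slice bounds clamp to 0, so the "before" window is empty.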

Problem

However, I would like to either 1) unstack, 2) unlist, or use another, better way to get the following:

Desired Output

    Text           Trigger   BeforeWords   NextWords
0   there ...      crayons   are many      in the
1   there ...      crayons   blue box      of all
2   i like ...     cars      lot of        because they
3   i like ...     camels    east has      to ride
4   all camels...  camels                  are fun

Question

How do I tweak my find_words function to achieve the desired output?

Tags: python-3.x, pandas, function, loops, nlp

Solution


This looks like an unnesting problem, so we can use

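# move the three list columns into one Series indexed by (Text, column name)
s = df.set_index(['Text']).stack()
# spread each list across columns, then stack again so each match gets its own row
s = pd.DataFrame(s.tolist(), index=s.index).stack()
# join the inner word lists into strings and pivot the column names back out
s.apply(lambda x: ' '.join(x) if isinstance(x, list) else x).unstack(1).reset_index(level=0)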
                                                Text      ...          NextWords
0  there are many different crayons in the bright...      ...             in the
1  there are many different crayons in the bright...      ...             of all
0  i like a lot of sports cars because they go re...      ...       because they
1  i like a lot of sports cars because they go re...      ...            to ride
0                                 all camels are fun      ...            are fun
[5 rows x 4 columns]
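
As a possible alternative to the stack/unstack approach, the same unnesting can be sketched with `DataFrame.explode`. This assumes pandas 1.3 or later, where `explode` accepts a list of columns, and it assumes the three list columns have the same length in every row (which find_words guarantees, since it appends to all three lists together). The frame below is a shortened stand-in for the joined df from the question, not the original data:

```python
import pandas as pd

# Shortened stand-in for the joined frame from the question (assumed values).
df = pd.DataFrame({
    'Text': ['there ...', 'i like ...', 'all camels...'],
    'Trigger': [['crayons', 'crayons'], ['cars', 'camels'], ['camels']],
    'BeforeWords': [[['are', 'many'], ['blue', 'box']],
                    [['lot', 'of'], ['east', 'has']],
                    [[]]],
    'NextWords': [[['in', 'the'], ['of', 'all']],
                  [['because', 'they'], ['to', 'ride']],
                  [['are', 'fun']]],
})

list_cols = ['Trigger', 'BeforeWords', 'NextWords']

# Explode the three list columns in lockstep: one output row per match.
# Passing a list of columns requires pandas >= 1.3.
flat = df.explode(list_cols, ignore_index=True)

# BeforeWords / NextWords cells are now inner word lists; join them into strings.
for col in ['BeforeWords', 'NextWords']:
    flat[col] = flat[col].apply(' '.join)

print(flat)
```

Unlike the stack/unstack version, `ignore_index=True` gives a fresh 0..4 index, as in the desired output. Rows where no keyword matched at all would explode to NaN, so they would need to be dropped or filled before the join step.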
