python-3.x - Unnest grab keywords/nextwords/beforewords function
Problem description
Background
I have the following code to create a df:
import pandas as pd
word_list = ['crayons', 'cars', 'camels']
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
'i like a lot of sports cars because they go really fast'
'the middle east has many camels to ride and have fun',
'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])
The df looks like this:
Text
0 there are many different crayons in the bright blue box and crayons of all different colors
1 i like a lot of sports cars because they go really fastthe middle east has many camels to ride and have fun
2 all camels are fun
The following code works and creates a function that grabs the trigger words, along with the words that come before (beforewords) and after (nextwords) the trigger words:
def find_words(row, word_list):
    sentence = row[0]
    # make empty lists
    trigger = []
    next_words = []
    before_words = []
    for keyword in word_list:
        # split words
        words = str(sentence).split()
        for index in range(0, len(words) - 1):
            # get keyword we want
            if words[index] == keyword:
                # get words after keyword and add to empty list
                next_words.append(words[index + 1:index + 3])
                # get words before keyword and add to empty list
                before_words.append(words[max(index - 3, 0):max(index - 1, 0)])
                # append
                trigger.append(keyword)
    return pd.Series([trigger, before_words, next_words], index=['Trigger', 'BeforeWords', 'NextWords'])

# glue together
df = df.join(df.apply(lambda x: find_words(x, word_list), axis=1))
Output
Text Trigger BeforeWords NextWords
0 there ... [crayons, crayons] [[are, many],[blue, box]] [[in, the],[of, all]]
1 i like ... [cars, camels] [[lot, of], [east, has]] [[because, they], [to, ride]]
2 all camels... [camels] [[]] [[are, fun]]
Problem
However, I would like to either 1) unstack, 2) unlist, or use another/better way to get the following:
Desired Output
Text Trigger BeforeWords NextWords
0 there ... crayons are many in the
1 there ... crayons blue box of all
2 i like ... cars lot of because they
3 i like ... camels east has to ride
4 all camels... camels are fun
Question
How do I tweak my find_words function to achieve the desired output?
Solution
This looks like an unnesting problem, so we can use:
s=df.set_index(['Text']).stack()
s=pd.DataFrame(s.tolist(),index=s.index).stack()
s.apply(lambda x : ' '.join(x) if type(x)==list else x).unstack(1).reset_index(level=0)
Text ... NextWords
0 there are many different crayons in the bright... ... in the
1 there are many different crayons in the bright... ... of all
0 i like a lot of sports cars because they go re... ... because they
1 i like a lot of sports cars because they go re... ... to ride
0 all camels are fun ... are fun
[5 rows x 4 columns]
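As an alternative to the stack/unstack chain, here is a minimal sketch using DataFrame.explode, assuming pandas 1.3 or later (the first version that accepts a list of columns). The frame named nested is a hand-built stand-in for the joined df from the question, with the Text values abbreviated; the names nested, list_cols and flat are illustrative only and not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for the joined df from the question (Text abbreviated).
nested = pd.DataFrame({
    'Text': ['there ...', 'i like ...', 'all camels...'],
    'Trigger': [['crayons', 'crayons'], ['cars', 'camels'], ['camels']],
    'BeforeWords': [[['are', 'many'], ['blue', 'box']],
                    [['lot', 'of'], ['east', 'has']],
                    [[]]],
    'NextWords': [[['in', 'the'], ['of', 'all']],
                  [['because', 'they'], ['to', 'ride']],
                  [['are', 'fun']]],
})

list_cols = ['Trigger', 'BeforeWords', 'NextWords']

# Explode the three list columns together so each trigger match gets its own row.
# This needs the lists in a given row to have the same length across the columns.
flat = nested.explode(list_cols, ignore_index=True)

# After exploding, each BeforeWords/NextWords cell is a list of words;
# join it into a single space-separated string (an empty list becomes '').
for col in ['BeforeWords', 'NextWords']:
    flat[col] = flat[col].map(' '.join)

print(flat)
```

Because only the outer lists are exploded, the row for 'all camels...' keeps an empty BeforeWords string rather than being dropped, which matches the desired output in the question.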