r - In R, how to apply a function on each dataframe row that uses a column value?
Problem description
Let's say I have a dataframe
Author | Lyrics
Name1  | Text (characters)
Name2  | Text (characters)
I want to create another column by applying a function to each row. The function takes the text from the Lyrics column, splits it on whitespace, and checks each token against another vector I made, so that I can work out the percentage of tokens in the text that appear in that vector.
The function as I have written so far is below
ReturnPercentPosWord = function(textLyrics) {
  WhitespaceSplitText = strsplit(textLyrics, " ")
  LengthSplitText = length(WhitespaceSplitText)
  CountInPosList = 0
  for (i in WhitespaceSplitText) {
    if (i %in% PositiveWords$word) {
      CountInPosList = CountInPosList + 1
    }
  }
  if (CountInPosList == 0) {
    return(0)
  }
  PercentInPos = (CountInPosList / LengthSplitText) * 100
  return(PercentInPos)
}
I want to apply this function to each row now. I have tried
TestPOSwordsDF$PercentPositiveWords = ReturnPercentPosWord(TestPOSwordsDF$Lyrics)
and
TestPOSwordsDF$PercentPositiveWords = apply(TestPOSwordsDF[, c('Lyrics'),drop=F], 1, ReturnPercentPosWord)
But I get a message saying
the condition has length > 1 and only the first element will be used
I would really appreciate any help with this. Thank you!
Solution
Try this:
TestPOSwordsDF$PercentPositiveWords <- sapply(
  strsplit(TestPOSwordsDF$Lyrics, " "),
  function(x) mean(x %in% PositiveWords$word) * 100
)
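Here we split Lyrics on spaces, which gives one vector of tokens per row. For each of those vectors, x %in% PositiveWords$word is a logical vector, so its mean is the ratio of words that are present in PositiveWords$word, and multiplying by 100 turns that ratio into a percentage.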
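To see this on something concrete, here is a minimal sketch with made-up data. The data frame and column names follow the question, but the lyrics and the positive-word list are invented for illustration only. It also shows why the original function fails: strsplit() returns a list, so the for loop visits a whole token vector at a time rather than a single token, and i %in% PositiveWords$word then has length greater than 1 inside if().

```r
# Toy data: names follow the question, contents are assumptions for illustration.
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2", "Name3"),
  Lyrics = c("happy love sad day", "rain and cold", "joy joy joy"),
  stringsAsFactors = FALSE
)
PositiveWords <- data.frame(
  word = c("happy", "love", "joy"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector of tokens per row,
# not a flat vector of tokens.
tokens <- strsplit(TestPOSwordsDF$Lyrics, " ")
length(tokens)       # 3: one list element per row
length(tokens[[1]])  # 4: the tokens of the first row

# For each token vector, %in% yields TRUE/FALSE per token; the mean of that
# logical vector is the share of tokens found in the positive-word list.
TestPOSwordsDF$PercentPositiveWords <- sapply(
  tokens,
  function(x) mean(x %in% PositiveWords$word) * 100
)

print(TestPOSwordsDF$PercentPositiveWords)  # 50 0 100
```

The first row has 2 positive tokens out of 4, the second has none, and the third has only positive tokens. If you would rather get an error than a silently different return type, vapply(tokens, function(x) mean(x %in% PositiveWords$word) * 100, numeric(1)) should behave the same way here while checking that each result is a single number.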