Transforming variables with dplyr in R

Question

I have the Titanic dataset and want to prepare its variables for an SVM analysis.

> str(train)
'data.frame':   891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : chr  "male" "female" "female" "female" ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr  "" "C85" "" "C123" ...
 $ Embarked   : chr  "S" "C" "S" "S" ...

I want to drop some variables and convert the chr variables, such as Sex and Embarked, to factors.

This is what I have so far.

train <- train %>%
  dplyr::select(-1,-4,-9,-11) %>%
  mutate(Sex=recode(Sex, "male"=1, "female"=0)) %>%
  mutate(Embarked=recode(Embarked, "C"=1, "S"=0)) %>%
  na.omit() 

Tags: r, dplyr

Answer


Do you mean something like this? Convert the columns to factors and then recode them?

library(dplyr)
library(titanic)  # provides the titanic_train dataset

train <- titanic_train %>%
  mutate_if(is.character, as.factor) %>%  # convert all chr columns to factors
  dplyr::select(-1, -4, -9, -11) %>%      # drop PassengerId, Name, Ticket, Cabin
  mutate(Sex = recode(Sex, "male" = "1", "female" = "0")) %>%  # relabel factor levels
  # Caution: Embarked has 4 levels here; only "C" and "S" are relabelled
  mutate(Embarked = recode(Embarked, "C" = "1", "S" = "0")) %>%
  na.omit()  # drop rows that contain NA (e.g. missing Age)
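
Because Embarked has more levels than the two that are recoded above, it may be simpler to skip the numeric relabelling and build the factors directly. The following is only a minimal sketch under some assumptions: `toy` is a made-up stand-in for the Titanic data (hypothetical rows, same column types), and the empty string in Embarked is treated as a missing value, which the original post does not state.

```r
library(dplyr)

# Hypothetical toy data mimicking the relevant Titanic columns (not real rows).
toy <- data.frame(
  Survived = c(0L, 1L, 1L, 0L, 1L),
  Sex      = c("male", "female", "female", "male", "female"),
  Age      = c(22, 38, NA, 35, 27),
  Embarked = c("S", "C", "Q", "", "S"),
  stringsAsFactors = FALSE
)

clean <- toy %>%
  mutate(
    Embarked = na_if(Embarked, ""),                          # assumption: "" means missing
    Sex      = factor(Sex, levels = c("female", "male")),    # chr -> factor, labels kept
    Embarked = factor(Embarked, levels = c("C", "Q", "S")),  # all three ports kept as levels
    Survived = factor(Survived, levels = c(0, 1))            # response as a factor
  ) %>%
  na.omit()  # drops rows with NA in Age or Embarked

str(clean)
```

Keeping the original labels avoids leaving some levels relabelled and others not, and the rows with an unknown port are removed by the same `na.omit()` call that handles missing Age.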
