Unexpected behavior in R when converting data frame entries from string to int

Question

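I ran into some strange behavior in R while trying to apply a mapping to a data frame.

I have a data frame named data with a column "month" that contains month names as strings, e.g. "jan", "feb", ..., "dec".

I want to convert these strings to the corresponding month numbers, e.g. "jun" becomes 6, because June is the 6th month of the year.

Following the suggestion in this post, I wrote the following mapping: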

months = 1:12
names(months) = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")

Here are the first few entries of data before the mapping:

> data$month[1:20]
 [1] mar oct oct mar mar aug aug aug sep sep sep sep aug sep sep sep mar oct mar apr
Levels: apr aug dec feb jan jul jun mar may nov oct sep

However, when I apply the mapping to data, something seems to go wrong:

> months[data$month[1:20]]
aug nov nov aug aug feb feb feb dec dec dec dec feb dec dec dec aug nov aug jan 
  8  11  11   8   8   2   2   2  12  12  12  12   2  12  12  12   8  11   8   1 

I expected something starting with 3 10 10 3 rather than 8 11 11 8, because March is the 3rd month and October is the 10th.

Am I missing something?

Thanks in advance for your help! :D

Tags: r, string, dictionary, data-conversion

Answer


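The problem arises because month is a factor whose levels are sorted alphabetically. When a factor is used as an index, R uses its underlying integer codes rather than its labels, so "mar" (the 8th level) selects the 8th element of months, which is "aug". You can avoid this by converting the factor to character first, as follows: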

# Creating the dataframe
data <-
  data.frame(
    month = c("mar" , "oct" , "oct" , "mar" , "mar" , "aug" , "aug" , 
              "aug" , "sep" , "sep" , "sep" , "sep" , "aug" , "sep" , 
              "sep" , "sep" , "mar" , "oct" , "mar" , "apr"),
    stringsAsFactors = TRUE # The example output shows that month is a factor
  )

# Creating a named vector that maps month abbreviations to month numbers
months = 1:12
names(months) = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")

months[as.character(data$month[1:20])] # Look up by name after converting the factor to character

# mar oct oct mar mar aug aug aug sep sep sep sep aug sep sep sep mar oct mar apr 
#   3  10  10   3   3   8   8   8   9   9   9   9   8   9   9   9   3  10   3   4 
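
To see why indexing with the factor goes wrong, it can help to look at the integer codes stored behind it. The following is a minimal sketch, not part of the original answer; the names f, by_code, and by_name are illustrative, and the factor levels are set explicitly to the alphabetical order shown in the question.

```r
# Lookup vector: month abbreviation -> month number
months <- 1:12
names(months) <- c("jan", "feb", "mar", "apr", "may", "jun",
                   "jul", "aug", "sep", "oct", "nov", "dec")

# Illustrative factor with the same alphabetical levels as in the question:
# apr aug dec feb jan jul jun mar may nov oct sep
f <- factor(c("mar", "oct", "aug"), levels = sort(names(months)))

# A factor is stored as integer codes pointing into its levels
codes <- as.integer(f)               # positions of "mar", "oct", "aug" in levels(f)

# Indexing with the factor uses those codes, i.e. positional indexing
by_code <- months[f]

# Indexing with a character vector looks the values up by name
by_name <- months[as.character(f)]

print(codes)
print(by_code)
print(by_name)
```

Here by_code reproduces the surprising result from the question, while by_name gives the expected month numbers.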

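A simpler approach is to use match() against R's built-in month.abb constant. It returns the position of each month name, which is its month number, so you do not need to build a lookup vector yourself: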

# Creating the dataframe
data <-
  data.frame(
    month = c("mar" , "oct" , "oct" , "mar" , "mar" , "aug" , "aug" , 
              "aug" , "sep" , "sep" , "sep" , "sep" , "aug" , "sep" , 
              "sep" , "sep" , "mar" , "oct" , "mar" , "apr"),
    stringsAsFactors = TRUE # The example output shows that month is a factor
  )

# month.abb holds "Jan", "Feb", ..., so str_to_title() capitalizes the first letter (mar -> Mar)
# match() then returns the position of each name in month.abb, i.e. the month number
match(stringr::str_to_title(as.character(data$month)), month.abb)

# [1]  3 10 10  3  3  8  8  8  9  9  9  9  8  9  9  9  3 10  3  4
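
If you would rather not depend on stringr, the same idea can be sketched in base R by lower-casing month.abb instead of title-casing the data. This is an assumption-labeled variant rather than the original answer's method; the shortened data frame and the month_num column name are hypothetical.

```r
# Shortened, illustrative version of the data frame from the question
data <- data.frame(
  month = c("mar", "oct", "aug", "sep", "apr"),
  stringsAsFactors = TRUE
)

# month.abb is the built-in constant "Jan", "Feb", ..., "Dec";
# lower-casing it lets the lower-case labels match directly.
# as.character() makes the factor-to-string conversion explicit.
data$month_num <- match(as.character(data$month), tolower(month.abb))

print(data)
```

Any label that is not a valid month abbreviation yields NA, so a typo in the data shows up as a missing value instead of a wrong month.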
