Problem with R error when using dplyr::distinct(): "no applicable method for 'distinct_' applied to an object of class "c('double', 'numeric')""

Problem description

Here's my example dataframe:

df.ex <- structure(
  list(
    id_1 = c(15796L, 15796L, 15799L, 15799L),
    id_2 = c(61350L,
             351261L, 61488L, 315736L),
    days = c(30.5, 36.4854, 30.5, 30.5)
  ),
  row.names = c(NA,-4L),
  class = "data.frame",
  .Names = c("id_1",
             "id_2", "days")
)

I am getting this error with dplyr::distinct():

Error in UseMethod("distinct_") : no applicable method for 'distinct_' applied to an object of class "c('double', 'numeric')"

What's confusing is that it works when I pass a data frame and name the column, as in distinct(df.ex, days). However, if I first extract the column as a vector with days_vec <- df.ex$days and then call distinct(days_vec), I get the error.

In my actual code I need to use distinct in a dplyr pipe like so:

df.ex %>% summarise(distinct_values = distinct(days))

And of course, this also doesn't work. Does anyone know how to overcome this error?

Many thanks, Peter

EDIT: For my actual problem I need a summary table with the count of distinct values of days, grouped by id_1. It would look like this:

result <- tibble(
  id_1 = c(15796, 15799),
  count_distinct_values = c(2, 1)
)

I would have thought that the following would work, but it returns another error:

result <- df.ex %>% group_by(id_1) %>% summarise(count_distinct_values = count(distinct(., days)))

Any ideas would be very much appreciated.

Tags: r, dplyr, distinct

Solution


Maybe you can try

df.ex %>% summarise(distinct_values = list(distinct(., days)))

You need the . inside distinct() because it operates on tbls (or data frames), not on vectors. I wrap the result in list() so that all the distinct values are shown, not only the first.
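To see why the vector call in the question fails, here is a minimal sketch. It assumes dplyr is installed and rebuilds df.ex with data.frame() for brevity. distinct() only has methods for data frames and tbls, so a bare numeric vector raises the dispatch error; base R's unique() and dplyr's n_distinct() are the vector-level counterparts.

```r
library(dplyr)

# Same data as in the question, rebuilt with data.frame() for brevity
df.ex <- data.frame(
  id_1 = c(15796L, 15796L, 15799L, 15799L),
  id_2 = c(61350L, 351261L, 61488L, 315736L),
  days = c(30.5, 36.4854, 30.5, 30.5)
)

# Data frame input: returns a data frame with the unique values of days
distinct_df <- distinct(df.ex, days)

# Vector input: there is no distinct() method for numeric vectors, so this errors
res <- try(distinct(df.ex$days), silent = TRUE)
failed <- inherits(res, "try-error")

# Vector-level counterparts
unique_days <- unique(df.ex$days)     # the distinct values themselves
n_days      <- n_distinct(df.ex$days) # how many distinct values there are
```

Inside summarise(), days refers to the column as a vector, which is why n_distinct(days) works there while distinct(days) does not.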

Another way:

df.ex %>% distinct(distinct_values = days)

UPDATE, following the edit to the question. I think this solves your problem:

df.ex %>% group_by(id_1) %>% summarise(distinct_values = n_distinct(days))
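As a quick check against the table requested in the question, the sketch below runs the grouped n_distinct() call on the example data. The column name count_distinct_values is taken from the question. The second pipeline, built from distinct() and count(), is my own alternative formulation rather than part of the original answer.

```r
library(dplyr)

# Same data as in the question, rebuilt with data.frame() for brevity
df.ex <- data.frame(
  id_1 = c(15796L, 15796L, 15799L, 15799L),
  id_2 = c(61350L, 351261L, 61488L, 315736L),
  days = c(30.5, 36.4854, 30.5, 30.5)
)

# Count the distinct values of days within each id_1 group
result <- df.ex %>%
  group_by(id_1) %>%
  summarise(count_distinct_values = n_distinct(days))

# Alternative (an assumption, not from the answer): drop duplicate
# (id_1, days) pairs first, then count the remaining rows per id_1
result_alt <- df.ex %>%
  distinct(id_1, days) %>%
  count(id_1, name = "count_distinct_values")

result
```

Both give 2 for id_1 15796 (30.5 and 36.4854) and 1 for id_1 15799 (30.5 only), which matches the expected table in the question.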
