sparklyr date_format only works for some formats

Problem description

I'm trying to use the Hive UDF date_format() to extract the day of the week, but it only returns NA. Here is an example:

sc <- sparklyr::spark_connect(master = "local")
df <- dplyr::copy_to(
  sc,
  data.frame(date = as.POSIXct("2020-01-01")),
  "df"
)
df
# # Source: spark<df> [?? x 1]
#   date
#   <dttm>
# 1 2019-12-31 23:00:00

# Extracting the year works fine...
dplyr::mutate_at(
  .tbl = df,
  .vars = "date",
  .funs = ~date_format(., "yyyy")
)
# # Source: spark<?> [?? x 1]
#   date
#   <chr>
# 1 2020

# But extracting the day of the week does not...
dplyr::mutate_at(
  .tbl = df,
  .vars = "date",
  .funs = ~date_format(., "E")
)
# # Source: spark<?> [?? x 1]
#   date
#   <chr>
# 1 NA
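A minimal diagnostic sketch, assuming a local Spark installation and the same one-row table as above. show_query() prints the Spark SQL that each dplyr verb generates, DBI::dbGetQuery() runs the date_format() call directly in Spark SQL without dplyr, and spark_version() reports which Spark is evaluating it. The version is worth checking because Spark 3.0 replaced the SimpleDateFormat-based datetime patterns of Spark 2.x with its own java.time-based patterns, so the same format string can behave differently across versions. The literal query and the variable name direct are illustrative only.

```r
library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")
df <- copy_to(sc, data.frame(date = as.POSIXct("2020-01-01")), "df", overwrite = TRUE)

# Which Spark version is evaluating date_format()?
print(spark_version(sc))

# date_format() is not an R function here: dbplyr passes it through to
# Spark SQL as-is, so comparing the generated SQL shows whether the two
# dplyr verbs send Spark the same expression
df %>% mutate_at(.vars = "date", .funs = ~date_format(., "E")) %>% show_query()
df %>% mutate(DoW = date_format(date, "E")) %>% show_query()

# Run the same pattern directly in Spark SQL, bypassing dplyr entirely
direct <- DBI::dbGetQuery(
  sc,
  "SELECT date_format(CAST('2020-01-01' AS DATE), 'E') AS dow"
)
print(direct)

spark_disconnect(sc)
```

If the two printed queries are identical but the results differ, the problem is more likely on the collection side than in Spark's pattern handling.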

Any help would be appreciated. Some system information:

Tags: r, apache-spark, hive, apache-spark-sql, sparklyr

Solution


My approach uses mutate() and writes the result to a new column, DoW. To change the column in place instead, replace DoW with date.

library(tidyverse)
library(sparklyr)

sc <- spark_connect(master = "local")

df <- dplyr::copy_to(sc, data.frame(date = as.POSIXct("2020-01-01")), "df")
df %>% mutate(DoW=date_format(date, "E"))
# # Source: spark<?> [?? x 2]
#   date                DoW  
#   <dttm>              <chr>
# 1 2019-12-31 23:00:00 Wed  
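The in-place variant described above, written out as a minimal sketch under the same assumptions (a local Spark installation; the names df_in_place and result are illustrative). The answer does not show output for this variant, so whether it avoids the NA seen with mutate_at() may depend on the Spark and dbplyr versions in use.

```r
library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")
df <- copy_to(sc, data.frame(date = as.POSIXct("2020-01-01")), "df", overwrite = TRUE)

# Overwrite the date column with its abbreviated day-of-week name
# instead of adding a separate DoW column
df_in_place <- df %>% mutate(date = date_format(date, "E"))

# Collect the one-row result into R before closing the connection
result <- collect(df_in_place)
print(result)

spark_disconnect(sc)
```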
