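r - Fill in missing dates and add "0"
Problem description
The code below counts the avalanches in SLC for each month of every ski season (December through March). Because it totals only the year-months that appear in the data, year-months with 0 avalanches are left out. How can I fill in my table so that it lists every year-month?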
# write the webscraper
library(XML)
library(RCurl)
library(dplyr)
library(zoo) # provides as.yearmon(), used below
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
this.url<-paste(avalanche.url, page, sep="")
this.webpage<-htmlParse(getURL(this.url))
thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=T,stringsAsFactors=F)
names(thispage.avalanche)<-c('Date','Region','Location','Observer')
avalanche<-rbind(avalanche,thispage.avalanche)
}
# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)
# convert the dates and get the total the number of avalanches
avalancheslc <- avalancheslc %>%
group_by(Date = format(as.yearmon(Date, "%m/%d/%Y"), "%Y-%m")) %>%
summarise(AvalancheTotal = n())
# pipe to only include Dec-Mar of each year
avalancheslc <- avalancheslc %>% filter(as.integer(substr(Date, 6, 7)) %in% c(12, 1:3))
# the data right now looks like this
Date AvalancheTotal
1980-01 1
1981-02 1
.
.
.
# the data needs to look like this
Date AvalancheTotal
1980-01 1
1980-02 0
1980-03 0
1980-12 0
1981-01 0
1981-02 1
1981-03 1
Solution
library("tidyverse")
library("lubridate")
# Your data here...
# Simpler version
avalancheslc %>%
separate(Date, c("year", "month")) %>%
# Some years might be missing (no avalanches at all)
# We can fill in those with `full_seq` but
# `full_seq` works with numbers not characters
mutate(year = as.integer(year)) %>%
complete(year = full_seq(year, 1), month,
fill = list(AvalancheTotal = 0)) %>%
unite("Date", year, month, sep = "-")
# Alternative version (fills in all months, so needs filtering afterwards)
avalancheslc <- avalancheslc %>%
# In case `Date` needs parsing
mutate(Date = parse_date_time(Date, "%y-%m"))
# A full data frame of months
all_months <- avalancheslc %>%
expand(Date = seq(first(Date), last(Date), by = "month"))
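# Join to `avalancheslc` and fill in with 0s
avalancheslc %>%
right_join(all_months, by = "Date") %>%
replace_na(list(AvalancheTotal = 0)) %>%
# Keep only the ski-season months (Dec-Mar) and restore date order
filter(month(Date) %in% c(12, 1:3)) %>%
arrange(Date)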
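To make the behaviour of `complete()` easier to check, here is a minimal sketch on toy data. The `toy` data frame and the `season_months` vector are assumptions made up for illustration; they are not part of the scraped data. Unlike the simpler version above, this sketch passes the season months explicitly, so a month that never occurs in the data (here `12`) is still filled in.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as `avalancheslc` after summarising
toy <- tibble(
  Date = c("1980-01", "1981-02", "1981-03"),
  AvalancheTotal = c(1L, 1L, 1L)
)

# Assumed ski-season months (Dec-Mar), zero-padded to match the "%Y-%m" format
season_months <- c("01", "02", "03", "12")

filled <- toy %>%
  separate(Date, c("year", "month"), sep = "-") %>%
  mutate(year = as.integer(year)) %>%
  # Build every year x season-month combination; rows that were
  # missing from `toy` get AvalancheTotal = 0
  complete(year = full_seq(year, 1), month = season_months,
           fill = list(AvalancheTotal = 0L)) %>%
  unite("Date", year, month, sep = "-")

print(filled)
```

Note that this fills every season month of the first and last year as well, so rows such as `1981-12` appear with a count of 0 even though they fall after the last observation. Filter them out afterwards if that is not wanted.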