How to handle embedded-null errors when importing a table into R from Boston BlueBikes data

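Problem description

I am trying to read a dataset from this zip file link in R Markdown: https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip. First I used the code below, named "code1", but the console prints this error message:

line 1 appears to contain embedded nulls
Error in read.table("https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip", : more columns than column names

I then made some adjustments, shown below as "code2", but the console still prints error messages:

invalid input found on input connection 'https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip'
incomplete final line found by readTableHeader on 'https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip'

I have looked through the solutions I could find online and tried many other approaches, but I still cannot get it to work. Could someone point me to a solution? I would really appreciate it!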

code1 <- read.table("https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip", header = TRUE, sep = ",")   
code2 <- read.table("https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip", header = TRUE, sep = ",", fileEncoding = "utf-8", skipNul = TRUE)

Tags: r, dataset, zip

Solution
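
Both error messages are consistent with read.table() receiving the compressed bytes of the archive rather than CSV text: a zip file is binary, so its contents show up as embedded nulls and malformed lines. A minimal sketch to confirm what kind of file you have, assuming a hypothetical helper named is_zip_file (it is not part of base R), compares the first four bytes against the signature that a non-empty zip archive starts with:

```r
# Hypothetical helper (not in base R): TRUE if the file starts with the
# zip local-file-header signature 0x50 0x4B 0x03 0x04 ("PK\003\004").
is_zip_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 4L)
  identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Self-contained demo with two temporary files.
fake_zip <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00)), fake_zip)  # signature only

plain_csv <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), plain_csv)

is_zip_file(fake_zip)   # file begins with the zip signature
is_zip_file(plain_csv)  # plain text, no signature
```

If the check is positive for a downloaded file, it needs to be unzipped (or read through a zip-aware connection) before any CSV parser can make sense of it.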


You can wrap the download, unzip, and read steps in a single function:

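library(tidyverse)

# Download a zip archive, extract it, and read the CSV it contains.
# Assumes the CSV inside the archive is named <file_name>.csv.
read_zip <- function(path_down, file_name = NULL){

  if(is.null(file_name)) stop("please provide a file name")

  zip_file <- paste0(file_name, ".zip")

  # mode = "wb" writes the archive as binary (matters on Windows)
  download.file(path_down, destfile = zip_file, mode = "wb")

  # Extracts into the current working directory
  unzip(zip_file)
  return(read_csv(paste0(file_name, ".csv")))
}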

data <- read_zip(path_down = "https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip",
                 file_name = "201901-bluebikes-tripdata")

data

## A tibble: 69,872 x 15
#   tripduration starttime           stoptime           
#          <dbl> <dttm>              <dttm>             
# 1          371 2019-01-01 00:09:13 2019-01-01 00:15:25
# 2          264 2019-01-01 00:33:56 2019-01-01 00:38:20
# 3          458 2019-01-01 00:41:54 2019-01-01 00:49:33
# 4          364 2019-01-01 00:43:32 2019-01-01 00:49:37
# 5          681 2019-01-01 00:49:56 2019-01-01 01:01:17
# 6          549 2019-01-01 00:50:01 2019-01-01 00:59:10
# 7          304 2019-01-01 00:54:48 2019-01-01 00:59:53
# 8          425 2019-01-01 01:00:48 2019-01-01 01:07:53
# 9         1353 2019-01-01 01:03:34 2019-01-01 01:26:07
#10          454 2019-01-01 01:08:56 2019-01-01 01:16:30
## ... with 69,862 more rows, and 12 more variables: `start
##   station id` <dbl>, `start station name` <chr>, `start
##   station latitude` <dbl>, `start station longitude` <dbl>,
##   `end station id` <dbl>, `end station name` <chr>, `end
##   station latitude` <dbl>, `end station longitude` <dbl>,
##   bikeid <dbl>, usertype <chr>, `birth year` <dbl>,
##   gender <dbl>
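
The function above extracts into the working directory and assumes the CSV inside the archive shares the zip's base name. As a possible alternative, here is a sketch using only base R that downloads to a temporary file, looks up the first entry with unzip(list = TRUE), and reads it through an unz() connection without extracting anything. The names read_csv_in_zip and read_zip_url are hypothetical, and the sketch assumes the first entry in the archive is the CSV of interest.

```r
# Hypothetical helper: read the first file inside a local zip archive as CSV.
read_csv_in_zip <- function(zip_path, ...) {
  entries <- unzip(zip_path, list = TRUE)  # data frame: Name, Length, Date
  if (nrow(entries) == 0) stop("archive is empty")
  # unz() creates a connection to one entry; read.csv() opens and closes it.
  read.csv(unz(zip_path, entries$Name[1]), ...)
}

# Hypothetical wrapper: download to a temporary file, then read from it.
read_zip_url <- function(url, ...) {
  tmp <- tempfile(fileext = ".zip")
  on.exit(unlink(tmp))  # remove the temporary archive afterwards
  download.file(url, destfile = tmp, mode = "wb")  # "wb" keeps bytes intact
  read_csv_in_zip(tmp, ...)
}
```

It could be called as read_zip_url("https://s3.amazonaws.com/hubway-data/201901-bluebikes-tripdata.zip"). Note that read.csv() rewrites column names containing spaces into syntactic names (for example start.station.id) unless check.names = FALSE is passed through the dots.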
