首页 > 解决方案 > 如何从 url 读取 IMF xls- 或 sdmx-data?

问题描述

我想从IMF将 .xls 文件从 URL 直接读取到 R 中,但到目前为止所有尝试都失败了。奇怪的是,我可以手动下载文件download.file(),也可以在 Microsoft Outlook 或文本编辑器中毫无问题地打开它。但是,即使那样我也无法将数据读入 R。

我总是尝试同时使用httpshttp

myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"

1. 经典方法——失败。

imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
#                 line 51 did not have 55 elements

imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
#                 line 51 did not have 55 elements

2. 几个包——失败。

imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’

imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’

imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe" 
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument

imf <- gdata::read.xls(myUrl2)  # <----------------------------------------------  THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe" 
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument

3. Tempfile 方法——失败。

temp <- tempfile()
download.file(myUrl, temp)  # THIS WORKS...

## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe"
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument

# even not...
tmp1 <- readLines(temp)
# Warning message:
#   In readLines(temp) :
#   incomplete final line found on 
#   'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
#  chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject 
#    Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...

4. SDMX

我也试过SDMX IMF的offer,也没有成功。可能这将是一种更复杂的方法,但我从未使用过 SDMX。

link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"

temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE)  : 
#   Could not resolve host: C

# imf <- rsdmx::readSDMX(unzip(temp))  # runs forever and crashes R
unlink(temp)

现在...有人知道发生了什么,以及如何将数据加载到 R 中吗?

标签: rurldownloadxlssdmx

解决方案


为什么不直接使用fill=TRUE

imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)

?read.table

充满

合乎逻辑。如果为 TRUE,则如果行的长度不相等,则会隐式添加空白字段。阅读详情'。


推荐阅读