R: XML extraction returns the wrong value for a date-time column

Problem description

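I am new to this and am trying to extract XML data in R with the code below. All data points appear to be correct except the NEW_DATE column. For the row with Id = 1, NEW_DATE comes back as 852163200000 rather than the 1997-01-02T00:00:00 listed in the raw XML below. When I parse the session response, NEW_DATE seems to be returned as a character value that I cannot interpret. The only change I made to the code for this post was replacing the proxy URL with # placeholders.

Any help is greatly appreciated!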

library(XML)
library(RCurl)
library(xml2)
library(httr)
library(rvest)
library(dplyr)
library(tidyverse)

#setup proxy
my_proxy = use_proxy(url="##.#.##.##:####")
 
#setup session and response
my_session = html_session("https://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData",my_proxy)
my_response = my_session$response
 
#check status
status_code(my_session)
status_code(my_response)
 
#retrieve content XML
content_parsed = content(my_session$response, as = "parsed")
 
#convert list to data frame
ust.df = data.frame(t(sapply(content_parsed$d,c)))
 
#<xs:datetime> data type is used to represent date and time in YYYY-MM-DDThh:mm:ss
 
#list column names
colnames(ust.df)
 
#remove X__metadata column
ust.df = ust.df %>%
  select(-1)
 
#replace Date with "" in NEW_DATE column
ust.df$NEW_DATE = gsub("Date", "", paste(ust.df$NEW_DATE))
 
#replace (,),/ with "" in NEW_DATE column
ust.df$NEW_DATE =gsub("[[:punct:]]", "", ust.df$NEW_DATE)
 
#NEW_DATE is now a 12-digit character string that still needs converting to a date

#for Id = 1, NEW_DATE is "852163200000" instead of the 1997-01-02T00:00:00 shown in the XML below

Sample XML for reference (Id = 1)

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xml:base="http://data.treasury.gov/Feed.svc/">
   <title type="text">DailyTreasuryYieldCurveRateData</title>
   <id>http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData</id>
   <updated>2021-03-11T17:17:56Z</updated>
   <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
   <entry>
      <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(1)</id>
      <title type="text" />
      <updated>2021-03-11T17:17:56Z</updated>
      <author>
         <name />
      </author>
      <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(1)" />
      <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
      <content type="application/xml">
         <m:properties>
            <d:Id m:type="Edm.Int32">1</d:Id>
            <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
            <d:BC_1MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_2MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
            <d:BC_6MONTH m:type="Edm.Double">5.3499999046325684</d:BC_6MONTH>
            <d:BC_1YEAR m:type="Edm.Double">5.630000114440918</d:BC_1YEAR>
            <d:BC_2YEAR m:type="Edm.Double">5.96999979019165</d:BC_2YEAR>
            <d:BC_3YEAR m:type="Edm.Double">6.130000114440918</d:BC_3YEAR>
            <d:BC_5YEAR m:type="Edm.Double">6.3000001907348633</d:BC_5YEAR>
            <d:BC_7YEAR m:type="Edm.Double">6.4499998092651367</d:BC_7YEAR>
            <d:BC_10YEAR m:type="Edm.Double">6.5399999618530273</d:BC_10YEAR>
            <d:BC_20YEAR m:type="Edm.Double">6.8499999046325684</d:BC_20YEAR>
            <d:BC_30YEAR m:type="Edm.Double">6.75</d:BC_30YEAR>
            <d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY>
         </m:properties>
      </content>
   </entry>
</feed>

Tags: r, xml, datetime, parsing

Solution
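
The `X__metadata` column and the `Date(...)` wrapper that the question strips out suggest that the response was parsed as OData JSON rather than as the Atom XML shown above. In that JSON format, dates are typically written as `/Date(852163200000)/`, where the number is milliseconds since 1970-01-01 UTC. On that assumption, 852163200000 is simply 1997-01-02T00:00:00 in another encoding. The sketch below uses two made-up input strings and a hypothetical helper, `parse_odata_date`, to show one way to convert such values; it is not taken from the original post.

```r
# Example values in the shape the question describes
# (assumed to be OData JSON dates: milliseconds since 1970-01-01 UTC).
raw_dates <- c("/Date(852163200000)/", "/Date(852249600000)/")

# Hypothetical helper: keep only the digits (and a leading minus sign),
# then interpret the number as epoch milliseconds.
parse_odata_date <- function(x) {
  ms <- as.numeric(gsub("[^0-9-]", "", x))
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

new_date <- parse_odata_date(raw_dates)
format(new_date, "%Y-%m-%dT%H:%M:%S")
```

The same helper also accepts the already-cleaned strings from the question (for example `"852163200000"`), so it could be applied to `ust.df$NEW_DATE` after the `gsub` steps. Using `tz = "UTC"` is an assumption; it keeps the result from shifting with the local time zone.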

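Another option, if the feed can be obtained as the Atom XML shown in the question, may be to read the fields straight from the XML with `xml2`, so that NEW_DATE arrives as an ISO 8601 string. The following is a minimal sketch that works on a trimmed, inline copy of the sample entry rather than a live request. The names `atom_xml`, `ns`, `props` and `ust_df` are illustrative, and treating the timestamps as UTC is an assumption, since `Edm.DateTime` values carry no time zone.

```r
library(xml2)

# Trimmed copy of the sample entry from the question (not a live request).
atom_xml <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">1</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
        <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
      </m:properties>
    </content>
  </entry>
</feed>'

doc <- read_xml(atom_xml)

# Prefix-to-URI pairs for the namespaces used by the data fields.
ns <- c(
  d = "http://schemas.microsoft.com/ado/2007/08/dataservices",
  m = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
)

# One <m:properties> node per entry; each becomes one row.
props <- xml_find_all(doc, ".//m:properties", ns)

# Small local helper: text of one child field for every entry.
field <- function(name) {
  xml_text(xml_find_first(props, paste0("./d:", name), ns))
}

ust_df <- data.frame(
  Id = as.integer(field("Id")),
  NEW_DATE = as.POSIXct(field("NEW_DATE"),
                        format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  BC_3MONTH = as.numeric(field("BC_3MONTH"))
)

ust_df
```

Passing the namespace vector explicitly avoids depending on how the default Atom namespace is named when the prefixes are discovered automatically.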
