rpy2 does not convert back to pandas

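Problem description

I have an R object that will not convert to pandas, and, strangely, no error is raised.

Updated with the code I am using. Apologies for not providing it up front, and for missing the request for two weeks!

Python code that calls the R script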

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import datetime
from rpy2.robjects.conversion import localconverter


def serial_date_to_string(srl_no):
    new_date = datetime.datetime(1970,1,1,0,0) + datetime.timedelta(srl_no - 1)
    return new_date.strftime("%Y-%m-%d")

jurisdiction='TX'
r=ro.r
r_df=r['source']('farrington.R')

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)

The problem is that pd_from_r_df comes back as an R object rather than a pandas DataFrame:

>>> pd_from_r_df
R object with classes: ('list',) mapped to:
[ListSexpVector, BoolSexpVector]
  value: <class 'rpy2.rinterface.ListSexpVector'>
  <rpy2.rinterface.ListSexpVector object at 0x7faa4c4eff08> [RTYPES.VECSXP]
  visible: <class 'rpy2.rinterface.BoolSexpVector'>
  <rpy2.rinterface.BoolSexpVector object at 0x7faa4c4e7948> [RTYPES.LGLSXP]

Here is the R script "farrington.R". It returns a surveillance time series, which (as shown above) ro.conversion.rpy2py does not convert to a pandas DataFrame:

library('surveillance')
library(readr)
library(tidyr)
library(dplyr)
w<-1
b<-3
nfreq<-52
steps_back<- 28
alpha<-0.05

counts <- read_csv("Weekly_counts_of_death_by_jurisdiction_and_cause_of_death.csv")
counts<-counts[,!colnames(counts) %in% c('Cause Subgroup','Time Period','Suppress','Note','Average Number of Deaths in Time Period','Difference from 2015-2019 to 2020','Percent Difference from 2015-2019 to 2020')]
wide_counts_by_cause<-pivot_wider(counts,names_from='Cause Group',values_from='Number of Deaths',values_fn=(`Cause Group`=sum))
wide_state <- filter(wide_counts_by_cause,`State Abbreviation`==jurisdiction)
wide_state <- filter(wide_state,Type=='Unweighted')
wide_state[is.na(wide_state)] <-0
important_columns=c('Alzheimer disease and dementia','Cerebrovascular diseases','Heart failure','Hypertensive dieases','Ischemic heart disease','Other diseases of the circulatory system','Malignant neoplasms','Diabetes','Renal failure','Sepsis','Chronic lower respiratory disease','Influenza and pneumonia','Other diseases of the respiratory system','Residual (all other natural causes)')

all_columns <- append(c('Year','Week'),important_columns)

selected_wide_state<-wide_state[, names(wide_state) %in% all_columns]
start<-c(as.numeric(min(selected_wide_state[,'Year'])),as.numeric(min(selected_wide_state[,'Week'])))
freq<-as.numeric(max(selected_wide_state[,'Week']))

sts <- new("sts",epoch=1:nrow(numeric_wide_state),start=start,freq=freq,observed=numeric_wide_state)
sts_4 <- aggregate(sts[,important_columns],nfreq=nfreq)
start_idx=end_idx-steps_back

cntrlFar <- list(range=start_idx:end_idx,w=w,b=b,alpha=alpha)
surveil_ts_4_far <- farrington(sts_4,control=cntrlFar)
far_df<-tidy.sts(surveil_ts_4_far)
far_df

(This uses the NCHS data from here, as of a few months ago: https://data.cdc.gov/NCHS/Weekly-counts-of-death-by-jurisdiction-and-cause-o/u6jv-9ijr/)

Tags: r, pandas, dataframe, rpy2

Solution


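In R, source() called with its defaults on a script without named functions returns a list with two named components, $value and $visible, where:

  • $value is the last object displayed or defined, which in your case is the far_df data frame (in R, a data.frame is a classed object built on the list type);
  • $visible is a logical vector indicating whether that last object was displayed, which in your case is TRUE. It would be FALSE had you ended the script with the assignment far_df <- tidy.sts(surveil_ts_4_far).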

Indeed, your Python output (not an error) confirms this: the returned object maps to [ListSexpVector, BoolSexpVector].
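If you want to check this yourself before converting, a small helper can list each component's name next to its Python type. This is only a sketch: describe_components and FakeSourceResult are hypothetical names, and the stand-in class exists purely so the snippet runs without R. It assumes the object exposes .names and is iterable, as the list returned by ro.r['source'](...) does.

```python
def describe_components(r_list):
    """Return (name, Python type name) pairs for a named, list-like object."""
    return [(str(name), type(item).__name__)
            for name, item in zip(r_list.names, r_list)]


class FakeSourceResult(list):
    """Hypothetical stand-in for the rpy2 list, so this runs without R."""
    names = ("value", "visible")


# A dict stands in for the data frame, True for the visibility flag.
demo = FakeSourceResult([{"col": [1, 2]}, True])
print(describe_components(demo))  # [('value', 'dict'), ('visible', 'bool')]
```

Called on the real result of source(), the same helper should report the two components named value and visible, with whatever rpy2 types they carry.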

So, since you only want the first item, index it by position or by name.

r_raw = ro.r['source']('farrington.R')        # IN R: r_raw <- source('farrington.R')

# The two assignments below are alternatives; either one extracts the data frame.
r_df  = r_raw[0]                              # IN R: r_df  <- r_raw[[1]]
r_df  = r_raw[r_raw.names.index('value')]     # IN R: r_df  <- r_raw$value

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)
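If the same script is sometimes sourced and sometimes evaluated another way, it may help to unwrap defensively. The following is a minimal sketch, not part of rpy2: unwrap_source_result and FakeSourceResult are hypothetical names, and the stand-in class only lets the snippet run without R. It assumes the result exposes .names and supports integer indexing, as in the code above.

```python
def unwrap_source_result(result):
    """Return the 'value' component of what R's source() returned.

    If the object has no component named 'value', return it unchanged.
    """
    try:
        names = [str(n) for n in result.names]
    except (AttributeError, TypeError):
        # No usable names: treat the object as already unwrapped.
        names = []
    if "value" in names:
        return result[names.index("value")]
    return result


class FakeSourceResult(list):
    """Hypothetical stand-in for the rpy2 list, so this runs without R."""
    names = ("value", "visible")


wrapped = FakeSourceResult(["the data frame", True])
print(unwrap_source_result(wrapped))    # the data frame
print(unwrap_source_result([1, 2, 3]))  # [1, 2, 3]
```

With rpy2 you would call r_df = unwrap_source_result(ro.r['source']('farrington.R')) and then convert r_df inside the localconverter block as shown above.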
