python - pandas2ri.ri2py_dataframe(r_dataframe) returns floats instead of dates in ISO-8601 (YYYY-MM-DD) format
Code
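# Assumes the rpy2 2.x API, where pandas2ri provides py2ri() and ri2py_dataframe():
from rpy2.robjects import pandas2ri

# First, convert the input dataframe to an R dataframe to be used by our R function:
input_dataframe_r = pandas2ri.py2ri(input_dataframe)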
output_dataframe_r = r_generate_notifications(input_dataframe_r, metric_name, lookback, moving_average, sigmas)
# And convert it back to a Pandas dataframe:
output_dataframe_py = pandas2ri.ri2py_dataframe(output_dataframe_r)
print('output dataframe r:', output_dataframe_r)
print('\n')
print('output dataframe py:', output_dataframe_py)
print('\n')
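Problem description
I have a Pandas dataframe in Python on which I want to do some math in R. So I take an argument, input_dataframe, which is a Pandas dataframe, run it through R (in this case, an R function called r_generate_notifications()), and then convert the result back with output_dataframe_py = pandas2ri.ri2py_dataframe(output_dataframe_r).
The problem is that the R code returns some dates created with ymd(), and when I convert to a Pandas dataframe these are all converted to floats. I'm not sure whether this is a bug in the code or user error. I also posted this as a bug on the Pandas GitHub: https://github.com/pandas-dev/pandas/issues/21044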
Expected output (R dataframe)
output dataframe r: notify_day daily_value is_high_value time_period_length time_period_value
1 2017-05-09 11033.79 1 7 30938.45
2 2017-05-18 1613.64 1 7 25669.63
3 2017-05-19 2121.38 1 7 28048.14
4 2017-05-26 1774.44 1 7 28185.27
5 2017-06-12 693.24 1 7 26170.57
6 2017-06-24 2275.77 1 7 36550.32
7 2017-06-29 5336.76 1 7 32748.46
8 2017-06-30 8921.38 1 7 43366.39
9 2017-07-11 4007.84 0 7 28986.47
10 2017-07-20 5766.12 0 7 24627.51
11 2017-08-01 4150.32 1 7 24760.60
12 2017-08-04 734.40 0 7 20645.43
13 2017-08-12 0.00 1 7 9898.20
14 2017-12-29 5000.00 1 7 12467.02
15 2018-01-28 0.00 1 7 12538.81
16 2018-02-14 0.00 1 7 14351.24
17 2018-02-20 10628.82 1 7 20905.00
18 2018-03-16 237.44 1 7 24400.76
19 2018-03-21 917.96 1 7 26485.20
20 2018-03-24 1272.85 1 7 39287.70
21 2018-03-26 3231.26 1 7 41543.95
22 2018-03-29 9493.31 1 7 43060.81
23 2018-03-30 21696.04 0 7 34854.90
24 2018-03-31 1403.33 0 7 13158.86
25 2018-04-06 0.00 0 7 15240.38
26 2018-04-08 453.68 0 7 18004.12
27 2018-04-18 4666.36 1 7 27038.60
28 2018-04-21 0.00 0 7 24620.15
29 2018-04-23 4306.88 1 7 27470.00
time_period_start time_period_end comparison_days_ago comparison_value
1 2017-05-03 2017-05-09 28 19056.30
2 2017-05-12 2017-05-18 14 21610.99
3 2017-05-13 2017-05-19 28 24321.11
4 2017-05-20 2017-05-26 28 14530.01
5 2017-06-06 2017-06-12 28 20087.97
6 2017-06-18 2017-06-24 28 30796.60
7 2017-06-23 2017-06-29 14 28394.23
8 2017-06-24 2017-06-30 28 22758.57
9 2017-07-05 2017-07-11 14 36122.77
10 2017-07-14 2017-07-20 28 29509.53
11 2017-07-26 2017-08-01 7 19662.71
12 2017-07-29 2017-08-04 28 30518.06
13 2017-08-06 2017-08-12 1 4487.40
14 2017-12-23 2017-12-29 28 0.00
15 2018-01-22 2018-01-28 28 10393.82
16 2018-02-08 2018-02-14 28 2177.36
17 2018-02-14 2018-02-20 28 602.64
18 2018-03-10 2018-03-16 28 19042.76
19 2018-03-15 2018-03-21 28 14042.68
20 2018-03-18 2018-03-24 28 9351.16
21 2018-03-20 2018-03-26 28 7909.36
22 2018-03-23 2018-03-29 28 464.28
23 2018-03-24 2018-03-30 1 43060.81
24 2018-03-25 2018-03-31 14 24163.32
25 2018-03-31 2018-04-06 14 17591.66
26 2018-04-02 2018-04-08 14 39418.18
27 2018-04-12 2018-04-18 14 12906.06
28 2018-04-15 2018-04-21 28 39287.70
29 2018-04-17 2018-04-23 14 18153.08
comparison_period_start comparison_period_end
1 2017-04-05 2017-04-11
2 2017-04-28 2017-05-04
3 2017-04-15 2017-04-21
4 2017-04-22 2017-04-28
5 2017-05-09 2017-05-15
6 2017-05-21 2017-05-27
7 2017-06-09 2017-06-15
8 2017-05-27 2017-06-02
9 2017-06-21 2017-06-27
10 2017-06-16 2017-06-22
11 2017-07-19 2017-07-25
12 2017-07-01 2017-07-07
13 2017-08-05 2017-08-11
14 2017-11-25 2017-12-01
15 2017-12-25 2017-12-31
16 2018-01-11 2018-01-17
17 2018-01-17 2018-01-23
18 2018-02-10 2018-02-16
19 2018-02-15 2018-02-21
20 2018-02-18 2018-02-24
21 2018-02-20 2018-02-26
22 2018-02-23 2018-03-01
23 2018-03-23 2018-03-29
24 2018-03-11 2018-03-17
25 2018-03-17 2018-03-23
26 2018-03-19 2018-03-25
27 2018-03-29 2018-04-04
28 2018-03-18 2018-03-24
29 2018-04-03 2018-04-09
Actual output (Python/Pandas dataframe)
output dataframe py: notify_day daily_value is_high_value time_period_length \
0 17295.0 11033.79 1.0 7
1 17304.0 1613.64 1.0 7
2 17305.0 2121.38 1.0 7
3 17312.0 1774.44 1.0 7
4 17329.0 693.24 1.0 7
5 17341.0 2275.77 1.0 7
6 17346.0 5336.76 1.0 7
7 17347.0 8921.38 1.0 7
8 17358.0 4007.84 0.0 7
9 17367.0 5766.12 0.0 7
10 17379.0 4150.32 1.0 7
11 17382.0 734.40 0.0 7
12 17390.0 0.00 1.0 7
13 17529.0 5000.00 1.0 7
14 17559.0 0.00 1.0 7
15 17576.0 0.00 1.0 7
16 17582.0 10628.82 1.0 7
17 17606.0 237.44 1.0 7
18 17611.0 917.96 1.0 7
19 17614.0 1272.85 1.0 7
20 17616.0 3231.26 1.0 7
21 17619.0 9493.31 1.0 7
22 17620.0 21696.04 0.0 7
23 17621.0 1403.33 0.0 7
24 17627.0 0.00 0.0 7
25 17629.0 453.68 0.0 7
26 17639.0 4666.36 1.0 7
27 17642.0 0.00 0.0 7
28 17644.0 4306.88 1.0 7
time_period_value time_period_start time_period_end \
0 30938.45 17289.0 17295.0
1 25669.63 17298.0 17304.0
2 28048.14 17299.0 17305.0
3 28185.27 17306.0 17312.0
4 26170.57 17323.0 17329.0
5 36550.32 17335.0 17341.0
6 32748.46 17340.0 17346.0
7 43366.39 17341.0 17347.0
8 28986.47 17352.0 17358.0
9 24627.51 17361.0 17367.0
10 24760.60 17373.0 17379.0
11 20645.43 17376.0 17382.0
12 9898.20 17384.0 17390.0
13 12467.02 17523.0 17529.0
14 12538.81 17553.0 17559.0
15 14351.24 17570.0 17576.0
16 20905.00 17576.0 17582.0
17 24400.76 17600.0 17606.0
18 26485.20 17605.0 17611.0
19 39287.70 17608.0 17614.0
20 41543.95 17610.0 17616.0
21 43060.81 17613.0 17619.0
22 34854.90 17614.0 17620.0
23 13158.86 17615.0 17621.0
24 15240.38 17621.0 17627.0
25 18004.12 17623.0 17629.0
26 27038.60 17633.0 17639.0
27 24620.15 17636.0 17642.0
28 27470.00 17638.0 17644.0
comparison_days_ago comparison_value comparison_period_start \
0 28.0 19056.30 17261.0
1 14.0 21610.99 17284.0
2 28.0 24321.11 17271.0
3 28.0 14530.01 17278.0
4 28.0 20087.97 17295.0
5 28.0 30796.60 17307.0
6 14.0 28394.23 17326.0
7 28.0 22758.57 17313.0
8 14.0 36122.77 17338.0
9 28.0 29509.53 17333.0
10 7.0 19662.71 17366.0
11 28.0 30518.06 17348.0
12 1.0 4487.40 17383.0
13 28.0 0.00 17495.0
14 28.0 10393.82 17525.0
15 28.0 2177.36 17542.0
16 28.0 602.64 17548.0
17 28.0 19042.76 17572.0
18 28.0 14042.68 17577.0
19 28.0 9351.16 17580.0
20 28.0 7909.36 17582.0
21 28.0 464.28 17585.0
22 1.0 43060.81 17613.0
23 14.0 24163.32 17601.0
24 14.0 17591.66 17607.0
25 14.0 39418.18 17609.0
26 14.0 12906.06 17619.0
27 28.0 39287.70 17608.0
28 14.0 18153.08 17624.0
comparison_period_end
0 17267.0
1 17290.0
2 17277.0
3 17284.0
4 17301.0
5 17313.0
6 17332.0
7 17319.0
8 17344.0
9 17339.0
10 17372.0
11 17354.0
12 17389.0
13 17501.0
14 17531.0
15 17548.0
16 17554.0
17 17578.0
18 17583.0
19 17586.0
20 17588.0
21 17591.0
22 17619.0
23 17607.0
24 17613.0
25 17615.0
26 17625.0
27 17614.0
28 17630.0
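The floats in the output above line up with the R dates if they are read as whole days since 1970-01-01, which is how R's Date class counts. The following is only a quick sanity check under that assumption, using the standard library; the helper name r_days_to_iso is made up for illustration.

```python
from datetime import date, timedelta

# Assumption: each float is an R Date value, i.e. whole days since 1970-01-01.
R_DATE_EPOCH = date(1970, 1, 1)

def r_days_to_iso(days):
    """Convert a day count since 1970-01-01 to an ISO-8601 (YYYY-MM-DD) string."""
    return (R_DATE_EPOCH + timedelta(days=int(days))).isoformat()

# A few notify_day values copied from the Pandas output above.
samples = [17295.0, 17304.0, 17644.0]
decoded = [r_days_to_iso(value) for value in samples]
print(decoded)  # ['2017-05-09', '2017-05-18', '2018-04-23']
```

These match rows 1, 2 and 29 of notify_day in the R output, so the values themselves seem to survive the conversion; only the date type is lost.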
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Solution
As a workaround, could you add a step that converts the R dates to strings before sending them back to Python?
library(lubridate)
df[sapply(df, is.Date)] <- lapply(df[sapply(df, is.Date)], as.character)
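I don't really work with dates, so this is my (simple) understanding. R stores dates as numbers, along with some extra information giving the date/time at which the numbering starts, time zone information, and so on. When your dataframe goes back to Python, it looks like this extra information is lost in translation, so storing the dates as character is probably safer.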
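If changing the R side is not convenient, the same repair could probably be done on the Python side instead. This is a minimal sketch, assuming the float columns hold days since 1970-01-01 (R's Date representation) and that you know which columns are dates; restore_r_dates and the small sample frame are hypothetical, with values copied from the question's output.

```python
import pandas as pd

def restore_r_dates(df, date_columns):
    """Return a copy of df with the named columns converted from
    day counts since 1970-01-01 to pandas datetimes."""
    out = df.copy()
    for col in date_columns:
        # unit='D' tells pandas the numbers are days since the Unix epoch.
        out[col] = pd.to_datetime(out[col], unit='D')
    return out

# Stand-in for the frame returned by pandas2ri.ri2py_dataframe(...).
converted_py = pd.DataFrame({
    'notify_day': [17295.0, 17304.0],
    'daily_value': [11033.79, 1613.64],
    'time_period_start': [17289.0, 17298.0],
})

fixed = restore_r_dates(converted_py, ['notify_day', 'time_period_start'])
print(fixed)
```

The date columns have to be listed explicitly because, once converted, they cannot be told apart from ordinary float columns such as daily_value. If ISO-8601 strings are wanted rather than datetimes, fixed['notify_day'].dt.strftime('%Y-%m-%d') should give them.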