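r - Small data frame crashes R
Problem description
I have a list of (grouped) data.frames. Each has 1 or 2 rows, and they all have the same columns. Two of the data frames work exactly as expected. However, printing the third data frame to the console, or manipulating it in any way, crashes R. In some RStudio environments I cannot even load the data with readRDS(). Is it possible that the third data frame contains some kind of embedded data? If so, how can I check for it? I really could not build a reproducible example, so I uploaded the small data set to filedropper: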
https://www.filedropper.com/filemanager/public.php?service=files&t=0c7cbfc10bc788e4515814748c96399b
> library(dplyr)
>
> df_list <- readRDS(file = "C:\\Users\\crist\\Desktop\\dataframe_list.rds")
>
> df_list[[1]] %>% ungroup() %>% class()
[1] "tbl_df" "tbl" "data.frame"
> df_list[[1]] %>% ungroup() %>% colnames()
[1] "leg.id" "arb_identifier" "SecurityID" "date" "UnderlyingClose" "UnderlyingOpen"
[7] "TotalReturn" "ReferenceExchange" "OptionID" "Expiration" "CallPut" "Strike"
[13] "Volume" "OpenInterest" "ImpliedVolatility" "Delta" "Gamma" "Vega"
[19] "Theta" "AdjustmentFactor" "BestBid" "BestOffer" "Last" "LastTradeDate"
[25] "T" "stale" "old" "roll" "n_opt_shares" "delta.hedge"
[31] "OrigBid" "OrigOffer" "PXRecov" "acquisition_date" "tranche_id"
> df_list[[1]] %>% ungroup() %>% NROW()
[1] 1
> df_list[[1]] %>% ungroup()%>% data.frame()
leg.id arb_identifier SecurityID date UnderlyingClose UnderlyingOpen TotalReturn ReferenceExchange OptionID Expiration
1 L_P_OTM5.0_93_0 1 506528 2005-12-19 5539.8 5531.6 0.001482339 -99 150042133 2006-01-20
CallPut Strike Volume OpenInterest ImpliedVolatility Delta Gamma Vega Theta AdjustmentFactor BestBid
1 P 2.581493e-320 8.685674e-321 16674 0.1386455 -0.06867341 0.0005814221 216.8916 -164.2587 0 7
BestOffer Last LastTradeDate T stale old roll n_opt_shares delta.hedge OrigBid OrigOffer PXRecov acquisition_date tranche_id
1 7 7 <NA> 32 days FALSE FALSE FALSE 2499 Inf 35 35 0.2 2005-10-03 9381673
>
> df_list[[2]] %>% ungroup() %>% class()
[1] "tbl_df" "tbl" "data.frame"
> df_list[[2]] %>% ungroup() %>% colnames()
[1] "leg.id" "arb_identifier" "SecurityID" "date" "UnderlyingClose" "UnderlyingOpen"
[7] "TotalReturn" "ReferenceExchange" "OptionID" "Expiration" "CallPut" "Strike"
[13] "Volume" "OpenInterest" "ImpliedVolatility" "Delta" "Gamma" "Vega"
[19] "Theta" "AdjustmentFactor" "BestBid" "BestOffer" "Last" "LastTradeDate"
[25] "T" "stale" "old" "roll" "n_opt_shares" "delta.hedge"
[31] "OrigBid" "OrigOffer" "PXRecov" "acquisition_date" "tranche_id"
> df_list[[2]] %>% ungroup() %>% NROW()
[1] 1
> df_list[[2]] %>% ungroup()%>% data.frame()
leg.id arb_identifier SecurityID date UnderlyingClose UnderlyingOpen TotalReturn ReferenceExchange OptionID Expiration
1 L_P_OTM5.0_93_0 1 506528 2005-12-20 5547.9 5539.8 0.001462164 -99 150042133 2006-01-20
CallPut Strike Volume OpenInterest ImpliedVolatility Delta Gamma Vega Theta AdjustmentFactor BestBid
1 P 2.581493e-320 1.167971e-320 17155 0.1427761 -0.0636712 0.0005403475 201.6763 -158.4806 0 6.5
BestOffer Last LastTradeDate T stale old roll n_opt_shares delta.hedge OrigBid OrigOffer PXRecov acquisition_date tranche_id
1 6.5 6.5 <NA> 31 days FALSE FALSE FALSE 2499 Inf 35 35 0.1857143 2005-10-03 9381673
>
> df_list[[3]] %>% ungroup() %>% class()
[1] "tbl_df" "tbl" "data.frame"
> df_list[[3]] %>% ungroup() %>% colnames()
[1] "leg.id" "arb_identifier" "SecurityID" "date" "UnderlyingClose" "UnderlyingOpen"
[7] "TotalReturn" "ReferenceExchange" "OptionID" "Expiration" "CallPut" "Strike"
[13] "Volume" "OpenInterest" "ImpliedVolatility" "Delta" "Gamma" "Vega"
[19] "Theta" "AdjustmentFactor" "BestBid" "BestOffer" "Last" "LastTradeDate"
[25] "T" "stale" "old" "roll" "n_opt_shares" "delta.hedge"
[31] "OrigBid" "OrigOffer" "PXRecov" "acquisition_date" "tranche_id"
> df_list[[3]] %>% ungroup() %>% NROW()
[1] 1
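Solution
I don't know exactly what is corrupt in that file, but I can reproduce the crash on Linux. I was able to avoid it by dropping columns 12-13 (which loses only two columns of data):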
as.data.frame(df_list[[1]]) # no problem
as.data.frame(df_list[[2]]) # no problem
as.data.frame(df_list[[3]])[-(12:13)]
# leg.id arb_identifier SecurityID date UnderlyingClose
# 1 L_P_OTM5.0_93_0 1 506528 2005-12-21 5587.4
# UnderlyingOpen TotalReturn ReferenceExchange OptionID Expiration CallPut
# 1 5547.9 0.007119811 -99 150042133 2006-01-20 P
# OpenInterest ImpliedVolatility Delta Gamma Vega Theta
# 1 18051 0.1430592 -0.04223828 0.0003929387 144.2409 -117.1857
# AdjustmentFactor BestBid BestOffer Last LastTradeDate T stale old
# 1 0 4 4 4 <NA> 30 days FALSE FALSE
# roll n_opt_shares delta.hedge OrigBid OrigOffer PXRecov acquisition_date
# 1 FALSE 2499 Inf 35 35 0.1142857 2005-10-03
# tranche_id
# 1 9381673
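The positional index -(12:13) silently drops the wrong columns if the column order ever changes. A minimal sketch of the same workaround keyed on column names follows; drop_columns and the toy data frame are hypothetical names and made-up values used only for illustration, not part of the original data.

```r
# Hypothetical helper: drop columns by name instead of by position.
drop_columns <- function(df, cols) {
  df <- as.data.frame(df)  # same conversion used in the workaround above
  df[, !(names(df) %in% cols), drop = FALSE]
}

# Toy stand-in for one frame of the list (values are made up).
toy <- data.frame(leg.id = "a", Strike = 1, Volume = 2, OpenInterest = 3)

cleaned <- drop_columns(toy, c("Strike", "Volume"))
names(cleaned)
```

On the real data this would presumably be applied as lapply(df_list, drop_columns, cols = c("Strike", "Volume")), assuming as.data.frame() keeps working on the third frame as it does above.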
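My guess is that the file itself is corrupt in some way (I don't know the internal RDS structure well enough to dig deeper), and that columns 12-13 (Strike and Volume) are what cause the problem. (If you can live without those two values for the third frame, or can regenerate them, you should be able to move on.)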
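As for how to check the columns, one option is to look at each column's class and storage type without formatting or printing any values. The sketch below is only an assumption about what might be useful here: column_types is a hypothetical helper, the toy data is made up, and it is not confirmed that this is safe on the corrupt third frame. The tiny values shown for Strike and Volume in the first two frames (such as 2.581493e-320) are what 64-bit integers stored in a double look like when they are displayed as plain doubles, so an "integer64" class on those columns may be worth looking for; the toy column below only simulates that case.

```r
# Hypothetical helper: report class and storage type per column
# without formatting or printing the values themselves.
column_types <- function(df) {
  cols <- unclass(df)  # plain list of columns, no data.frame methods
  data.frame(
    column = names(cols),
    class  = unname(vapply(cols, function(x) paste(class(x), collapse = "/"),
                           character(1))),
    typeof = unname(vapply(cols, typeof, character(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in (values are made up).
toy <- data.frame(CallPut = "P", OpenInterest = 16674L, Delta = -0.0687,
                  stringsAsFactors = FALSE)
# Simulated column: a double that only carries the "integer64" class label.
toy$Strike <- structure(0, class = "integer64")

types <- column_types(toy)
types
```

Running column_types(df_list[[1]]) and column_types(df_list[[2]]) first, on the frames that are known to work, would show which classes to expect before touching the third one.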