Small data frame causes R to crash

Question

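I have a list of (grouped) data frames, each with 1 or 2 rows and all with the same columns. Two of the data frames work exactly as expected. However, printing the third data frame to the console, or manipulating it in any way, crashes R. In some RStudio environments I cannot even load the data with readRDS(). Is it possible that the third data frame contains some embedded data? If so, how can I check? It is really not possible to build a reproducible example, so I have uploaded the small data set to filedropper: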

https://www.filedropper.com/filemanager/public.php?service=files&t=0c7cbfc10bc788e4515814748c96399b

> library(dplyr)
> 
> df_list <- readRDS(file = "C:\\Users\\crist\\Desktop\\dataframe_list.rds")
> 
> df_list[[1]] %>% ungroup() %>% class()
[1] "tbl_df"     "tbl"        "data.frame"
> df_list[[1]] %>% ungroup() %>% colnames()
 [1] "leg.id"            "arb_identifier"    "SecurityID"        "date"              "UnderlyingClose"   "UnderlyingOpen"   
 [7] "TotalReturn"       "ReferenceExchange" "OptionID"          "Expiration"        "CallPut"           "Strike"           
[13] "Volume"            "OpenInterest"      "ImpliedVolatility" "Delta"             "Gamma"             "Vega"             
[19] "Theta"             "AdjustmentFactor"  "BestBid"           "BestOffer"         "Last"              "LastTradeDate"    
[25] "T"                 "stale"             "old"               "roll"              "n_opt_shares"      "delta.hedge"      
[31] "OrigBid"           "OrigOffer"         "PXRecov"           "acquisition_date"  "tranche_id"       
> df_list[[1]] %>% ungroup() %>% NROW()
[1] 1
> df_list[[1]] %>% ungroup()%>% data.frame()
           leg.id arb_identifier SecurityID       date UnderlyingClose UnderlyingOpen TotalReturn ReferenceExchange  OptionID Expiration
1 L_P_OTM5.0_93_0              1     506528 2005-12-19          5539.8         5531.6 0.001482339               -99 150042133 2006-01-20
  CallPut        Strike        Volume OpenInterest ImpliedVolatility       Delta        Gamma     Vega     Theta AdjustmentFactor BestBid
1       P 2.581493e-320 8.685674e-321        16674         0.1386455 -0.06867341 0.0005814221 216.8916 -164.2587                0       7
  BestOffer Last LastTradeDate       T stale   old  roll n_opt_shares delta.hedge OrigBid OrigOffer PXRecov acquisition_date tranche_id
1         7    7          <NA> 32 days FALSE FALSE FALSE         2499         Inf      35        35     0.2       2005-10-03    9381673
> 
> df_list[[2]] %>% ungroup() %>% class()
[1] "tbl_df"     "tbl"        "data.frame"
> df_list[[2]] %>% ungroup() %>% colnames()
 [1] "leg.id"            "arb_identifier"    "SecurityID"        "date"              "UnderlyingClose"   "UnderlyingOpen"   
 [7] "TotalReturn"       "ReferenceExchange" "OptionID"          "Expiration"        "CallPut"           "Strike"           
[13] "Volume"            "OpenInterest"      "ImpliedVolatility" "Delta"             "Gamma"             "Vega"             
[19] "Theta"             "AdjustmentFactor"  "BestBid"           "BestOffer"         "Last"              "LastTradeDate"    
[25] "T"                 "stale"             "old"               "roll"              "n_opt_shares"      "delta.hedge"      
[31] "OrigBid"           "OrigOffer"         "PXRecov"           "acquisition_date"  "tranche_id"       
> df_list[[2]] %>% ungroup() %>% NROW()
[1] 1
> df_list[[2]] %>% ungroup()%>% data.frame()
           leg.id arb_identifier SecurityID       date UnderlyingClose UnderlyingOpen TotalReturn ReferenceExchange  OptionID Expiration
1 L_P_OTM5.0_93_0              1     506528 2005-12-20          5547.9         5539.8 0.001462164               -99 150042133 2006-01-20
  CallPut        Strike        Volume OpenInterest ImpliedVolatility      Delta        Gamma     Vega     Theta AdjustmentFactor BestBid
1       P 2.581493e-320 1.167971e-320        17155         0.1427761 -0.0636712 0.0005403475 201.6763 -158.4806                0     6.5
  BestOffer Last LastTradeDate       T stale   old  roll n_opt_shares delta.hedge OrigBid OrigOffer   PXRecov acquisition_date tranche_id
1       6.5  6.5          <NA> 31 days FALSE FALSE FALSE         2499         Inf      35        35 0.1857143       2005-10-03    9381673
> 
> df_list[[3]] %>% ungroup() %>% class()
[1] "tbl_df"     "tbl"        "data.frame"
> df_list[[3]] %>% ungroup() %>% colnames()
 [1] "leg.id"            "arb_identifier"    "SecurityID"        "date"              "UnderlyingClose"   "UnderlyingOpen"   
 [7] "TotalReturn"       "ReferenceExchange" "OptionID"          "Expiration"        "CallPut"           "Strike"           
[13] "Volume"            "OpenInterest"      "ImpliedVolatility" "Delta"             "Gamma"             "Vega"             
[19] "Theta"             "AdjustmentFactor"  "BestBid"           "BestOffer"         "Last"              "LastTradeDate"    
[25] "T"                 "stale"             "old"               "roll"              "n_opt_shares"      "delta.hedge"      
[31] "OrigBid"           "OrigOffer"         "PXRecov"           "acquisition_date"  "tranche_id"       
> df_list[[3]] %>% ungroup() %>% NROW()
[1] 1

Tags: r, dataframe, crash, rstudio

Answer


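I don't know exactly what is corrupted in that file, but I can reproduce the crash on Linux. I was able to avoid it by dropping columns 12 and 13 (losing only those two columns of data):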

as.data.frame(df_list[[1]]) # no problem
as.data.frame(df_list[[2]]) # no problem
as.data.frame(df_list[[3]])[-(12:13)]
#            leg.id arb_identifier SecurityID       date UnderlyingClose
# 1 L_P_OTM5.0_93_0              1     506528 2005-12-21          5587.4
#   UnderlyingOpen TotalReturn ReferenceExchange  OptionID Expiration CallPut
# 1         5547.9 0.007119811               -99 150042133 2006-01-20       P
#   OpenInterest ImpliedVolatility       Delta        Gamma     Vega     Theta
# 1        18051         0.1430592 -0.04223828 0.0003929387 144.2409 -117.1857
#   AdjustmentFactor BestBid BestOffer Last LastTradeDate       T stale   old
# 1                0       4         4    4          <NA> 30 days FALSE FALSE
#    roll n_opt_shares delta.hedge OrigBid OrigOffer   PXRecov acquisition_date
# 1 FALSE         2499         Inf      35        35 0.1142857       2005-10-03
#   tranche_id
# 1    9381673
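
The same workaround can be written by column name rather than by position, which is a little safer if the column order ever changes. This is only a minimal sketch: the toy `df_list` below is a made-up stand-in for the real list, and `drop_cols` and `bad_cols` are hypothetical helper names, not part of the original code. Whether it avoids the crash on the real third frame has not been tested here.

```r
# Toy stand-in for df_list (hypothetical values; the real frames have 35 columns).
df_list <- list(
  data.frame(leg.id = "a", Strike = 1, Volume = 2, OpenInterest = 16674),
  data.frame(leg.id = "a", Strike = 1, Volume = 3, OpenInterest = 17155)
)

# Columns suspected of causing the crash (columns 12 and 13 in the real data).
bad_cols <- c("Strike", "Volume")

# as.data.frame() strips any classes that precede "data.frame" (tibble,
# grouped_df), so the plain base-R subsetting method is used afterwards.
drop_cols <- function(df, cols) {
  df <- as.data.frame(df)
  df[, setdiff(names(df), cols), drop = FALSE]
}

clean_list <- lapply(df_list, drop_cols, cols = bad_cols)
names(clean_list[[1]])
```

Selecting by name also makes it obvious which data is being discarded when the code is read again later.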

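My guess is that the file itself is somehow corrupted (I don't know the internal RDS structure well enough to dig deeper) and that columns 12 and 13 (Strike and Volume) are causing the problem. (If you can live without those two values in the third frame, or can regenerate them, you should be able to move on.)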
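
One way to look at the suspect columns without printing their values is to collect only metadata for each column. In the output for the first two frames, Strike and Volume print as values such as 2.581493e-320, which is below the smallest normalised double (about 2.2e-308). A possible explanation, offered here as an assumption and not confirmed anywhere in this thread, is that these columns hold 64-bit integers (for example from the bit64 package) whose bits are being displayed as doubles: 5225 * 2^-1074 is roughly 2.58e-320. The sketch below uses a toy frame with made-up values, and `is_subnormal` is a hypothetical helper.

```r
# Toy frame with hypothetical values; Strike mimics the suspicious printout
# by storing the integer 5225 as raw 64-bit integer bits inside a double.
df <- data.frame(
  Strike       = 5225 * 2^-1074,  # prints as roughly 2.58e-320
  OpenInterest = 16674,
  CallPut      = "P",
  stringsAsFactors = FALSE
)

# TRUE for double columns holding nonzero values below the smallest
# normalised double. unclass() avoids dispatching to any class-specific
# methods (e.g. for integer64, Date or difftime columns).
is_subnormal <- function(x) {
  x <- unclass(x)
  is.double(x) && any(x != 0 & abs(x) < .Machine$double.xmin, na.rm = TRUE)
}

# Collect metadata only; none of these calls formats or prints the values.
col_info <- data.frame(
  column    = names(df),
  type      = vapply(df, typeof, character(1)),
  class     = vapply(df, function(x) paste(class(x), collapse = "/"), character(1)),
  subnormal = vapply(df, is_subnormal, logical(1)),
  row.names = NULL
)
print(col_info)
```

If a column in the real data is flagged as subnormal, or its class differs between the working frames and the third one, that would be a reasonable place to start looking. This check may itself fail on a frame that is genuinely corrupted.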

