How do I read binary data from a .dat file in R? Error when using readBin

Problem description

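I have a large .dat file (30 GB) that I believe contains binary data, because when I open it in a text editor I can't immediately see anything meaningful. I have a specific definition for each column; there should be more than 900 columns and 30,000,000 rows. All column names and values are defined as characters of varying lengths. This is the warning I get when I try the readBin function: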

df = readBin(bdata, character(), n = 10)

Warning message: 1: In readBin(bdata, character(), n = 10) : null terminator not found: breaking string at 10000 bytes

df gives me this:

02375606320105657659301200400301200800500000984400001100000001173000001271600001358300001411000001490500001577500001696500001857500001260500001279200001808300001326300001346800002018200002117500002111700001467300001478000002296100002373300001656100001584800003445000003445000003445000003445000003445000003415000003415000003415000003738600003738600002415000002405000003405000002405000002555000003555000003555000003555000002585000002505000003505000003505000002505000002525000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031100000025800000020500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435
300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000040000000040000000040000000040000000040000000040000000040000000040000000040000000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004000000004000000004000000004000000004000000004000000004000000004000000004000000004000000004000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

...

This is not what I expected to see. Please advise. Thanks.

Tags: r

Solution


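The output in the question consists entirely of plain digits, which suggests the file is fixed-width text rather than binary. readBin(..., character()) looks for null-terminated strings, finds no terminator, and so breaks the string at 10000 bytes. If there were only 8 columns, the following would read the start of the file. You need to fill in the remaining widths and names from your documentation: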

read.fwf(bdata, widths = c(10, 2, 3, 3, 3, 3, 3, 3), 
         col.names = c("Col1", "Col2", ...),  # Use the actual names
         n = 3)   # Limit to reading 3 lines until you've got it right, then
                  # remove n = 3 to read the whole file
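
For a file of this size it may be more practical to process the records in chunks rather than in one call. The following is only a sketch under stated assumptions: it writes a tiny made-up sample file, checks that its first bytes are printable ASCII (a hint that the file is text, not binary), and then parses it two lines at a time with read.fwf. The widths c(10, 2, 3), the names Col1 to Col3, the sample records, and the chunk size are all illustrative, not taken from the real file.

```r
# Hypothetical sample: three fixed-width records (10 + 2 + 3 characters each).
path <- tempfile(fileext = ".dat")
writeLines(c("023756063201056",
             "023756063302057",
             "023756063403058"), path)

# Peek at the first bytes. If they are all tab, newline, carriage return,
# or printable ASCII, the file is probably text rather than binary.
head_bytes <- readBin(path, what = "raw", n = 64)
looks_like_text <- all(as.integer(head_bytes) %in% c(9L, 10L, 13L, 32:126))

widths    <- c(10, 2, 3)                # assumed widths, for illustration only
col_names <- c("Col1", "Col2", "Col3")  # placeholder names
chunk_size <- 2                         # use something far larger on real data

con <- file(path, "r")
chunks <- list()
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break         # end of file reached
  tc <- textConnection(lines)
  # colClasses = "character" keeps leading zeros such as "056".
  chunks[[length(chunks) + 1]] <- read.fwf(tc, widths = widths,
                                           col.names = col_names,
                                           colClasses = "character")
  close(tc)
}
close(con)

# Combining chunks is fine for a demo; on a 30 GB file you would more likely
# summarise or write out each chunk instead of keeping them all in memory.
result <- do.call(rbind, chunks)
print(result)
```

Reading every field as character is a deliberate choice here, since the question says all values are defined as characters and numeric conversion would drop leading zeros.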
       
