r - How do I read binary data from a .dat file in R? Error when using readBin
Problem description
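I have a large .dat file (30 GB) that I believe contains binary data, because when I open it in a text editor I can't immediately see anything meaningful. I have a specific definition for each column: there should be more than 900 columns and 30,000,000 rows. All column names and values are defined as characters of varying lengths. This is the warning I get when I try the readBin function: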
df = readBin(bdata, character(), n = 10)
Warning message: 1: In readBin(bdata, character(), n = 10) : null terminator not found: breaking string at 10000 bytes
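An editorial note on the warning: readBin() with what = character() reads NUL-terminated strings, so the warning means it found no NUL byte in the first 10000 bytes and split the string there. A more direct diagnosis is to read the first bytes as raw values and check whether they are all printable ASCII. This is a minimal sketch, assuming `path` is a placeholder for the real .dat file; so that it runs on its own, it first writes a one-line sample file whose content is the first 30 characters of the output shown in the question.

```r
# Sample file standing in for the real .dat file (placeholder `path`);
# its content is the first 30 characters of the readBin() output in the question
path <- tempfile(fileext = ".dat")
writeLines("023756063201056576593012004003", path)

# Read the first bytes as raw values; this does not look for NUL terminators
bytes <- readBin(path, what = "raw", n = 1000)
codes <- as.integer(bytes)

# Printable ASCII is 32..126; also allow line feed (10) and carriage return (13)
is_text <- all((codes >= 32 & codes <= 126) | codes %in% c(10L, 13L))
has_nul <- any(codes == 0L)

cat("printable text:", is_text, " contains NUL bytes:", has_nul, "\n")
rawToChar(bytes[1:30])  # the first 30 bytes shown as characters
```

If the real file reports only printable bytes and no NULs, it is most likely plain text (for example, fixed-width records) rather than binary, and text readers are the better fit.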
df gives me this:
02375606320105657659301200400301200800500000984400001100000001173000001271600001358300001411000001490500001577500001696500001857500001260500001279200001808300001326300001346800002018200002117500002111700001467300001478000002296100002373300001656100001584800003445000003445000003445000003445000003445000003415000003415000003415000003738600003738600002415000002405000003405000002405000002555000003555000003555000003555000002585000002505000003505000003505000002505000002525000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031100000025800000020500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000428000000435
300000435600000438400000431300000400000000404200000399900000402100000399100000394500000395400000393700000397900000392200000397000000397200000395700000397500000395900000406600000393900000397500000401800000430000000430000000430000000430000000430000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000040000000040000000040000000040000000040000000040000000040000000040000000040000000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004000000004000000004000000004000000004000000004000000004000000004000000004000000004000000004000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
...
This is not what I expected to see. Please advise. Thanks.
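The output above consists entirely of ASCII digits, which fits a fixed-width text layout better than a binary one. As a rough check, the first 30 characters can be cut at the first eight widths used in the solution (10, 2, 3, 3, 3, 3, 3, 3). Those widths are an assumption for illustration; the real ones must come from the file's documentation. substring() is vectorised over its start and stop positions, so a single call splits the record:

```r
# The first 30 characters of the readBin() output shown in the question
x <- "023756063201056576593012004003"

# Assumed field widths (the eight used in the solution's read.fwf() call);
# the real widths must come from the file's documentation
widths <- c(10, 2, 3, 3, 3, 3, 3, 3)

# Convert widths to positions: each field ends at the running total of widths
# and starts one character after the previous field ends
ends   <- cumsum(widths)
starts <- ends - widths + 1

fields <- substring(x, starts, ends)
fields  # "0237560632" "01" "056" "576" "593" "012" "004" "003"
```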
Solution
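Your output is all digits, which suggests the file is fixed-width text rather than binary, so read.fwf() is a better fit than readBin(). If there were only 8 columns, this would read the start of the file. You will need to fill in the rest from your documentation: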
read.fwf(bdata, widths = c(10, 2, 3, 3, 3, 3, 3, 3),
         col.names = c("Col1", "Col2", ...), # Use the actual names
         n = 3) # Limit to reading 3 lines until you've got it right, then
                # remove n = 3 to read the whole file
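A self-contained sketch of the same read.fwf() approach, assuming a hypothetical `spec` data frame that lists each column's name and width. With 900+ columns, building widths and col.names from such a table, for example one read from a CSV of the documentation, is less error-prone than typing them out. The sample records are made up, except that the first one is the first 18 characters of the output in the question. colClasses = "character" is an addition that is not in the original answer. read.fwf() passes it on to read.table(), where it keeps leading zeros such as "01" that numeric conversion would drop. This matches the question's statement that all values are defined as characters.

```r
# Hypothetical column spec; in practice build it from the file's documentation
spec <- data.frame(
  name  = c("Col1", "Col2", "Col3", "Col4"),
  width = c(10, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Small sample file standing in for the real .dat file
path <- tempfile(fileext = ".dat")
writeLines(c("023756063201056576",   # prefix of the output in the question
             "023756063202057577",   # made-up record
             "023756063203058578",   # made-up record
             "023756063204059579"),  # made-up record
           path)

df <- read.fwf(path,
               widths     = spec$width,
               col.names  = spec$name,
               colClasses = "character",  # keep leading zeros such as "01"
               n          = 3)            # drop n to read every record
str(df)
```

For the full 30 GB file, read.fwf() is likely to be slow. readr::read_fwf() with fwf_widths() is a commonly used alternative that is usually faster, but it is worth testing on a subset first.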