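r - Parsing a silly data frame with read.table?
Problem description
I ran a fairly large experiment and saved all the data to a csv file, but the data appears to be in a... silly format. The simulation takes several days to run and I can't re-run it, so I'm wondering whether there is anything I can do in R to extract the data from the file in its current form.
I can't seem to attach the file to this question, so I'll explain the predicament as best I can. The csv file holds all of its data in a single column. Opened in Excel, the first entry, A1, contains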
[run number],"agent-preference","infection-rate","length-of-patch","incubation-rate","recovery-rate","init-infected","prop-move","total-pop","neighborhood-radius","move-distance","infective-radius","time-to-disease-spread","init-vaccinated","[step]","precision ((count turtles with [disease-status = 4]) / total-pop) 4"
All entries below it, in cells A2-A1000, contain data in the same compressed format, for example
2,"none","0.75","100","0.1","0.12","0.002","0","2000","10","1","2","0","0","222","0.9815"
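That is, each cell holds all of its data in one long comma-separated string (the silly format mentioned above). To get around this, I thought I could use read.table, define my own column names (to bypass the mess in A1), and have commas mark the separation, like so: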
my.df<-read.table("run_1.csv", header = F,
                  col.names = c("run_number","agent_preference","infection_rate","length_of_patch",
                                "incubation_rate","recovery_rate","init_infected","prop_move","total_pop",
                                "neighborhood_radius","move_distance","infective_radius","time_to_disease_spread",
                                "init_vaccinated","step","outbreak_prop"),
                  sep = ",", # define the separator between columns
                  colClasses = c("character", "character", "factor", "integer", "factor", "factor",
                                 "factor", "factor", "integer", "factor", "factor", "factor", "factor",
                                 "factor", "factor", "factor"),
                  fill = TRUE) # add blank fields if rows have unequal length
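Note that I bypass A1's funky formatting by specifying my own column names, and I tried predefining the column classes to help. Unfortunately, this doesn't work, and I end up with (using a one-row data frame here as an example):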
>my.df[1,]
run_number
1 2,"none","0.75","100","0.1","0.12","0.002","0","2000","10","1","2","0","0","222","0.9815"
agent_preference infection_rate
1
length_of_patch incubation_rate
1 NA
recovery_rate init_infected prop_move
1
total_pop neighborhood_radius
1 NA
move_distance infective_radius
1
time_to_disease_spread init_vaccinated
1
step outbreak_prop
1
If I look at the first entry in this row, I get
> my.df[1,1]
[1] "2,\"none\",\"0.75\",\"100\",\"0.1\",\"0.12\",\"0.002\",\"0\",\"2000\",\"10\",\"1\",\"2\",\"0\",\"0\",\"222\",\"0.9815\""
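This is wrong because (1) I want the individual values spread across the whole row, not all packed into the first entry, and (2) I'm not sure where the backslashes are being introduced...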
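On point (2), the extra characters in the printed value are most likely not in the data at all. When R prints a character string, it escapes any embedded double quotes as \" so the output is unambiguous. A minimal sketch, using a shortened, made-up copy of the row above (the variable name x is only for illustration):

```r
# Shortened, hypothetical copy of one entry; the real one has 16 fields.
x <- '2,"none","0.75"'

print(x)        # print() escapes each embedded quote as \"
cat(x, "\n")    # cat() writes the raw characters, with no escapes

# The string itself contains no backslash characters at all.
has_backslash <- grepl("\\", x, fixed = TRUE)
n_chars <- nchar(x)
```

Here has_backslash is FALSE, so the escapes are a display convention of print() rather than something read.table added to the data.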
Any help would be greatly appreciated.
Solution
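One possible approach, offered as a sketch rather than a confirmed fix. It assumes that each line of the file is stored as a single quoted field with the inner quotes doubled, e.g. "2,""none"",""0.75""", which is how a cell containing commas and quotes is commonly written to csv. That would explain why read.table puts the whole row into the first column. The question does not show the raw file, so check this first with readLines("run_1.csv", n = 2). The temporary file and the shortened four-column rows below are made-up stand-ins for run_1.csv.

```r
# ASSUMPTION: every physical line is one quoted field with doubled inner quotes.
# Build a small stand-in file in that format (hypothetical, 4 of the 16 columns).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"[run number],""agent-preference"",""infection-rate"",""[step]"""',
  '"2,""none"",""0.75"",""222"""',
  '"3,""none"",""0.5"",""180"""'
), tmp)

# Read the file as plain text, one string per line.
raw <- readLines(tmp)

# Remove the single outer pair of quotes, then turn each doubled quote
# back into an ordinary one, leaving a normal csv line.
unwrapped <- sub('^"(.*)"$', '\\1', raw)
unwrapped <- gsub('""', '"', unwrapped, fixed = TRUE)

# Drop the awkward header line and supply clean column names instead.
my.df <- read.csv(text = unwrapped[-1], header = FALSE,
                  col.names = c("run_number", "agent_preference",
                                "infection_rate", "step"),
                  stringsAsFactors = FALSE)

str(my.df)
```

For the real file, replace tmp with "run_1.csv" and pass all 16 column names. If the raw lines turn out not to be wrapped in an outer pair of quotes, this unwrapping step does not apply and the cause lies elsewhere.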