r - R data.table 按行条件从 .csv 或文本文件中读取特定单元格
问题描述
我正在开发一个小型闪亮应用程序,它可以读取和处理原始 ELISA 数据。数据格式如下:
1560200 Version 1.06
15/09/2021 16:22
Serial Number: 19103026
Protocol Name: euroimmune elisa
Read: Normal 00:05 Shake
Blanks: None
450 1 2 3 4 5 6 7 8 9 10 11 12
A 0.612 0.081 0.638 0 0 0 0 0 0 0 0 0
B 1.886 0.082 0.502 0 0 0 0 0 0 0 0 0
C 0.064 1.297 0.072 0 0 0 0 0 0 0 0 0
D 2.343 0.571 0.069 0 0 0 0 0 0 0 0 0
E 0.082 0.289 0.621 0 0 0 0 0 0 0 0 0
F 0.265 3.648 0.224 0 0 0 0 0 0 0 0 0
G 1.063 0.114 2.411 0 0 0 0 0 0 0 0 0
H 0.942 0.685 0.09 0 0 0 0 0 0 0 0 0
630 1 2 3 4 5 6 7 8 9 10 11 12
A 0.037 0.035 0.034 0 0 0 0 0 0 0 0 0
B 0.037 0.035 0.036 0 0 0 0 0 0 0 0 0
C 0.041 0.039 0.035 0 0 0 0 0 0 0 0 0
D 0.037 0.033 0.037 0 0 0 0 0 0 0 0 0
E 0.036 0.033 0.037 0 0 0 0 0 0 0 0 0
F 0.035 0.041 0.037 0 0 0 0 0 0 0 0 0
G 0.036 0.034 0.041 0 0 0 0 0 0 0 0 0
H 0.037 0.035 0.036 0 0 0 0 0 0 0 0 0
Delta 1 2 3 4 5 6 7 8 9 10 11 12
A 0.575 0.046 0.604 0.575 0 0 0 0 0 0 0 0
B 1.849 0.047 0.466 1.849 0 0 0 0 0 0 0 0
C 0.023 1.258 0.037 0.023 0 0 0 0 0 0 0 0
D 2.306 0.538 0.032 2.306 0 0 0 0 0 0 0 0
E 0.046 0.256 0.584 0.046 0 0 0 0 0 0 0 0
F 0.23 3.607 0.187 0 0 0 0 0 0 0 0 0
G 1.027 0.08 2.37 0 0 0 0 0 0 0 0 0
H 0.905 0.65 0.054 0 0 0 0 0 0 0 0 0
值得注意的是,行标题没有列名,其余列标题仅编号为 1 到 12。如果单元格有结果,则其中会有一个数值 - 否则空白单元格的值为 0。
我有一些代码使用我通常使用的 ELISA 阅读器的输出准确指定从哪些单元格读取数据(通过单元格引用):
# Import raw ELISA data from .csv:
elisa_raw <- fread(file = params$resultats,
stringsAsFactors = FALSE, # read in strings as characters
skip = 27, # number of rows to skip
select = c(2:14), # which columns to select
header = TRUE, # use the first row of the selected data as column names
dec = ".", # specify the decimal separator (comma or period)
colClasses = list(character = 2, numeric = 3:14)) # predefine column types
这很好用,但是我想升级代码以处理更一般的情况 - 例如,如果我的 ELISA 阅读器上的软件更新,跳过的行数可能会改变。如果我希望我的应用程序可用于拥有不同型号 ELISA 阅读器的人,那么肯定会有不同数量的行或列要跳过。
在所有这些场景中保持不变的一件事是数据的结构:它保持如上,有 8 行和 13 列(第一个是行标题),其中行命名为 A - H,列命名1 - 12. 包含行标题的列通常没有名称。
我想看看是否有一种方法可以有条件地选择要使用此结构导入的数据单元格。我尝试grep -E
如下 fread 并在这篇文章中描述:
elisa_raw <- fread(cmd = "grep -E 'A|B|C|D|E|F|G|H' TF_210915_IPB.txt")
我认为这将导入行的第一个值为 A 或 B 或 C 等的所有行。但是这给了我一个错误:
> elisa_raw <- fread(cmd = "grep -E 'A|B|C|D|E|F|G|H' TF_210915_IPB.txt")
'grep' is not recognized as an internal or external command,
operable program or batch file.
Warning messages:
1: In (if (.Platform$OS.type == "unix") system else shell)(paste0("(", :
'(grep -E 'A|B|C|D|E|F|G|H' TF_210915_IPB.txt) > C:\Users\Username\AppData\Local\Temp\Rtmpeki6Rv\file4010600782e' execution failed with error code 255
2: In fread(cmd = "grep -E 'A|B|C|D|E|F|G|H' TF_210915_IPB.txt") :
File 'C:\Users\Username\AppData\Local\Temp\Rtmpeki6Rv\file4010600782e' has size 0. Returning a NULL data.table.
与原始帖子不同,我无法指定列名,因为行标题没有。我正在使用 Windows 10 的 PC;我知道grep -E
在这种情况下这是正确的命令,但现在不太确定给出错误消息。
我怎样才能使这项工作或有另一种方法来解决这个问题?
编辑:
为了演示我正在导入的文件类型及其变化方式,我在下面提供了两个不同的原始 ELISA 文件示例的 dput。它们在没有任何规范的情况下被导入到 r 中,以保留它们的结构,如下所示:
# Example file 1: .csv from a Biotek reader
biotek <- fread(file = "Biotek_21091501.csv", stringsAsFactors = FALSE, na.strings = c(""))
# Example file 2: .txt from a ThermoFisher Scientific reader
tfs <- fread(file = "TF_21091501.txt", stringsAsFactors = FALSE, na.strings = c(""))
这些示例文件可以通过使用下面的 dput 在 R 中重新创建对象然后分别导出为 .csv 和 .txt 来重新创建(没有列标题或行名)。
请注意,这两个文件之一实际上具有三个符合我标准的数据结构,而我只需要第三个(名为 delta)。目前,尽管我不介意任何可以带回所有三个的解决方案,因为我可以在之后对它们进行子集化。
# Example data from Biotek ELISA reader 1:
> dput(biotek)
structure(list(V1 = c("1560200 Version 1.06", "15/09/2021 16:22",
"Serial Number:", "Protocol Name:", "Read:", "Blanks:", "", "450",
"", "", "", "", "", "", "", "", "", "630", "", "", "", "", "",
"", "", "", "", "Delta", "", "", "", "", "", "", "", ""), V2 = c("",
"", "19103026", "euroimmune elisa", "Normal", "None", "", "",
"A", "B", "C", "D", "E", "F", "G", "H", "", "", "A", "B", "C",
"D", "E", "F", "G", "H", "", "", "A", "B", "C", "D", "E", "F",
"G", "H"), V3 = c("", "", "", "", "00:05 Shake", "", "", "1",
"0.612", "1.886", "0.064", "2.343", "0.082", "0.265", "1.063",
"0.942", "", "1", "0.037", "0.037", "0.041", "0.037", "0.036",
"0.035", "0.036", "0.037", "", "1", "0.575", "1.849", "0.023",
"2.306", "0.046", "0.23", "1.027", "0.905"), V4 = c(NA, NA, NA,
NA, NA, NA, NA, 2, 0.081, 0.082, 1.297, 0.571, 0.289, 3.648,
0.114, 0.685, NA, 2, 0.035, 0.035, 0.039, 0.033, 0.033, 0.041,
0.034, 0.035, NA, 2, 0.046, 0.047, 1.258, 0.538, 0.256, 3.607,
0.08, 0.65), V5 = c(NA, NA, NA, NA, NA, NA, NA, 3, 0.638, 0.502,
0.072, 0.069, 0.621, 0.224, 2.411, 0.09, NA, 3, 0.034, 0.036,
0.035, 0.037, 0.037, 0.037, 0.041, 0.036, NA, 3, 0.604, 0.466,
0.037, 0.032, 0.584, 0.187, 2.37, 0.054), V6 = c(NA, NA, NA,
NA, NA, NA, NA, 4, 0, 0, 0, 0, 0, 0, 0, 0, NA, 4, 0, 0, 0, 0,
0, 0, 0, 0, NA, 4, 0.575, 1.849, 0.023, 2.306, 0.046, 0, 0, 0
), V7 = c(NA, NA, NA, NA, NA, NA, NA, 5L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, NA, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 5L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), V8 = c(NA, NA, NA, NA, NA, NA, NA,
6L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 6L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, NA, 6L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V9 = c(NA,
NA, NA, NA, NA, NA, NA, 7L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
7L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 7L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), V10 = c(NA, NA, NA, NA, NA, NA, NA, 8L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, NA, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
NA, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V11 = c(NA, NA, NA,
NA, NA, NA, NA, 9L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 9L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 9L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), V12 = c(NA, NA, NA, NA, NA, NA, NA, 10L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, NA, 10L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
10L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V13 = c(NA, NA, NA, NA,
NA, NA, NA, 11L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 11L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 11L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), V14 = c(NA, NA, NA, NA, NA, NA, NA, 12L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, NA, 12L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
NA, 12L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA,
-36L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002a5ddbf1ef0>)
# Example data from ThermoFisher Scientific ELISA reader 2:
> dput(tfs)
structure(list(V1 = c("Protocole:", "Format de plaque:", "",
"Absorbances Filtre 1: 450nm", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "A", "B", "C", "D", "E", "F", "G", "H"), V2 = c("Demo3 15.09.2021 17:11:54",
"96 puits", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "1", "0.456",
"1.337", "0.065", "1.577", "0.123", "0.33", "1.094", "0.291"),
V3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2,
0.134, 0.116, 1.112, 0.29, 0.209, 2.915, 0.353, 0.719), V4 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, 0.513, 0.342,
0.109, 0.112, 0.562, 0.238, 2.224, 0.15), V5 = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V7 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), V8 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 7L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V9 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), V10 = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 9L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
V11 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V12 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 11L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), V13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 12L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA,
-36L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002a5ddbf1ef0>)
解决方案
推荐阅读
- firebase - 进行firebase身份验证的正确方法是什么
- kubernetes - 使用 K8S 横向扩展
- google-app-engine - 使用 Google App Engine Image API 时有什么方法可以配置平移/裁剪位置?
- swift - 将多个按钮链接到同一个站点 xcode
- react-native - 无法使用 AsyncStorage API 存储登录的用户数据
- azure - 将资源组所有者授予个人和企业帐户
- java - “else if”语句中的逻辑运算符
- java - spring boot-无法读取yaml属性文件application.yml
- android - android/view/View$OnUnhandledKeyEvent 监听器错误
- c++ - 将基本类型数组中的内存重用于不同(但仍然是基本)类型数组是否合法