grep - Count specific lines that do not contain a specific word
Problem description
I have a question: I have a file like this
@HWI-ST273:296:C0EFRACXX:2:2101:17125:145325/1
TTAATACACCCAACCAGAAGTTAGCTCCTTCACTTTCAGCTAAATAAAAG
+
8?8A;DDDD;@?++8A?;C;F92+2A@19:1*1?DDDECDE?B4:BDEEI
@BBBB-ST273:296:C0EFRACXX:2:1303:5281:183410/1
TAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTTACCA
+
CCBFFFFFFHHHHJJJJJJJJJIIJJJJJJJJJJJJJJJJJJJIJJJJJI
@HWI-ST273:296:C0EFRACXX:2:1103:16617:140195/1
AAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTT
+
@C@FF?EDGFDHH@HGHIIGEGIIIIIEDIIGIIIGHHHIIIIIIIIIII
@HWI-ST273:296:C0EFRACXX:2:1207:14316:145263/1
AATACACCCAACCAGAAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCC
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJIJ
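I am interested in the lines that start with "@HWI", but I want to count all the lines that do not start with "@HWI". In the example shown, the result would be 1, because one line starts with "@BBB".
To be clearer: I only want the number of first lines (of each repeating group of 4 lines) that do not match the pattern '@HWI'. I hope that is clear enough. Please let me know if you need more details.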
Solution
With GNU sed, you can use its extended address to print every fourth line, then use grep to count the ones that don't start with @HWI:
sed -n '1~4p' file.fastq | grep -cv '^@HWI'
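Why the line-position filter matters can be checked on the sample from the question: a FASTQ quality line may itself begin with @ (the third record's quality string starts with @C@FF), so a plain grep for lines starting with @ over-counts. The following is a minimal sketch, assuming GNU sed and a temporary file created with mktemp; the variable names are illustrative only.

```shell
# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@HWI-ST273:296:C0EFRACXX:2:2101:17125:145325/1
TTAATACACCCAACCAGAAGTTAGCTCCTTCACTTTCAGCTAAATAAAAG
+
8?8A;DDDD;@?++8A?;C;F92+2A@19:1*1?DDDECDE?B4:BDEEI
@BBBB-ST273:296:C0EFRACXX:2:1303:5281:183410/1
TAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTTACCA
+
CCBFFFFFFHHHHJJJJJJJJJIIJJJJJJJJJJJJJJJJJJJIJJJJJI
@HWI-ST273:296:C0EFRACXX:2:1103:16617:140195/1
AAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTT
+
@C@FF?EDGFDHH@HGHIIGEGIIIIIEDIIGIIIGHHHIIIIIIIIIII
@HWI-ST273:296:C0EFRACXX:2:1207:14316:145263/1
AATACACCCAACCAGAAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCC
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJIJ
EOF

# Naive approach: any line that starts with @ but not with @HWI.
# This also counts the quality line "@C@FF?...", which is not a header.
naive=$(grep -v '^@HWI' "$sample" | grep -c '^@')

# Positional approach (GNU sed): keep lines 1, 5, 9, ... and count
# those that do not start with @HWI.
positional=$(sed -n '1~4p' "$sample" | grep -cv '^@HWI')

echo "naive: $naive"            # naive: 2
echo "positional: $positional"  # positional: 1

rm -f "$sample"
```

Only the positional count agrees with the expected result of 1 stated in the question.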
Otherwise, you can use e.g. Perl:
perl -ne 'print if 1 == $. % 4' -- file.fastq | grep -cv '^@HWI'
$. contains the current line number, and % is the modulo operator.
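But once we're running Perl, we don't need grep anymore; the check for @HWI can go into the same condition (the @ is escaped so Perl does not interpolate it as an array, and $c + 0 prints 0 rather than an empty line when nothing matches):
perl -lne '++$c if 1 == $. % 4 && !/^\@HWI/; END { print $c + 0 }' -- file.fastq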
-l removes newlines from input and adds them to output.
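The same single-process idea can also be sketched with awk, which is not part of the original answer but is available on most systems without GNU extensions. The function name count_foreign_headers and its optional prefix argument are hypothetical, chosen only for illustration; the sketch assumes a well-formed FASTQ file with exactly four lines per record.

```shell
# Count header lines (lines 1, 5, 9, ...) that do not start with a prefix.
#   $1: path to the FASTQ file
#   $2: expected header prefix (defaults to @HWI)
count_foreign_headers() {
    awk -v prefix="${2:-@HWI}" '
        # NR % 4 == 1 selects the first line of each 4-line record;
        # index() returns 1 only when the line begins with the prefix.
        NR % 4 == 1 && index($0, prefix) != 1 { c++ }
        # c + 0 forces a numeric 0 when no line was counted.
        END { print c + 0 }
    ' "$1"
}
```

Called as count_foreign_headers file.fastq, this should report the same number as the sed and Perl variants. Because the prefix is compared literally with index() rather than as a regular expression, characters in the prefix need no escaping.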