c++ - 构建 k-mers 集合的重叠图
问题描述
我正在学习基因组测序中的图形算法课程,目前的任务涉及从一堆 k-mers 构建重叠图。正式的问题如下:
输入: k-mers 模式的集合。
输出:邻接表形式的重叠图。
我有一个工作算法可以处理除自循环以外的大多数情况。问题是我不知道何时/为什么应该添加自循环。对此的任何帮助将不胜感激。以下是一些说明我的问题和我当前解决方案的示例。谢谢!
示例 1:
输入:
ACT
CTT
TTT
输出:
TTT->TTT
ACT->CTT
CTT->TTT
示例 2:
输入:
中交
输出:
中交->中交
示例 3:
输入:
CT
TG
TG
TC
TT
TC
输出:
TC->CT
CT->TC,TG,TT
TT->TC,TG,TT
/*
* Creates an overlap graph from a list of k-mers
*
* \param patterns a vector of string k-mers
*
* \return a string containing an adjacency list representation of the
* overlap graph as described in the problem specification
*
* Output Format. The overlap graph Overlap(Patterns), in the form of an
* adjacency list. Specifically, each line of the output represents all
* edges (u,v) leaving a given node u in the format “u -> v”, where u and v
* are both k-mers. If a given node u has multiple edges leaving it (e.g. v and w),
* the destination nodes are comma-separated in any order. For example, if
* there exist nodes ACG, CGT, and CGG, the resulting line of the adjacency
* list would be: ACG -> CGT,CGG
*/
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>
#include <set>
#include <string>
std::string overlap_graph(const std::vector<std::string>& patterns) {
// Overlap Graph of the k-mers collection patterns, in which a vertex exists for
// each unique k-mer, and an edge (u,v) exists if the last k-1 characters of u are
// equal to the first k-1 characters of v.
if (patterns.size() == 1) return patterns[0] + "->" + patterns[0];
// Build adjancency list O(n^2)
std::map <std::string, std::vector<std::string>> adj;
for (const auto& kmer_u : patterns) {
// Add kmer as key if it doesn't exist in the map and find its neighbours.
if (adj.find(kmer_u) == adj.end()) {
adj[kmer_u] = {};
for (const auto& kmer_v : patterns) {
if ((kmer_u != kmer_v) &&
(kmer_u.substr(1, kmer_u.length() - 1) == kmer_v.substr(0, kmer_v.length() - 1)) &&
(std::find(adj[kmer_u].begin(), adj[kmer_u].end(), kmer_v) == adj[kmer_u].end())) {
adj[kmer_u].push_back(kmer_v);
}
}
} // Otherwise continue.
}
// Convert to string O(n).
std::string ans;
for (auto const& key_value : adj) {
std::string kmer_u{ key_value.first };
std::vector<std::string> kmers_v{ key_value.second };
if (!kmers_v.empty()) {
ans.append(kmer_u + "->" + kmers_v[0]);
for (std::size_t kmer_v{ 1 }; kmer_v < kmers_v.size(); kmer_v++) {
ans.append(',' + kmers_v[kmer_v]);
}
ans.push_back('\n');
}
}
return ans;
}
void tests() {
/* Sample Dataset */
std::vector<std::string> patterns{ {"AAG"}, {"AGA"}, {"ATT"}, {"CTA"}, {"CTC"}, {"GAT"},
{"TAC"}, {"TCT"}, {"TCT"}, {"TTC"} };
std::string answer{ "TTC->TCT\nCTA->TAC\nAAG->AGA\nCTC->TCT\nTCT->CTA,CTC\nAGA->GAT\nGAT->ATT\nATT->TTC\n" };
std::cout << answer << std::endl;
// std::cout << (overlap_graph(patterns) == answer) << std::endl;
std::cout << overlap_graph(patterns) << std::endl;
}
void solution() {
std::vector<std::string> patterns;
std::string pattern;
while (std::cin >> pattern) {
patterns.push_back(pattern);
}
std::cout << overlap_graph(patterns) << std::endl;
}
int main() {
//tests();
solution();
}
解决方案
推荐阅读
- python - VSCode:需要帮助尝试导入请求以运行程序,但找不到模块
- wso2 - WSO2 APIM 3.2.0 - 无法重定向到订阅工作流中的 peyment 网关
- firebase - 为什么我可以在 Firebase 上连接 Apple Connect 而不能连接 Facebook
- android - react-native-get-music-files 杀死应用程序
- javascript - 检查频道是否有任何消息 Discord.js
- javascript - 如何在我的函数中使用 html 用户输入作为 javascript 变量?
- flutter - 任务':app:compileFlutterBuildDebug'的Flutter执行失败
- c# - 一个对象可以转换为两种不相关的类型吗?
- oracle - 如何将多个值存储在要在案例语句中使用的变量中
- mysql - 错误 1726 (HY000):存储引擎“MyISAM”不支持系统表。(遵循指南后)