c++ - Sort a vector of vectors and count frequencies of unique occurences using the standard library
问题描述
Given a std::vector<std::vector<int>>
:
- I want to output a sorted
std::vector<std::vector<int>>
- that contains only unique
std::vector<int>
- as well as the frequency (i.e., count) of these unique
std::vector<int>
My question is twofold:
- how can I efficiently achieve this relying only on the standard library?
- what is causing the code below to perform poorly?
I tried the following:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());
std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));
std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());
for(auto&& vector : uniqueResponseVectors)
{
int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
freqTotal.push_back(count);
}
And the output should be:
for(auto&& vector : uniqueResponseVectors)
{
for(auto&& value : vector)
{
std::cout << "\t" << value << "\t";
}
std::cout << "\n";
}
// Printed result for the `uniqueResponseVectors`:
//
// 0 1 2
// 1 2 3
// 2 1 2
// 2 3 2
// 3 2 3
// 3 3 3
// Similarly for the `freqTotal`:
//
// 2
// 6
// 2
// 2
// 4
// 2
Additional context:
when I tried the code above against a larger dataset (i.e.,
responseVectors
with a size of 100000 and 12 as the size of eachstd::vector<int>
), it was slow.I also tried to manipulate
responseVectors
directly by callingstd::unique
anderase()
on it, in order avoid creating a copy. Then I used an iterator loop tostd::count
how many times a uniquestd::vector<int>
occurs. However, this proved even slower.
解决方案
和PaulMcKenzie进一步重申了选择正确数据结构的重要性
基于此输入,我尝试将YSC提供的解决方案具体化到我的案例中,以便更好地了解正在发生的事情。它归结为使用std::map
哪个是排序的关联容器:
std::map<std::vector<int>, int> SortAndCountOccurences(std::vector<std::vector<int>>& vectors)
{
std::map<std::vector<int>, int> result;
std::for_each(vectors.begin(), vectors.end(), [&result](auto const& vector) {
++result[vector];
});
return result;
}
具有以下用法:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::map<std::vector<int>, int> sortedVectors = SortAndCountOccurences(responseVectors);
这将输出:
for(auto&& vector : sortedVectors)
{
for(auto&& value : vector.first)
{
std::cout << "\t" << value << "\t";
}
std::cout << "-> x" << vector.second << " times" << "\n";
}
// 0 1 2 -> x2 times
// 1 2 3 -> x6 times
// 2 1 2 -> x2 times
// 2 3 2 -> x2 times
// 3 2 3 -> x4 times
// 3 3 3 -> x2 times
推荐阅读
- azure-storage - 如何查找 Azure 驱动器类型?
- javascript - Angular CDK 布局 - 如何在整个项目中全局包含 BreakPointObserver
- python - 如何修复 Python 3.7 中的“KeyError:****_target”错误
- paypal - PayPal 结帐和 Braintree 类的问题
- linker - 针对静态库编译 Fortran 程序时找不到函数(在 Windows 中使用 IFORT)
- python-3.x - 如何将数据帧打包成一组或引用它们而不改变它们?蟒蛇 3
- javascript - Css - 如果元素到达屏幕上的某个位置,如何更改元素大小?
- customization - Acumatica:如何更改案例中的原因下拉选项?
- html - 导航栏仅在网站的一个部分中不可点击
- html - 如何在 flexbox 中创建具有固定导航栏的 3 列布局