首页 > 解决方案 > 使用浮点键实现类似哈希表的数据结构,其中容差内的值被合并在一起

问题描述

我需要一个带有浮点键的关联数据结构,其中值几乎相等的键被合并在一起。我正在使用 C++,但语言并不重要。

基本上我目前的策略是

  1. 只处理单精度浮点数

  2. 使用带有自定义键类型的 unordered_map

  3. 将键类型上的散列函数定义为

    一种。给定浮点数v除以v某个容差,例如 0.0005,以双精度计算,产生k.

    湾。转换k为 64 位整数,产生ki

    C。返回 std::hash of ki.

首先,是否有一个标准的命名数据结构可以做这样的事情?如果没有,有比我的一般方法更好的方法吗?

我不喜欢以下实现的主要事情是我不直观地将哪些浮点值合并在一起;我通过大致了解我想将输入中的哪些值算作相同的值并只是测试各种容差来解决这个问题,但是如果你将 12.0453 添加到容器中,那么值 12.0453 +/- 0.0005 会很好如果容差参数为 0.0005,则认为是相等的,但事实并非如此——我什至认为在 unordered_map 之上这种行为是不可能的,因为我认为散列函数将取决于表中的值。

基本上,我的实现是将数轴划分为一维网格,其中每个网格单元的宽度为 epsilon 单位,然后将浮点值分配给它们所属的网格单元的从零开始的索引。我的问题是,是否有更好的方法来实现浮点值的关联容器,其容差也是 O(1)?下面的实现有问题吗?

    template<typename V, int P=4>
    class float_map
    {
    private:
        struct key {
        public:
            long long val;

            static constexpr double epsilon(int digits_of_precision)
            {
                return (digits_of_precision == 1) ? 0.5 : 0.1 * epsilon(digits_of_precision - 1);
            }

            static constexpr double eps = epsilon(P);

            key(float fval) : val(static_cast<long long>( fval / eps))
            {}

            bool operator==(key k) const {
                return val == k.val;
            }
        };

        struct key_hash
        {
            std::size_t operator()(key k) const {
                return std::hash<long long>{}(k.val);
            }
        };

        std::unordered_map<key, V, key_hash> impl_;

    public:
        V& operator[](float f) {
            return impl_[key(f)];
        }

        const V& at(float f) const {
            return impl_.at(key(f));
        }

        bool contains(float f) const {
            return impl_.find(f) != impl_.end();
        }

        double epsilon() const {
            return key::eps;
        }
    };

    int main()
    {
        float_map<std::string> test;

        test[12.0453f] = "yes";

        std::cout << "epsilon = " << test.epsilon() << std::endl;                             // 0.0005

        std::cout << "12.0446f => " << (test.contains(12.0446f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0447f => " << (test.contains(12.0447f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0448f => " << (test.contains(12.0448f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0449f => " << (test.contains(12.0449f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0450f => " << (test.contains(12.0450f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0451f => " << (test.contains(12.0451f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0452f => " << (test.contains(12.0452f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0453f => " << (test.contains(12.0453f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0454f => " << (test.contains(12.0454f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0455f => " << (test.contains(12.0455f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0456f => " << (test.contains(12.0456f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0457f => " << (test.contains(12.0457f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0458f => " << (test.contains(12.0458f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0459f => " << (test.contains(12.0459f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0460f => " << (test.contains(12.0460f) ? "yes" : "no") << std::endl; // no

    }

标签: c++data-structuresfloating-pointhashtablefixed-point

解决方案


最好的方法是使用定点算法。

问题详细信息中的实现有效,但比它需要的更加模糊。它视为 epsilon 或容差实际上是“bin 宽度” - 划分实数线的网格线之间的一维间距 - 因此,如果您期望 epsilon 值充当容差,您会注意到垃圾箱边缘/网格线附近的反直觉行为。

无论如何,考虑这个问题的更清晰的方法是不要尝试使用“公差”的概念,而是使用“有效数字”的概念。仅n将小数点右侧的 base-10 数字视为重要并对其进行参数化n。这基本上导致使用定点值作为键而不是浮点值;在上述实现中,它类似于使用 0.0001 而不是 0.0005 的 epsilon。

但是,除了修改原始代码中的 epsilon 之外,现在没有理由不只是将定点值设为公共类型并将该类型用作向用户公开的 unordered_map 的键。以前我们想通过将实现的 unordered_map 包装在自定义数据结构中来隐藏键类型,因为在这种情况下键是不透明的,没有直观的含义。在普通的 unordered_map 中使用定点键的另一个好处是,我们不必为所有标准 std::unordered_map 调用实现包装器方法,因为现在给用户一个实际的 unordered_map。

下面的代码:

template<int P=4>
class fixed_point_value
{
    static constexpr double calc_scaling_factor(int digits_of_precision)
    {
        return (digits_of_precision == 1) ? 10.0 : 10.0 * calc_scaling_factor(digits_of_precision - 1);
    }

    static constexpr double scaling_factor = calc_scaling_factor(P);

    template<int P>
    friend struct fixed_point_hash;

public:
    fixed_point_value(float val) : 
        impl_(static_cast<long long>(std::llround(scaling_factor * val)))
    {}

    bool operator==(fixed_point_value<P> fpv) const 
    {
        return impl_ == fpv.impl_;
    }

    float to_float() const
    {
        return static_cast<float>(impl_ / scaling_factor);
    }

private:
    long long impl_;
};

template<int P = 4>
struct fixed_point_hash
{
    std::size_t operator()(fixed_point_value<P> key) const {
        return std::hash<long long>{}(key.impl_);
    }
};

template<typename V, int P = 4>
using fixed_point_table = std::unordered_map<fixed_point_value<P>, V, fixed_point_hash<P>>;

int main()
{
    fixed_point_table<std::string, 4> t4;

    t4[12.0453f] = "yes";

    // these will all be "no" except 12.0453f because we have 4 base-10 digits of precision i.e.
    // 4 digits right of the decimal must be an exact match
    std::cout << "precision = 4" << std::endl;
    std::cout << "12.0446f => " << (t4.find(12.0446f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0447f => " << (t4.find(12.0447f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0448f => " << (t4.find(12.0448f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0449f => " << (t4.find(12.0449f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0450f => " << (t4.find(12.0450f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0451f => " << (t4.find(12.0451f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0452f => " << (t4.find(12.0452f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0453f => " << (t4.find(12.0453f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0454f => " << (t4.find(12.0454f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0455f => " << (t4.find(12.0455f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0456f => " << (t4.find(12.0456f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0457f => " << (t4.find(12.0457f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0458f => " << (t4.find(12.0458f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0459f => " << (t4.find(12.0459f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0460f => " << (t4.find(12.0460f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "\n";

    fixed_point_table<std::string, 3> t3;
    t3[12.0453f] = "yes"; // 12.0453 will round to the fixed point value 12.045.
    std::cout << "precision = 3" << std::endl;
    std::cout << "12.0446f => " << (t3.find(12.0446f) != t3.end() ? "yes" : "no") << std::endl; // rounds to 12.045 so yes;
    std::cout << "12.0447f => " << (t3.find(12.0447f) != t3.end() ? "yes" : "no") << std::endl; // rounds to 12.045 so yes;
    std::cout << "12.0448f => " << (t3.find(12.0448f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0449f => " << (t3.find(12.0449f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0450f => " << (t3.find(12.0450f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0451f => " << (t3.find(12.0451f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0452f => " << (t3.find(12.0452f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0453f => " << (t3.find(12.0453f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0454f => " << (t3.find(12.0454f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0455f => " << (t3.find(12.0455f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0456f => " << (t3.find(12.0456f) != t3.end() ? "yes" : "no") << std::endl; // 12.0456f rounds to the 3 digits of precison fixed point value 12.046 so no
    std::cout << "12.0457f => " << (t3.find(12.0457f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0458f => " << (t3.find(12.0458f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0459f => " << (t3.find(12.0459f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0460f => " << (t3.find(12.0460f) != t3.end() ? "yes" : "no") << std::endl; // '

}

推荐阅读