首页 > 解决方案 > 一份求职面试学习指南告诉我用数组构建哈希?

问题描述

如何在 python 中仅使用数组来构建具有快速查找且没有冲突的字典?

我有超过 5 年的经验,并且已经为数百万用户花费了数百万美元构建了功能,并且完全不知道从哪里开始。

标签: python

解决方案


这也许是一个很好的起点。仅使用数组的自定义字典的实现。但这有很多限制。值得注意的限制是:

  • 可能会发生碰撞。正如评论中已经提到的那样,如果我们想要一个没有冲突的字典,我相信键中应该有某种形式的约束,我们可以在散列期间用作属性(换句话说,在选择一个目标存储桶/数组位置)以确保没有单个键:值会存在于同一个存储桶中。约束的示例可能是每个键具有唯一的长度,或者每个键以不同的字符开头,或者每个键在其内容(转换为整数)相加时具有不同的整数和,等等。
  • 存储桶/数组大小不会调整大小。当前大小为 100。因此,如果您存储 101 个项目,则可以保证至少 1 次碰撞。
  • 目前支持的键类型仅限于 int 和 str。
  • 哈希算法可以改进。目前,它对 int 没有任何作用,而对 str,它只是对每个字符的每个 ASCII 码执行一个异或。

一般来说,该算法平均运行 O(1),但在最坏的情况下仍然运行 O(n)。设置新键只需要获取数组中的目标位置并将其放置在那里,而获取值则计算相同的目标位置并在那里获取值。

代码

def my_hash(variable):
    """Currently, only types int and str are supported"""
    if isinstance(variable, int):
        return round(variable)
    elif isinstance(variable, str):
        if len(variable) == 0: return 0
        result = ord(variable[0])
        for index in range(1, len(variable)):
            result ^= ord(variable[index])
        return result
    return 0

class MyDict:
    def __init__(self):
        INITIAL_ARRAY_SIZE = 100
        self.internal_array = [[] for _ in range(INITIAL_ARRAY_SIZE)]

    def set(self, key, value):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # In the case that the bucket at self.internal_array[array_index] is not empty and there is
        # a new key to be placed on it, then there is a collision. At this point, it would be O(n)
        # where n is the number of colliding keys. This could perhaps be improved into O(log n) if
        # we maintain each bucket as sorted, or even O(n) if we maintain a secondary hash table on
        # each bucket which of course requires a second hasher logic.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                item[1] = value
                break
            print("WARNING: Collision detected for key", key)
        else:
             self.internal_array[array_index].append([key, value])   

    def get(self, key):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # As noted above in set method, there is a collision if the self.internal_array[array_index]
        # bucket contains more than one key:value pair.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                return item[1]
        return None

    def show(self):
        print("Showing buckets with contents:")
        for index, bucket in enumerate(self.internal_array):
            if len(bucket) == 0: continue
            for item in bucket:
                print("Bucket[", index, "] ==> key:", item[0], "value:", item[1])

        print("Showing all buckets:")
        print(self.internal_array)

obj = MyDict()

print("Setting keys...")
obj.set("GOAT", "Michael Jordan")
obj.set("humans", "The Wildlife Killers")
obj.set("py-2", "sorry, end of life")
obj.set("GOAT", "LeBron James")
obj.set(69, "an ordinary number")
obj.set("Linus", "Gates")
obj.set(12, "los meses en un ano")
obj.set("beaver", "the best animal out there!")

print("\nGetting keys...")
print("Linus is", obj.get("Linus"))
print("12 is", obj.get(12))
print("GOAT is", obj.get("GOAT")) # key with an updated value
print("69 is", obj.get(69))
print("humans is", obj.get("humans"))
print("beaver is", obj.get("beaver"))
print("py-2 is", obj.get("py-2"))
print("COVID is", obj.get("COVID")) # non-existent key

print("\nDisplaying everything...")
obj.show()

输出

Setting keys...
WARNING: Collision detected for key 12

Getting keys...
Linus is Gates
12 is los meses en un ano
GOAT is LeBron James
69 is an ordinary number
humans is The Wildlife Killers
beaver is the best animal out there!
py-2 is sorry, end of life
COVID is None

Displaying everything...
Showing buckets with contents:
Bucket[ 7 ] ==> key: beaver value: the best animal out there!
Bucket[ 12 ] ==> key: humans value: The Wildlife Killers
Bucket[ 12 ] ==> key: 12 value: los meses en un ano
Bucket[ 22 ] ==> key: py-2 value: sorry, end of life
Bucket[ 29 ] ==> key: GOAT value: LeBron James
Bucket[ 69 ] ==> key: 69 value: an ordinary number
Bucket[ 77 ] ==> key: Linus value: Gates
Showing all buckets:
[[], [], [], [], [], [], [], [['beaver', 'the best animal out there!']], [], [], [], [], [['humans', 'The Wildlife Killers'], [12, 'los meses en un ano']], [], [], [], [], [], [], [], [], [], [['py-2', 'sorry, end of life']], [], [], [], [], [], [], [['GOAT', 'LeBron James']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [[69, 'an ordinary number']], [], [], [], [], [], [], [], [['Linus', 'Gates']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

推荐阅读