HashMap Implementation Characteristics: Based on the JDK Documentation

xiatom 2019-04-18 09:44

A few days ago I interviewed at pdd and the interviewer grilled me on HashMap. I thought I had a decent grasp of it, but my understanding turned out to be superficial. In the spirit of being a Geek, I decided to dig into HashMap properly. This post gives a detailed introduction to HashMap based on the JDK documentation; I will analyze the source code in a later post.

Implementation characteristics of HashMap

1. HashMap is quite similar to Hashtable; the differences are:

  • HashMap is not thread-safe
  • HashMap allows a null key and null values
  • Because it is unsynchronized, its performance is relatively better

2. HashMap does not guarantee the order in which entries are stored
3. The time to iterate over the whole map is related to its capacity and the number of key-value mappings; an instance of HashMap has two parameters that affect its performance: initial capacity and load factor
4. HashMap stores its entries in hash buckets
5. Its Iterator is fail-fast
6. If a custom class is used as the key, pay attention to its equals and hashCode methods
7. HashMap has a thread-safe counterpart: ConcurrentHashMap
8. The implementation of HashMap also involves trees, which I will cover in another article

Each of these points is explained below:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits a null key and null values. The HashMap class is roughly equivalent to Hashtable, the differences being that HashMap is unsynchronized (i.e. not thread-safe) and that Hashtable does not allow nulls to be stored. HashMap also makes no guarantee about ordering: the order in which you insert entries is not necessarily the order in which they sit in the map.
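A quick sketch of the null-key/null-value difference (HashMap accepts both, while Hashtable rejects them with a NullPointerException):

    import java.util.HashMap;
    import java.util.Hashtable;
    import java.util.Map;

    public class NullKeyDemo {
        public static void main(String[] args) {
            Map<String, String> hashMap = new HashMap<>();
            hashMap.put(null, "value stored under the null key");  // allowed: at most one null key
            hashMap.put("k", null);                                // allowed: null values
            System.out.println(hashMap.get(null));

            Map<String, String> hashtable = new Hashtable<>();
            try {
                hashtable.put(null, "boom");                       // Hashtable rejects null keys (and null values)
            } catch (NullPointerException e) {
                System.out.println("Hashtable refused the null key: " + e);
            }
        }
    }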

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the “capacity” of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Assuming the hash function disperses the elements properly among the buckets, HashMap provides constant-time performance for the basic operations (get and put). Iterating over the whole collection, however, takes time proportional to the capacity of the HashMap (the number of buckets; see the explanation of hash buckets below) plus the number of key-value mappings, so do not set the initial capacity too high (or the load factor too low) if iteration performance is important.

Hash buckets

When we store data by hash value we run into collisions. The common collision-resolution strategies are open addressing, separate chaining, and overflow areas; separate chaining is what "hash bucket" refers to.
The element's hash value selects a bucket, and elements whose hash values collide are placed in the same bucket, as in the figure below. Looking an element up by hash value and key follows the same steps: locate the bucket for the hash value; if the first element in the bucket is the one we want, we are done; otherwise keep walking the chain until the element is found.
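To make those lookup steps concrete, here is a tiny, made-up separate-chaining table (the class, fields, and fixed bucket count are invented for illustration; this is not how java.util.HashMap is implemented internally):

    import java.util.LinkedList;

    // A toy separate-chaining table: each bucket holds a linked list of entries.
    public class ToyChainedMap<K, V> {
        private static class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        @SuppressWarnings("unchecked")
        private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[16];

        private int bucketIndex(K key) {
            // Map the key's hash code onto one of the buckets.
            return (key.hashCode() & 0x7fffffff) % buckets.length;
        }

        public void put(K key, V value) {
            int i = bucketIndex(key);
            if (buckets[i] == null) buckets[i] = new LinkedList<>();
            for (Entry<K, V> e : buckets[i]) {
                if (e.key.equals(key)) { e.value = value; return; } // same key: overwrite the value
            }
            buckets[i].add(new Entry<>(key, value));                // collision: append to the chain
        }

        public V get(K key) {
            int i = bucketIndex(key);
            if (buckets[i] == null) return null;
            for (Entry<K, V> e : buckets[i]) {                      // walk the chain in this bucket
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }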

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table (explained above), and the initial capacity is simply the capacity at the time the table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, its internal data structures are rebuilt) so that it has roughly twice as many buckets.
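Both parameters can be passed to HashMap's constructor. The small sketch below just restates the relationship described in the documentation; the threshold arithmetic is illustrative and does not read any of the JDK's internal fields:

    import java.util.HashMap;
    import java.util.Map;

    public class CapacityAndLoadFactorDemo {
        public static void main(String[] args) {
            int initialCapacity = 16;   // number of buckets to start with
            float loadFactor = 0.75f;   // how full the table may get before it is resized

            Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
            map.put("example", 1);

            // Per the documentation, a rehash (roughly doubling the bucket count)
            // happens once the entry count exceeds capacity * loadFactor.
            int threshold = (int) (initialCapacity * loadFactor);
            System.out.println("entries allowed before a resize: " + threshold); // 12
        }
    }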

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

As a general rule, the default load factor (0.75) offers a good trade-off between time and space costs. Higher values reduce the space overhead but increase the lookup cost (which shows up in most HashMap operations, including get and put). When setting the initial capacity, the expected number of entries in the map and its load factor should be taken into account so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

If many mappings are to be stored, creating the HashMap with a sufficiently large initial capacity is more efficient than letting it grow through automatic rehashing (each rehash rebuilds the hash table). Also note that using many keys with the same hashCode() is a sure way to slow down any hash table; to mitigate this, when the keys are Comparable, the class may use the comparison order among keys to help break ties.

For example, with the (default and recommended) load factor of 0.75 and a current capacity of 16, the map is resized once the number of key-value pairs exceeds 12 (16 × 0.75).
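Applying the rule quoted above, a map can be pre-sized from the expected number of entries so that no rehash should ever occur. The capacityFor helper below is my own sketch, not a JDK API:

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMapDemo {
        // Choose an initial capacity of at least expectedEntries / loadFactor,
        // so the entry count never exceeds capacity * loadFactor (per the JDK docs).
        static int capacityFor(int expectedEntries, float loadFactor) {
            return (int) Math.ceil(expectedEntries / (double) loadFactor) + 1;
        }

        public static void main(String[] args) {
            int expected = 1000;
            Map<String, Integer> map = new HashMap<>(capacityFor(expected, 0.75f));
            for (int i = 0; i < expected; i++) {
                map.put("key-" + i, i);   // these inserts should not trigger a resize
            }
            System.out.println(map.size());
        }
    }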

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be “wrapped” using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Use: Map m = Collections.synchronizedMap(new HashMap(...));

Note that the implementation of HashMap is not synchronized, so if a map is accessed from multiple threads and at least one of them adds or removes entries, the access must be synchronized externally. The usual approaches are to synchronize on some object that encapsulates the map, or to wrap the map with the synchronizedMap method of the Collections utility class, ideally at creation time. (Another option is ConcurrentHashMap; I will write about these synchronization topics later.)
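For example, the wrapped map can be shared between threads, but iteration over it still has to be synchronized manually on the wrapper (a small illustrative sketch):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedMapDemo {
        public static void main(String[] args) {
            // Wrap at creation time so the raw HashMap is never touched directly.
            Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());

            m.put("a", 1);   // individual calls are synchronized by the wrapper
            m.put("b", 2);

            // Iteration is not atomic: callers must synchronize on the wrapper themselves.
            synchronized (m) {
                for (Map.Entry<String, Integer> e : m.entrySet()) {
                    System.out.println(e.getKey() + " = " + e.getValue());
                }
            }
        }
    }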

The iterators returned by all of this class’s “collection view methods” are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

The passage above describes the fail-fast behavior of the Iterator. This is not specific to HashMap: the iterators of the ordinary (non-concurrent) collection classes are fail-fast as well. Once the collection is structurally modified during iteration by anything other than the iterator's own remove method, a ConcurrentModificationException is thrown. Faced with concurrent modification, the iterator therefore fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future. Note, however, that the fail-fast behavior cannot be guaranteed: in some cases removing an element during iteration happens not to raise an exception, but you must never assume that elements can be removed while iterating. For collections such as HashMap and ArrayList, a structural modification during iteration will generally throw; only when particular elements are involved does it happen not to. If you want to understand exactly when, read the implementation code of these collections.
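A small sketch of both sides of this behavior, using HashMap (the keys and values are arbitrary):

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);
            map.put("c", 3);

            // Modifying the map directly while iterating over one of its views fails fast.
            try {
                for (String key : map.keySet()) {
                    map.remove(key);      // structural modification outside the iterator
                }
            } catch (java.util.ConcurrentModificationException e) {
                System.out.println("fail-fast: " + e);
            }

            // Removing through the view's own iterator is the supported way.
            Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
            while (it.hasNext()) {
                if (it.next().getValue() % 2 == 0) {
                    it.remove();          // safe: the iterator itself performs the removal
                }
            }
            System.out.println(map);
        }
    }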

Things to watch out for when using a custom class as the key:

If you use a custom class as the key and override both its equals and hashCode methods, you must make sure the two methods use the same criterion: whenever two keys compare as equal via equals, their hashCode values must also be equal. For details see my article 判断对象相等以及相同对象问题——自定义类重写equals方法以及hashCode方法, which also covers the related HashSet issues; HashMap stores its keys in the same way that HashSet stores its elements.
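As a hedged illustration of that contract (the Point class here is made up for the example):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    // A made-up key class: equals and hashCode are based on the same fields,
    // so two equal points always hash to the same bucket.
    final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);   // must use exactly the fields that equals compares
        }
    }

    public class CustomKeyDemo {
        public static void main(String[] args) {
            Map<Point, String> map = new HashMap<>();
            map.put(new Point(1, 2), "stored value");

            // A new but equal Point finds the entry, because equals and hashCode agree:
            System.out.println(map.get(new Point(1, 2)));   // stored value
        }
    }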
