[Python data structures] hashable, list, tuple, set, frozenset

jay54520 2017-01-25 11:51

While working through CS212 Unit 4, I ran into code that used tuple, list, and set together and also concatenated and merged them. I got confused, so here is a summary.

hashable and unhashable

Hashing is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

--from Hashable, immutable
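
As a rough illustration of the two quotes above, the built-in hash() accepts immutable built-ins such as int, str and tuple, and rejects mutable ones such as list, set and dict. The helper name is_hashable is made up for this sketch and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this sketch)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Immutable built-ins are hashable.
print(is_hashable(42))        # True
print(is_hashable("spam"))    # True
print(is_hashable((1, 2)))    # True

# Mutable built-ins are not.
print(is_hashable([1, 2]))    # False
print(is_hashable({1, 2}))    # False
print(is_hashable({"a": 1}))  # False

# Hashing is repeatable: equal values give equal hashes within one run.
print(hash((1, 2)) == hash((1, 2)))  # True
```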

 

There are three concepts to grasp when trying to understand id, hash and the == and is operators: identity, value and hash value. Not all objects have all three.

  1. All objects have an identity, though even this can be a little slippery in some cases. The id function returns a number corresponding to an object's identity (in cpython, it returns the memory address of the object, but other interpreters may return something else). If two objects (that exist at the same time) have the same identity, they're actually two references to the same object. The is operator compares items by identity, a is b is equivalent to id(a) == id(b).

    Identity can get a little confusing when you deal with objects that are cached somewhere in their implementation. For instance, the objects for small integers and strings in cpython are not remade each time they're used. Instead, existing objects are returned any time they're needed. You should not rely on this in your code though, because it's an implementation detail of cpython (other interpreters may do it differently or not at all).

  2. All objects also have a value, though this is a bit more complicated. Some objects do not have a meaningful value other than their identity (so value and identity may be synonymous, in some cases). Value can be defined as what the == operator compares, so any time a == b, you can say that a and b have the same value. Container objects (like lists) have a value that is defined by their contents, while some other kinds of objects will have values based on their attributes. Objects of different types can sometimes have the same values, as with numbers: 0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False (yep, bools are numbers in Python, for historic reasons).

    If a class doesn't define an __eq__ method (to implement the == operator), it will inherit the default version from object and its instances will be compared solely by their identities. This is appropriate when otherwise identical instances may have important semantic differences. For instance, two different sockets connected to the same port of the same host need to be treated differently if one is fetching an HTML webpage and the other is getting an image linked from that page, so they don't have the same value.

  3. In addition to a value, some objects have a hash value, which means they can be used as dictionary keys (and stored in sets). The function hash(a) returns the object a's hash value, a number based on the object's value. The hash of an object must remain the same for the lifetime of the object, so it only makes sense for an object to be hashable if its value is immutable (either because it's based on the object's identity, or because it's based on contents of the object that are themselves immutable).

    Multiple different objects may have the same hash value, though well designed hash functions will avoid this as much as possible. Storing objects with the same hash in a dictionary is much less efficient than storing objects with distinct hashes (each hash collision requires more work). Objects are hashable by default (since their default value is their identity, which is immutable). If you write an __eq__ method in a custom class, Python will disable this default hash implementation, since your __eq__ function will define a new meaning of value for its instances. You'll need to write a __hash__ method as well, if you want your class to still be hashable. If you inherit from a hashable class but don't want to be hashable yourself, you can set __hash__ = None in the class body.

    --from Difference between hash() and id()
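
The three concepts above can be sketched with three small classes. The class names (PlainPoint, EqOnlyPoint, HashablePoint) are invented for this example, the behaviour shown is that of Python 3, and HashablePoint assumes that x and y are never reassigned after creation.

```python
class PlainPoint:
    """No __eq__: compared by identity, hashable by default."""

    def __init__(self, x, y):
        self.x, self.y = x, y


class EqOnlyPoint:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, EqOnlyPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class HashablePoint:
    """Defines __eq__ and a matching __hash__ (assumes x, y never change)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, HashablePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same data that __eq__ compares.
        return hash((self.x, self.y))


a, b = PlainPoint(1, 2), PlainPoint(1, 2)
print(a is b, a == b)                # False False (value is identity here)

c, d = EqOnlyPoint(1, 2), EqOnlyPoint(1, 2)
print(c is d, c == d)                # False True
print(EqOnlyPoint.__hash__ is None)  # True, so {c} would raise TypeError

e, f = HashablePoint(1, 2), HashablePoint(1, 2)
print(hash(e) == hash(f))            # True
print(len({e, f}))                   # 1, equal objects collapse in a set
```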

list

Lists are mutable sequences, typically used to store collections of homogeneous (same-kind) items (where the precise degree of similarity will vary by application).

"""
python list concatenate:

>>> [[0, 0]] + ['fill X']
[[0, 0], 'fill X']
>>> [[0, 0]] + ['fill X', (4, 0)]
[[0, 0], 'fill X', (4, 0)]

"""
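# `list()` called with an iterable copies its elements into a new list
>>> list((1, 2, 3))
[1, 2, 3]
# square brackets are a list literal (not a list comprehension): the tuple or list becomes a single element
>>> [(1, 2, 3)]
[(1, 2, 3)]
>>> [[1, 2, 3]]
[[1, 2, 3]]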

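This is syntax you probably just have to memorize: list(x) iterates over x, while [x] wraps x as a single element.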
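
A small sketch of how the two forms interact with concatenation. The variable names path and pair are made up for illustration; the point is that `+` only joins a list with another list, so a tuple has to be wrapped or converted first.

```python
pair = (4, 0)

# list(iterable) copies the elements of the iterable into a new list.
print(list(pair))               # [4, 0]

# [x] builds a one-element list that contains x itself.
print([pair])                   # [(4, 0)]

path = [[0, 0]]

# `+` needs a list on both sides, so the tuple is wrapped in a list literal.
print(path + ['fill X', pair])  # [[0, 0], 'fill X', (4, 0)]

# Adding a tuple directly to a list raises TypeError.
try:
    path + pair
except TypeError as exc:
    print("TypeError:", exc)

# list.extend accepts any iterable and appends its elements one by one.
extended = list(path)
extended.extend(pair)
print(extended)                 # [[0, 0], 4, 0]
```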

tuple

Tuples are immutable sequences, typically used to store collections of heterogeneous (mixed-kind) data (such as the 2-tuples produced by the enumerate() built-in). Tuples are also used for cases where an immutable sequence of homogeneous data is needed (such as allowing storage in a set or dict instance).

A tuple cannot be modified, so:

"""
>>> a_tuple = (0,)
>>> a_tuple
(0,)

>>> a_tuple[0] = 1
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

# but you can use `+` to concatenate tuples
>>> a_tuple + (1, 1, 1)
(0, 1, 1, 1)
# concatenation returns a new tuple, so the original tuple is unchanged
>>> a_tuple
(0,)
"""

  

  

  

set, frozenset

A set object is an unordered collection of distinct hashable objects. This means that a set itself is mutable, but its members must be hashable (in practice, immutable), so:

# set members must be hashable
>>> a_set = {(0, 0)}
>>> a_set.add([1, 2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
# but the set itself is mutable, so members can be added
>>> a_set.add((4, 9))
>>> a_set
{(0, 0), (4, 9)}

Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.
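
A short sketch of the uses described above. The names fs, labels and nested and the label strings are made up for illustration; sorted() is used only to make the printed order predictable.

```python
fs = frozenset([1, 2, 3])

# frozenset has no add(); it cannot be changed after creation.
print(hasattr(fs, "add"))          # False

# It is hashable, so it works as a dict key or as a set element.
labels = {frozenset({1, 2}): "small", frozenset({8, 9}): "large"}
print(labels[frozenset({2, 1})])   # small (element order does not matter)

nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(nested))                 # 2

# A regular set cannot be nested the same way.
try:
    {{1, 2}}
except TypeError as exc:
    print("TypeError:", exc)

# The mathematical operations work on both set and frozenset.
a, b = {1, 2, 3}, {3, 4}
print(sorted(a & b))  # [3]           intersection
print(sorted(a | b))  # [1, 2, 3, 4]  union
print(sorted(a - b))  # [1, 2]        difference
print(sorted(a ^ b))  # [1, 2, 4]     symmetric difference
```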

 

But I ran into trouble when initializing a set.

# braces build a set that contains the tuple itself
>>> {(0, 0)}
{(0, 0)}
# set() iterates over its argument, so the two zeros inside the tuple collapse into a single 0
>>> set((0, 0))
{0}
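
The difference can be sketched as follows: braces take the members themselves, while set() takes one iterable and adds each of its elements. The variable name origin is made up for the example.

```python
origin = (0, 0)

# Braces build a set whose single member is the tuple itself.
print({origin})                   # {(0, 0)}

# set(iterable) iterates over its argument; the two zeros collapse into one.
print(set(origin))                # {0}

# To use the constructor, wrap the tuple in another iterable first.
print(set([origin]))              # {(0, 0)}
print({origin} == set([origin]))  # True

# Strings behave the same way, because a string is an iterable of characters.
print(len(set("aab")))            # 2
print(set(["aab"]))               # {'aab'}

# {} is an empty dict, not an empty set; use set() for that.
print(type({}).__name__)          # dict
print(type(set()).__name__)       # set
```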

  

  

 
