首页 > 技术文章 > 【算法总结】哈夫曼树和哈夫曼编码

Atanisi 原文

一、哈夫曼树

1. 哈夫曼树也称最优二叉树。

 叶子节点的权值是对叶子节点赋予的一个有意义的数值量。

 设二叉树具有 n 个带权值的叶子结点,从根节点到各个叶子结点的路径长度与相应叶子结点权值的乘积之和叫做二叉树的带权路径长度。

 给定一组具有确定权值的叶子结点,可以构造处不同的二叉树,将其中带权路径长度最小的二叉树称为哈夫曼树

2. 基本思想:

  • 初始化:由给定的 n 个权值 $left{ omega_{1},omega_{2},cdots ,omega_{n} ight}$构造 n 棵只有一个根节点的二叉树,从而得到一个二叉树集合$F=left{T_{1},T_{2},cdots,T_{n} ight}$。
  • 选取与合并:在$F$中选取根节点的权值最小的两颗二叉树分别作为左右子树构造一棵新的二叉树(一般情况下将权值大的结点作为右子树。),这棵新二叉树的根节点的权值为其左、右子树根节点的权值之和。
  • 删除与加入:在$F$中删除作为左、右子树的两棵二叉树,并将新建立的二叉树加入到$F$中。
  • 重复上述两个步骤,当集合$F$中只剩下一棵二叉树时,这棵二叉树便是哈夫曼树。

           

  由哈夫曼算法构造的哈夫曼树中,非叶子节点的度均为2。具有 n 个叶子结点的哈夫曼树公有 2n-1个结点,其中有 n-1 个非叶子结点。它们是在 n-1 次的合并过程中生产的。

二、哈夫曼编码

1. 哈夫曼编码是一种可变字长编码

 如果一组编码中任一编码都不是其他任何一个编码的前缀,我们称这组编码为前缀编码。哈夫曼树可用于构造最短的不等长编码方案。

2. 算法流程

  规定哈夫曼编码树的作分支代表 0,右分支代表 1,则从根结点到每个叶子结点所经过的路径组成的 0 和 1 的序列便成为该叶子结点对应字符的编码。

  

  解码则是将编码串从左到右逐位判别,直到确定一个字符。

  哈夫曼编码树中,树的带权路径长度的含义是各个字符的码长与其出现次数的乘积之和,所以采用哈夫曼树构造的编码是一种能使字符串的编码总长度最短的不等长编码。

 三、题目:

1. POJ 3253 

Description

Farmer John wants to repair a small length of the fence around the pasture. He measures the fence and finds that he needs N (1 ≤ N ≤ 20,000) planks of wood, each having some integer length Li (1 ≤ Li ≤ 50,000) units. He then purchases a single long board just long enough to saw into the N planks (i.e., whose length is the sum of the lengths Li). FJ is ignoring the "kerf", the extra length lost to sawdust when a sawcut is made; you should ignore it, too.

FJ sadly realizes that he doesn't own a saw with which to cut the wood, so he mosies over to Farmer Don's Farm with this long board and politely asks if he may borrow a saw.

Farmer Don, a closet capitalist, doesn't lend FJ a saw but instead offers to charge Farmer John for each of the N-1 cuts in the plank. The charge to cut a piece of wood is exactly equal to its length. Cutting a plank of length 21 costs 21 cents.

Farmer Don then lets Farmer John decide the order and locations to cut the plank. Help Farmer John determine the minimum amount of money he can spend to create the N planks. FJ knows that he can cut the board in various different orders which will result in different charges since the resulting intermediate planks are of different lengths.

Input

Line 1: One integer N, the number of planks 
Lines 2..N+1: Each line contains a single integer describing the length of a needed plank

Output

Line 1: One integer: the minimum amount of money he must spend to make N-1 cuts

Sample Input

3
8
5
8

Sample Output

34

Hint

He wants to cut a board of length 21 into pieces of lengths 8, 5, and 8. 
The original board measures 8+5+8=21. The first cut will cost 21, and should be used to cut the board into pieces measuring 13 and 8. The second cut will cost 13, and should be used to cut the 13 into 8 and 5. This would cost 21+13=34. If the 21 was cut into 16 and 5 instead, the second cut would cost 16 for a total of 37 (which is more than 34).
 

  由题意可知对数据构造Huffman树,其中非叶子节点的值加起来的和为所求。这里无需真正构造哈夫曼树,哈夫曼树的构造过程是不断选择集合种两个权值最小的结点,然后删除此两结点,加入新的权值。利用最小堆即可。

 1 #include <iostream>
 2 #include <queue>
 3 
 4 using namespace std;
 5 
 6 int main()
 7 {
 8     priority_queue<int ,vector<int>,greater<int> > q;
 9     int n;
10     int num;
11     cin >> n;
12     while(n--)
13     {
14         cin >> num;
15         q.push(num);
16     }
17 
18     long long sum = 0;
19     
20     while(!q.empty())
21     {
22         int num1 = q.top();
23         q.pop();
24         int num2 = q.top();
25         q.pop();
26         sum += num1 + num2;
27         if(q.empty())
28             break;
29         q.push(num1 + num2);
30     }
31     
32     cout << sum << endl;
33     return 0;
34 }

推荐阅读