python - Radix Sort for Strings in Python
Problem description
My radix sort function returns a sorted but incorrect list when compared with Python's sort:
My radix sort: ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
Python's sort: ['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']
* My radix sort does not do padding
* It works least significant digit (LSD) first
* I need to utilise the length of each word
The following is my code.
def count_sort_letters(array, size, col, base):
    output = [0] * size
    count = [0] * base
    min_base = ord('a')
    for item in array:
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        count[letter] += 1
    for i in range(base - 1):
        count[i + 1] += count[i]
    for i in range(size - 1, -1, -1):
        item = array[i]
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        output[count[letter] - 1] = item
        count[letter] -= 1
    return output

def radix_sort_letters(array):
    size = len(array)
    max_col = len(max(array, key = len))
    for col in range(max_col):
        array = count_sort_letters(array, size, col, 26)
    return array
Can anyone find a way to solve this problem?
Solution
As I mentioned in the comments:
In your code, these lines:
correct_index = min(len(item) - 1, col)
letter = ord(item[-(correct_index + 1)]) - min_base
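always use the word's first letter once col reaches or exceeds the word's length. As a result, shorter words are sorted by their first letter on those passes. For example, ['aa', 'a'] stays unchanged, because in the for col loop we compare the 'a' of both words on every pass, so the order never changes.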
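To make the clamping visible, here is a small sketch that isolates the two lines in question. The helper name original_key_char is made up for this illustration and is not part of the question's code; it only reports which character each word contributes on a given pass.

```python
def original_key_char(item, col):
    """Character the question's code reads from `item` on pass `col`.

    Hypothetical helper: it just wraps the two lines quoted above.
    """
    correct_index = min(len(item) - 1, col)  # clamped for short words
    return item[-(correct_index + 1)]

words = ['aa', 'a']
for col in range(2):
    # Both passes see the same keys, so the order never changes.
    print(col, [original_key_char(w, col) for w in words])
# 0 ['a', 'a']
# 1 ['a', 'a']
```

On pass 1 the one-letter word 'a' has no second character from the right, so the clamp hands back its first letter again.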
Corrected code
Note: I tried to keep changes to the original code to a minimum.
def count_sort_letters(array, size, col, base, max_len):
    """ Helper routine for performing a count sort based upon column col """
    output = [0] * size
    count = [0] * (base + 1)   # One additional cell to account for the dummy letter
    min_base = ord('a') - 1    # Subtract one to allow for the dummy character

    for item in array:         # Generate counts
        # Get column letter if within string, else use dummy position of 0
        letter = ord(item[col]) - min_base if col < len(item) else 0
        count[letter] += 1

    for i in range(len(count) - 1):   # Accumulate counts
        count[i + 1] += count[i]

    for item in reversed(array):      # Reverse walk keeps the sort stable
        # Get index of current letter of item at index col in count array
        letter = ord(item[col]) - min_base if col < len(item) else 0
        output[count[letter] - 1] = item
        count[letter] -= 1

    return output

def radix_sort_letters(array, max_col = None):
    """ Main sorting routine """
    if not max_col:
        max_col = len(max(array, key = len))   # length of the longest word

    for col in range(max_col - 1, -1, -1):     # max_col-1, max_col-2, ..., 0
        array = count_sort_letters(array, len(array), col, 26, max_col)

    return array
lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))
Test
lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))
# Compare to Python sort
print(radix_sort_letters(lst)==sorted(lst))
Output
['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']
True
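Explanation
Counting sort is a stable sort, meaning that items with equal keys keep their relative order from the input.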
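As a rough illustration of that stability, here is a minimal, generic counting sort. The function name counting_sort_by_key and its key/base parameters are assumptions made for this sketch, not part of the code above. The reverse walk in the last loop is what keeps equal keys in their input order.

```python
def counting_sort_by_key(items, key, base):
    """Stable counting sort of `items` by an integer key in range(base).

    Illustrative sketch; `key` maps an item to its bucket number.
    """
    count = [0] * base
    for it in items:                 # count occurrences of each key
        count[key(it)] += 1
    for i in range(1, base):         # prefix sums: end position of each bucket
        count[i] += count[i - 1]
    out = [None] * len(items)
    for it in reversed(items):       # walk backwards so ties keep input order
        k = key(it)
        count[k] -= 1
        out[count[k]] = it
    return out

words = ['xb', 'ab', 'ac']
# Sort by the second letter only: 'xb' and 'ab' tie on 'b'.
print(counting_sort_by_key(words, key=lambda w: ord(w[1]) - ord('a'), base=26))
# ['xb', 'ab', 'ac']
```

'xb' stays ahead of 'ab' because they share the key 'b' and 'xb' came first in the input.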
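Let's go through an example to see how the function works.
Let's sort: ['ac', 'xb', 'ab']
We step through the characters of each word in reverse order (last character first).
Iteration 0: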
The key is the last character of each word (index -1), so the keys are ['c', 'b', 'b'] (the last characters of 'ac', 'xb', and 'ab'). Performing a counting sort on these keys gives ['b', 'b', 'c'], which places the corresponding words in the order ['xb', 'ab', 'ac']. The entries 'xb' and 'ab' have equal keys ('b'), so they keep their original relative order, 'xb' followed by 'ab', because counting sort is a stable sort.
Iteration 1:
The key is the next-to-last character (index -2), so the keys are ['x', 'a', 'a'] (corresponding to the list ['xb', 'ab', 'ac']). Counting sort produces the key order ['a', 'a', 'x'], which places the corresponding words in the order ['ab', 'ac', 'xb'], and we are done.
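The two iterations can be reproduced with a short trace. This is only a sketch: it swaps in Python's built-in sorted() (which is stable) for the counting sort, assumes all words have the same length, and the name lsd_passes is made up for the example.

```python
def lsd_passes(words):
    """Yield (column, list) after each stable pass, last column first.

    Sketch only: assumes equal-length words and uses sorted() as the
    stable per-column sort instead of a counting sort.
    """
    width = len(words[0])
    for col in range(width - 1, -1, -1):
        words = sorted(words, key=lambda w: w[col])  # stable sort on one column
        yield col, words

for col, state in lsd_passes(['ac', 'xb', 'ab']):
    print(col, state)
# 1 ['xb', 'ab', 'ac']
# 0 ['ab', 'ac', 'xb']
```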
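Bug in the original code: the passes must run from the rightmost column to the leftmost, because the final pass has to be keyed on the first character, the one before it on the second character, and so on. Your code counted columns back from the end of each word (item[-(col + 1)]), so for words of different lengths a given pass did not compare the same character position across words.
Strings of different lengths: the previous example was simplified by assuming that the strings have equal length. Now let's try strings of unequal length, such as: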
['ac', 'a', 'ab']
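This immediately raises a problem: because the words have unequal lengths, we cannot pick a letter from every word on every pass.
We can fix this by padding each word with a dummy character such as '*':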
['ac', 'a*', 'ab']
Iteration 0: the key is the last character of each word, so: ['c', '*', 'b']
The understanding is that the dummy character is less than all other characters, so the sorted keys are ['*', 'b', 'c'], which puts the related words in the order ['a*', 'ab', 'ac'].
Iteration 1: the key is the next-to-last character of each word, so: ['a', 'a', 'a']
Since the keys are all equal, counting sort does not change the order, so we keep ['a*', 'ab', 'ac']. Removing the dummy character from each string (if any), we end up with ['a', 'ab', 'ac'].
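For comparison, the explicit-padding idea can be sketched directly. Assumptions in this sketch: the words are non-empty lowercase a-z strings, the list itself is non-empty, '*' is used as the pad because ord('*') < ord('a'), and sorted() stands in for the counting sort. The name sort_with_padding is made up for the illustration.

```python
PAD = '*'  # assumed dummy character; sorts before 'a'..'z'

def sort_with_padding(words):
    """LSD radix sort that really pads the words (sketch, non-empty input)."""
    width = max(len(w) for w in words)
    padded = [w.ljust(width, PAD) for w in words]      # 'a' -> 'a*'
    for col in range(width - 1, -1, -1):               # last column first
        padded = sorted(padded, key=lambda w: w[col])  # stable per-column sort
    return [w.rstrip(PAD) for w in padded]             # drop the padding again

print(sort_with_padding(['ac', 'a', 'ab']))
# ['a', 'ab', 'ac']
```

This gives the same result as the corrected code, at the cost of building and stripping the padded copies.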
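The idea behind the index lookup (the letter = ... if col < len(item) else 0 expression in the corrected code) is to mimic the behaviour of padded strings without actually padding them (padding is extra work). Based on the column index, it determines whether the index falls in the padded or the unpadded part of the string, and returns the appropriate index into the count array.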
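Pulled out on its own, that lookup might look like the following. The name bucket_index is hypothetical and used only for this sketch; the body mirrors the conditional expression in the corrected count_sort_letters.

```python
def bucket_index(item, col, min_base=ord('a') - 1):
    """Index into the count array for `item` at column `col`.

    Hypothetical helper: 0 stands for the (virtual) padding,
    1..26 stand for 'a'..'z'.
    """
    if col < len(item):
        return ord(item[col]) - min_base  # real letter
    return 0                              # past the end: acts like padding

print([bucket_index('a', c) for c in range(3)])    # [1, 0, 0]
print([bucket_index('abs', c) for c in range(3)])  # [1, 2, 19]
```

Because bucket 0 precedes every letter bucket, a word that has run out of characters sorts ahead of any longer word sharing its prefix.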