Continually updating the median + space efficiency





// Initialize values
noList   = [8,10,4,6]
mean     = 0
noItems  = 0

// Now we want to update the mean continually with further values.
for (value : noList) {
  // All divisions here are real-valued, not integer divisions.
  mean    = (noItems / (noItems + 1)) * mean + (1 / (noItems + 1)) * value
  noItems = noItems + 1
}

// After iteration 1: values seen = [8]       ; mean = 8   ; noItems = 1
// After iteration 2: values seen = [8,10]    ; mean = 9   ; noItems = 2
// After iteration 3: values seen = [8,10,4]  ; mean = 7.33; noItems = 3
// After iteration 4: values seen = [8,10,4,6]; mean = 7   ; noItems = 4
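
For comparison, here is a minimal Python sketch of the same constant-space mean update. The function name `running_means` is hypothetical, and the update is written in the algebraically equivalent form `mean += (value - mean) / count` rather than the exact expression used in the pseudocode above.

```python
def running_means(values):
    """Yield the mean after each value, storing only the mean and a count."""
    mean = 0.0
    count = 0
    for value in values:
        count += 1
        # Equivalent to (count - 1) / count * mean + value / count.
        mean += (value - mean) / count
        yield mean


if __name__ == "__main__":
    # Same input as the pseudocode above.
    print([round(m, 2) for m in running_means([8, 10, 4, 6])])
    # prints [8.0, 9.0, 7.33, 7.0]
```

Only two numbers are kept between iterations, which is the property the question hopes to reproduce for the median.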

Question: Is there a similar (space-efficient) method for computing the median?



// 1) We have a list for which we want to find the median.
noList   = [9,10,4,6,13,12]

// 2) We divide it into two lists (or other data structures) and sort each of them.
smallerList = [4,6,9]
biggerList  = [10,12,13]

// 3) Both lists have the same length, so the median lies between the last element of smallerList and the first element of biggerList.
median = (9 + 10) / 2 = 9.5

// 4) Next, we add a further element and want to update our median.
// We add the number 5 to our data structures. So the new list is:
noList   = [9,10,4,6,13,12,5]

// 5) Obviously 5 is smaller than our current median of 9.5. So we insert it in a sorted way into smallerList:
smallerList = [4,5,6,9]
biggerList  = [10,12,13]

// 6) Now length(smallerList) > length(biggerList), so we know that the updated median is the last element of smallerList.
median = 9

// 7) Next, we add a further element and want to update our median.
// We add the number 2 to our data structures. So the new list is:
noList   = [9,10,4,6,13,12,5,2]

// 8) Obviously 2 is smaller than our current median of 9. So we insert it again in a sorted way into smallerList:
smallerList = [2,4,5,6,9]
biggerList  = [10,12,13]

// 9) Now smallerList is two elements longer than biggerList, so we need to "balance" the lists by taking one element from the longer list and inserting it into the other.
// We remove the element 9 from smallerList and insert it into biggerList.
smallerList = [2,4,5,6]
biggerList  = [9,10,12,13]

// 10) Both lists have the same length, so the median lies between the last element of smallerList and the first element of biggerList.
median = (6 + 9) / 2 = 7.5
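
The walkthrough above can be sketched in Python roughly as follows. The class name `RunningMedian` and its methods are hypothetical, and the insertion rule (compare against the largest element of the lower half) is one reasonable reading of "smaller than the current median". Note that this sketch still stores every value, so it does not reduce memory use.

```python
from bisect import insort


class RunningMedian:
    """Two sorted lists, mirroring smallerList and biggerList above."""

    def __init__(self):
        self.smaller = []  # lower half, ascending
        self.bigger = []   # upper half, ascending

    def add(self, value):
        # Values at or below the top of the lower half go into the lower half.
        if not self.smaller or value <= self.smaller[-1]:
            insort(self.smaller, value)
        else:
            insort(self.bigger, value)
        # Rebalance when one list is more than one element longer.
        if len(self.smaller) > len(self.bigger) + 1:
            self.bigger.insert(0, self.smaller.pop())
        elif len(self.bigger) > len(self.smaller) + 1:
            insort(self.smaller, self.bigger.pop(0))

    def median(self):
        # Assumes at least one value has been added.
        if len(self.smaller) == len(self.bigger):
            return (self.smaller[-1] + self.bigger[0]) / 2
        if len(self.smaller) > len(self.bigger):
            return self.smaller[-1]
        return self.bigger[0]


if __name__ == "__main__":
    rm = RunningMedian()
    for v in [9, 10, 4, 6, 13, 12]:
        rm.add(v)
    print(rm.median())  # prints 9.5
    rm.add(5)
    print(rm.median())  # prints 9
    rm.add(2)
    print(rm.median())  # prints 7.5
```

Plain lists keep the sketch close to the walkthrough. Two heaps would make each insertion cheaper, but the memory footprint would be the same.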


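Yes, this may answer my initial question... but the problem with this solution is that both lists (smallerList and biggerList) can grow to a considerable size. Suppose we have a stream of 10^18 numbers and want to find the median of all of them without running out of memory. How can this be solved in a space-efficient way?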

Tags: algorithm, mean, space, pseudocode, median



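If you have seen n numbers so far, then for any i, the i-th smallest of them could still become the median:

  • If i > n/2, this happens if the next 2i - n numbers are all larger.

  • If i <= n/2, this happens if the next n - 2i + 1 numbers are all smaller.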
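
The two cases above can be checked with a small Python sketch. The helper name `continuation_for` is hypothetical. One assumption to flag: in the first case the total count becomes even, so the target is the lower of the two middle values, which is why the check uses `statistics.median_low`.

```python
import statistics


def continuation_for(seen, i):
    """Return (target, extra): appending `extra` to `seen` makes the
    i-th smallest value of `seen` (1-based) a median of the stream."""
    n = len(seen)
    target = sorted(seen)[i - 1]
    if 2 * i > n:
        # i > n/2: append 2i - n values larger than everything seen.
        extra = [max(seen) + 1] * (2 * i - n)
    else:
        # i <= n/2: append n - 2i + 1 values smaller than everything seen.
        extra = [min(seen) - 1] * (n - 2 * i + 1)
    return target, extra


if __name__ == "__main__":
    seen = [9, 10, 4, 6, 13, 12]
    for i in range(1, len(seen) + 1):
        target, extra = continuation_for(seen, i)
        # Every value seen so far can still end up as a median.
        assert statistics.median_low(seen + extra) == target
```

Because every value seen so far can still end up in the middle, this argument suggests that an exact method cannot safely discard any of them.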
