Assigning people to shopping days by surname for COVID

Problem description

A question from an optimization mailing list:

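For COVID-19, my local authority has been trying to assign people a day of the week on which to go shopping. They simply split the alphabet evenly, without considering how frequently each surname occurs in my town, and the results have been bad, with long queues. Is there a better way? (Supermarkets are open 5 days a week.)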

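Note: usually only a list of surnames (possibly only the popular ones) is available. It is important to communicate in a straightforward way who should shop when.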

Tags: algorithm, optimization

Solution


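While this could be set up as a mixed-integer programming (MIP) problem, that is not the best idea. Given the number of variables involved, a MIP solver may take a long time to prove that an answer is optimal. You could, of course, cut the solver off after a time limit and get an acceptable approximation, but with no guarantee of optimality.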

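The best way to solve this problem is dynamic programming, which finds the optimal answer extremely quickly. First, note that we have a list of surnames in alphabetical order, each with a weight w (the number of people with that name), and that we want to split the list into contiguous groups. What we want to do is minimize the maximum total weight of any group.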
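To make the objective concrete, here is a minimal brute-force sketch. It is not the answer's method and is only practical for tiny inputs. The function names and the toy weights are invented for illustration. It scores a set of split points by the heaviest resulting group and tries every possible set of cuts.

```python
from itertools import combinations

def max_group_weight(weights, splits):
    """Largest total weight among the contiguous groups cut at `splits`."""
    bounds = [0, *splits, len(weights)]
    return max(sum(weights[a:b]) for a, b in zip(bounds, bounds[1:]))

def brute_force_partition(weights, groups):
    """Try every way to cut `weights` into `groups` non-empty contiguous runs."""
    candidates = combinations(range(1, len(weights)), groups - 1)
    return min((max_group_weight(weights, list(s)), list(s)) for s in candidates)

toy_weights = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up counts, not census data
best_value, best_splits = brute_force_partition(toy_weights, 3)
print(best_value, best_splits)
```

The number of candidate cuts grows combinatorially with the list length, which is why a smarter method is needed for a realistic list of surnames.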

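Call this optimal value V(n,k): the best value achievable for the first n names using k separators. Next, note that for a potential split point i, the value of splitting at that point is: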

$$\max\left(V(i, k-1),\ \sum_{j=i}^{n-1} w_j\right)$$

We can therefore write V(n,k) as:

$$V(n,k) = \min_{0 \le i < n} \max\left(V(i, k-1),\ \sum_{j=i}^{n-1} w_j\right)$$

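There are a few edge cases. If n≤0, we have an empty list. Since that is not acceptable, we return ∞. If k=0, there are no separators left to place, so we simply return the sum of the subarray. This finally gives us the complete functional form of the problem: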

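$$V(n,k) = \begin{cases} \infty & \text{if } n \le 0 \\ \sum_{j=0}^{n-1} w_j & \text{if } k = 0 \\ \min_{0 \le i < n} \max\left(V(i, k-1),\ \sum_{j=i}^{n-1} w_j\right) & \text{otherwise} \end{cases}$$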
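As an alternative view of the same recurrence, the following sketch fills in V(n,k) bottom-up in a table instead of recursing, and uses prefix sums for the subarray totals. The function name `partition_bottom_up` and its table layout are assumptions made for this illustration. It also assumes there are at least as many items as groups.

```python
from itertools import accumulate

def partition_bottom_up(weights, groups):
    """Tabulate V(n, k) for k = 0..groups-1 separators; return (value, splits)."""
    n_items = len(weights)
    prefix = [0, *accumulate(weights)]  # prefix[b] - prefix[a] == sum(weights[a:b])
    inf = float("inf")
    # value[k][n]: best max-group weight for the first n items using k separators
    value = [[inf] * (n_items + 1) for _ in range(groups)]
    cut = [[0] * (n_items + 1) for _ in range(groups)]
    for n in range(1, n_items + 1):
        value[0][n] = prefix[n]  # k = 0: a single group holding everything
    for k in range(1, groups):
        for n in range(1, n_items + 1):
            for i in range(1, n):  # the last group is weights[i:n]
                candidate = max(value[k - 1][i], prefix[n] - prefix[i])
                if candidate < value[k][n]:
                    value[k][n], cut[k][n] = candidate, i
    # Walk the recorded cuts backwards to recover the split points
    splits, n = [], n_items
    for k in range(groups - 1, 0, -1):
        n = cut[k][n]
        splits.append(n)
    return value[groups - 1][n_items], splits[::-1]

print(partition_bottom_up([3, 1, 4, 1, 5, 9, 2, 6], 3))
```

The triple loop makes the cost proportional to the number of groups times the square of the list length. That is small for a few hundred prefixes and five shopping days.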

Now we collect some data on the distribution of surnames from the census and implement the recursion in Python as follows:

#!/usr/bin/env python3
import functools
import pandas as pd

class Partitioner:
  def __init__(self):
    self._weights = None

  @functools.lru_cache(maxsize=None)
  def _subarraysum(self,i,n):
    """Memoized Sum range [i,n)"""
    return sum(self._weights[i:n])

  @functools.lru_cache(maxsize=None)
  def _V(self,n,k):
    """Find best split for the values n,k
    @returns (Value, [List of Split Points])
    """
    if n<=0: #Must have at least one element in each subarray
      return float("inf"), []
    if k==0:  #We've filled all the subarrays
      return self._subarraysum(0,n), []

    #Look through all the split points and find the best one
    val = float("inf"), []
    for i in range(n):
      vali, splitsi = self._V(i,k-1)           #Best split for [0,i)
      vali = max(vali, self._subarraysum(i,n)) #Max of that and value of [i,N)
      if vali<val[0]:
        val = vali, splitsi+[i]
    return val

  def run(self,weights,k):
    """Return a (Value,Splits) tuple for the best partitioning of N objects with
    the given weights into k partitions"""
    self._V.cache_clear()
    self._subarraysum.cache_clear()
    self._weights = weights
    #Subtract 1 from k because we calculate using separators
    return self._V(len(weights),k-1) 

def GetShortNameDataFrame(df, k):
  """Returns a copy of df in which names are abbreviated to `k` characters long
  """
  df = df.copy()
  df['names'] = df['names'].str.slice(0,k)
  return df.groupby('names').agg({"counts":"sum"}).reset_index()

def PrintSplits(df, splits):
  """Pretty print the name ranges"""
  p = [0] + splits + [len(df)]
  for i in range(len(p)-1):
    startname = df['names'][p[i]]
    endname   = df['names'][p[i+1]-1]
    count     = sum(df['counts'][p[i]:p[i+1]])
    print(f"{startname:>10}-{endname:>10} {count}")

def ExploreNames(namecounts,kmax):
  """Explore shortening names to lengths of [1,kmax] characters"""
  for k in range(1,kmax+1):
    print("k",k)
    snames = GetShortNameDataFrame(namecounts,k)
    #Find optimal way to partition the names into 5 bins
    val, splits = Partitioner().run(snames['counts'].tolist(),5)
    print(f"Max count: {val}")
    PrintSplits(snames, splits)

#Source: https://www.census.gov/topics/population/genealogy/data/2010_surnames.html
names = "SMITH JOHNSON WILLIAMS BROWN JONES GARCIA MILLER DAVIS RODRIGUEZ MARTINEZ HERNANDEZ LOPEZ GONZALEZ WILSON ANDERSON THOMAS TAYLOR MOORE JACKSON MARTIN LEE PEREZ THOMPSON WHITE HARRIS SANCHEZ CLARK RAMIREZ LEWIS ROBINSON WALKER YOUNG ALLEN KING WRIGHT SCOTT TORRES NGUYEN HILL FLORES GREEN ADAMS NELSON BAKER HALL RIVERA CAMPBELL MITCHELL CARTER ROBERTS GOMEZ PHILLIPS EVANS TURNER DIAZ PARKER CRUZ EDWARDS COLLINS REYES STEWART MORRIS MORALES MURPHY COOK ROGERS GUTIERREZ ORTIZ MORGAN COOPER PETERSON BAILEY REED KELLY HOWARD RAMOS KIM COX WARD RICHARDSON WATSON BROOKS CHAVEZ WOOD JAMES BENNETT GRAY MENDOZA RUIZ HUGHES PRICE ALVAREZ CASTILLO SANDERS PATEL MYERS LONG ROSS FOSTER JIMENEZ POWELL JENKINS PERRY RUSSELL SULLIVAN BELL COLEMAN BUTLER HENDERSON BARNES GONZALES FISHER VASQUEZ SIMMONS ROMERO JORDAN PATTERSON ALEXANDER HAMILTON GRAHAM REYNOLDS GRIFFIN WALLACE MORENO WEST COLE HAYES BRYANT HERRERA GIBSON ELLIS TRAN MEDINA AGUILAR STEVENS MURRAY FORD CASTRO MARSHALL OWENS HARRISON FERNANDEZ MCDONALD WOODS WASHINGTON KENNEDY WELLS VARGAS HENRY CHEN FREEMAN WEBB TUCKER GUZMAN BURNS CRAWFORD OLSON SIMPSON PORTER HUNTER GORDON MENDEZ SILVA SHAW SNYDER MASON DIXON MUNOZ HUNT HICKS HOLMES PALMER WAGNER BLACK ROBERTSON BOYD ROSE STONE SALAZAR FOX WARREN MILLS MEYER RICE SCHMIDT GARZA DANIELS FERGUSON NICHOLS STEPHENS SOTO WEAVER RYAN GARDNER PAYNE GRANT DUNN KELLEY SPENCER HAWKINS ARNOLD PIERCE VAZQUEZ HANSEN PETERS SANTOS HART BRADLEY KNIGHT ELLIOTT CUNNINGHAM DUNCAN ARMSTRONG HUDSON CARROLL LANE RILEY ANDREWS ALVARADO RAY DELGADO BERRY PERKINS HOFFMAN JOHNSTON MATTHEWS PENA RICHARDS CONTRERAS WILLIS CARPENTER LAWRENCE SANDOVAL GUERRERO GEORGE CHAPMAN RIOS ESTRADA ORTEGA WATKINS GREENE NUNEZ WHEELER VALDEZ HARPER BURKE LARSON SANTIAGO MALDONADO MORRISON FRANKLIN CARLSON AUSTIN DOMINGUEZ CARR LAWSON JACOBS OBRIEN LYNCH SINGH VEGA BISHOP MONTGOMERY OLIVER JENSEN HARVEY WILLIAMSON GILBERT DEAN SIMS ESPINOZA HOWELL LI WONG REID HANSON LE MCCOY GARRETT BURTON FULLER WANG 
WEBER WELCH ROJAS LUCAS MARQUEZ FIELDS PARK YANG LITTLE BANKS PADILLA DAY WALSH BOWMAN SCHULTZ LUNA FOWLER MEJIA DAVIDSON ACOSTA BREWER MAY HOLLAND JUAREZ NEWMAN PEARSON CURTIS CORTEZ DOUGLAS SCHNEIDER JOSEPH BARRETT NAVARRO FIGUEROA KELLER AVILA WADE MOLINA STANLEY HOPKINS CAMPOS BARNETT BATES CHAMBERS CALDWELL BECK LAMBERT MIRANDA BYRD CRAIG AYALA LOWE FRAZIER POWERS NEAL LEONARD GREGORY CARRILLO SUTTON FLEMING RHODES SHELTON SCHWARTZ NORRIS JENNINGS WATTS DURAN WALTERS COHEN MCDANIEL MORAN PARKS STEELE VAUGHN BECKER HOLT DELEON BARKER TERRY HALE LEON HAIL BENSON HAYNES HORTON MILES LYONS PHAM GRAVES BUSH THORNTON WOLFE WARNER CABRERA MCKINNEY MANN ZIMMERMAN DAWSON LARA FLETCHER PAGE MCCARTHY LOVE ROBLES CERVANTES SOLIS ERICKSON REEVES CHANG KLEIN SALINAS FUENTES BALDWIN DANIEL SIMON VELASQUEZ HARDY HIGGINS AGUIRRE LIN CUMMINGS CHANDLER SHARP BARBER BOWEN OCHOA DENNIS ROBBINS LIU RAMSEY FRANCIS GRIFFITH PAUL BLAIR OCONNOR CARDENAS PACHECO CROSS CALDERON QUINN MOSS SWANSON CHAN RIVAS KHAN RODGERS SERRANO FITZGERALD ROSALES STEVENSON CHRISTENSEN MANNING GILL CURRY MCLAUGHLIN HARMON MCGEE GROSS DOYLE GARNER NEWTON BURGESS REESE WALTON BLAKE TRUJILLO ADKINS BRADY GOODMAN ROMAN WEBSTER GOODWIN FISCHER HUANG POTTER DELACRUZ MONTOYA TODD WU HINES MULLINS CASTANEDA MALONE CANNON TATE MACK SHERMAN HUBBARD HODGES ZHANG GUERRA WOLF VALENCIA SAUNDERS FRANCO ROWE GALLAGHER FARMER HAMMOND HAMPTON TOWNSEND INGRAM WISE GALLEGOS CLARKE BARTON SCHROEDER MAXWELL WATERS LOGAN CAMACHO STRICKLAND NORMAN PERSON COLON PARSONS FRANK HARRINGTON GLOVER OSBORNE BUCHANAN CASEY FLOYD PATTON IBARRA BALL TYLER SUAREZ BOWERS OROZCO SALAS COBB GIBBS ANDRADE BAUER CONNER MOODY ESCOBAR MCGUIRE LLOYD MUELLER HARTMAN FRENCH KRAMER MCBRIDE POPE LINDSEY VELAZQUEZ NORTON MCCORMICK SPARKS FLYNN YATES HOGAN MARSH MACIAS VILLANUEVA ZAMORA PRATT STOKES OWEN BALLARD LANG BROCK VILLARREAL CHARLES DRAKE BARRERA CAIN PATRICK PINEDA BURNETT MERCADO SANTANA SHEPHERD BAUTISTA ALI SHAFFER LAMB TREVINO MCKENZIE HESS 
BEIL OLSEN COCHRAN MORTON NASH WILKINS PETERSEN BRIGGS SHAH ROTH NICHOLSON HOLLOWAY LOZANO RANGEL FLOWERS HOOVER SHORT ARIAS MORA VALENZUELA BRYAN MEYERS WEISS UNDERWOOD BASS GREER SUMMERS HOUSTON CARSON MORROW CLAYTON WHITAKER DECKER YODER COLLIER ZUNIGA CAREY WILCOX MELENDEZ POOLE ROBERSON LARSEN CONLEY DAVENPORT COPELAND MASSEY LAM HUFF ROCHA CAMERON JEFFERSON HOOD MONROE ANTHONY PITTMAN HUYNH RANDALL SINGLETON KIRK COMBS MATHIS CHRISTIAN SKINNER BRADFORD RICHARD GALVAN WALL BOONE KIRBY WILKINSON BRIDGES BRUCE ATKINSON VELEZ MEZA ROY VINCENT YORK HODGE VILLA ABBOTT ALLISON TAPIA GATES CHASE SOSA SWEENEY FARRELL WYATT DALTON HORN BARRON PHELPS YU DICKERSON HEATH FOLEY ATKINS MATHEWS BONILLA ACEVEDO BENITEZ ZAVALA HENSLEY GLENN CISNEROS HARRELL SHIELDS RUBIO HUFFMAN CHOI BOYER GARRISON ARROYO BOND KANE HANCOCK CALLAHAN DILLON CLINE WIGGINS GRIMES ARELLANO MELTON ONEILL SAVAGE HO BELTRAN PITTS PARRISH PONCE RICH BOOTH KOCH GOLDEN WARE BRENNAN MCDOWELL MARKS CANTU HUMPHREY BAXTER SAWYER CLAY TANNER HUTCHINSON KAUR BERG WILEY GILMORE RUSSO VILLEGAS HOBBS KEITH WILKERSON AHMED BEARD MCCLAIN MONTES MATA ROSARIO VANG WALTER HENSON ONEAL MOSLEY MCCLURE BEASLEY STEPHENSON SNOW HUERTA PRESTON VANCE BARRY JOHNS EATON BLACKWELL DYER PRINCE MACDONALD SOLOMON GUEVARA STAFFORD ENGLISH HURST WOODARD CORTES SHANNON KEMP NOLAN MCCULLOUGH MERRITT MURILLO MOON SALGADO STRONG KLINE CORDOVA BARAJAS ROACH ROSAS WINTERS JACOBSON LESTER KNOX BULLOCK KERR LEACH MEADOWS ORR DAVILA WHITEHEAD PRUITT KENT CONWAY MCKEE BARR DAVID DEJESUS MARIN BERGER MCINTYRE BLANKENSHIP GAINES PALACIOS CUEVAS BARTLETT DURHAM DORSEY MCCALL ODONNELL STEIN BROWNING STOUT LOWERY SLOAN MCLEAN HENDRICKS CALHOUN SEXTON CHUNG GENTRY HULL DUARTE ELLISON NIELSEN GILLESPIE BUCK MIDDLETON SELLERS LEBLANC ESPARZA HARDIN BRADSHAW MCINTOSH HOWE LIVINGSTON FROST GLASS MORSE KNAPP HERMAN STARK BRAVO NOBLE SPEARS WEEKS CORONA FREDERICK BUCKLEY MCFARLAND HEBERT ENRIQUEZ HICKMAN QUINTERO RANDOLPH SCHAEFER WALLS TREJO HOUSE 
REILLY PENNINGTON MICHAEL CONRAD GILES BENJAMIN CROSBY FITZPATRICK DONOVAN MAYS MAHONEY VALENTINE RAYMOND MEDRANO HAHN MCMILLAN SMALL BENTLEY FELIX PECK LUCERO BOYLE HANNA PACE RUSH HURLEY HARDING MCCONNELL BERNAL NAVA AYERS EVERETT VENTURA AVERY PUGH MAYER BENDER SHEPARD MCMAHON LANDRY CASE SAMPSON MOSES MAGANA BLACKBURN DUNLAP GOULD DUFFY VAUGHAN HERRING MCKAY ESPINOSA RIVERS FARLEY BERNARD ASHLEY FRIEDMAN POTTS TRUONG COSTA CORREA BLEVINS NIXON CLEMENTS FRY DELAROSA BEST BENTON LUGO PORTILLO DOUGHERTY CRANE HALEY PHAN VILLALOBOS BLANCHARD HORNE FINLEY QUINTANA LYNN ESQUIVEL BEAN DODSON MULLEN XIONG HAYDEN CANO LEVY HUBER RICHMOND MOYER LIM FRYE SHEPPARD MCCARTY AVALOS BOOKER WALLER PARRA WOODWARD JARAMILLO KRUEGER RASMUSSEN BRANDT PERALTA DONALDSON STUART FAULKNER MAYNARD GALINDO COFFEY ESTES SANFORD BURCH MADDOX VO OCONNELL VU ANDERSEN SPENCE MCPHERSON CHURCH SCHMITT STANTON LEAL CHERRY COMPTON DUDLEY SIERRA POLLARD ALFARO HESTER PROCTOR LU HINTON NOVAK GOOD MADDEN MCCANN TERRELL JARVIS DICKSON REYNA CANTRELL MAYO BRANCH HENDRIX ROLLINS ROWLAND WHITNEY DUKE ODOM DAUGHERTY TRAVIS TANG ARCHER"
counts = "2_442_977 1_932_812 1_625_252 1_437_026 1_425_470 1_166_120 1_161_437 1_116_357 1_094_924 1_060_159 1_043_281 874_523 841_025 801_882 784_404 756_142 751_209 724_374 708_099 702_625 693_023 681_645 664_644 660_491 624_252 612_752 562_679 557_423 531_781 529_821 523_129 484_447 482_607 465_422 458_980 439_530 437_813 437_645 434_827 433_969 430_182 427_865 424_958 419_586 407_076 391_114 386_157 384_486 376_966 376_774 365_655 360_802 355_593 348_627 347_636 336_221 334_201 332_423 329_770 327_904 324_957 318_884 311_777 308_417 302_589 302_261 293_218 286_899 286_280 280_791 278_297 277_845 277_030 267_394 264_826 263_464 262_352 261_231 260_464 259_798 252_579 251_663 250_898 250_715 249_379 247_599 246_116 242_771 238_234 236_271 235_251 233_983 230_420 230_374 229_973 229_895 229_374 229_368 227_764 227_118 224_874 222_653 221_741 221_558 220_990 220_599 219_070 218_847 218_393 218_241 214_758 214_703 212_781 210_182 208_614 208_403 205_423 204_621 201_746 201_159 200_247 198_406 197_276 196_925 195_818 195_289 194_246 192_773 192_711 190_667 188_968 188_498 188_497 186_512 185_674 184_910 184_832 184_134 183_922 182_719 181_091 180_842 180_497 177_425 177_386 176_865 176_230 173_835 170_964 169_580 169_149 168_878 167_446 167_044 165_925 164_457 164_035 163_181 163_054 162_440 161_833 161_717 161_633 160_400 160_262 160_213 159_480 158_483 158_421 158_320 156_780 156_601 155_795 154_738 153_666 153_469 153_397 153_329 152_703 152_334 152_147 151_942 150_895 149_500 147_034 147_005 146_570 146_426 145_584 144_646 144_451 143_837 143_452 142_894 142_601 142_277 141_427 140_693 139_951 139_751 138_893 138_629 138_322 137_977 137_513 137_232 137_184 136_720 136_713 135_765 135_718 135_187 135_044 134_963 134_317 134_227 133_872 133_799 133_501 133_171 132_985 132_812 131_440 131_401 131_373 131_303 130_776 130_529 130_164 130_152 129_898 129_699 128_948 128_677 128_625 127_939 127_794 127_470 127_256 127_083 126_101 125_350 125_058 124_995 124_461 122_877 
122_587 122_212 121_526 121_130 120_621 120_552 119_706 119_304 119_076 119_053 118_614 118_557 117_708 116_749 116_673 116_618 115_953 115_900 115_679 115_662 114_959 114_940 114_030 113_374 112_154 112_041 111_786 111_371 111_360 111_144 110_967 110_744 110_697 110_529 110_116 109_883 109_433 108_987 108_421 107_690 107_533 107_522 106_696 106_033 105_936 105_833 105_365 105_091 105_079 105_007 104_888 104_518 104_515 104_057 103_930 103_418 103_318 103_306 102_538 101_949 101_931 101_836 101_801 101_694 101_458 101_290 100_959 100_104 99_807 98_468 98_268 97_314 97_040 96_979 96_867 96_810 96_111 95_681 95_622 94_988 93_944 93_786 93_678 93_628 92_904 92_507 92_463 92_260 92_152 91_970 91_694 91_475 91_384 91_129 90_964 90_677 90_670 90_517 90_071 89_796 89_700 89_649 89_401 89_376 89_091 88_728 88_615 88_586 88_230 88_060 87_859 87_531 87_414 87_162 87_000 86_618 86_363 86_240 86_081 85_974 85_195 84_942 84_516 84_320 84_179 84_018 83_967 83_928 83_781 83_621 83_616 83_510 83_265 83_182 83_067 83_063 82_992 82_950 82_873 82_458 82_161 82_146 82_085 81_978 81_939 81_471 81_156 81_006 80_742 80_526 80_460 80_364 80_252 79_803 79_517 79_508 79_316 79_186 78_990 78_848 78_822 78_677 78_482 78_381 78_370 78_350 78_327 78_260 78_256 78_026 77_923 77_652 77_642 77_557 77_085 76_986 76_908 76_897 76_664 76_205 76_171 76_095 75_996 75_356 75_185 75_169 75_143 74_949 74_948 74_919 74_816 74_737 74_542 74_503 74_458 74_324 74_092 73_931 73_919 73_854 73_797 73_664 73_599 73_145 73_136 72_918 72_625 72_451 72_357 72_328 72_175 72_109 71_844 71_759 71_721 71_717 71_646 71_368 71_286 71_085 71_058 71_056 70_502 70_362 70_223 70_125 70_071 70_031 70_000 69_943 69_943 69_879 69_834 69_617 69_515 69_472 69_360 69_345 68_649 68_373 68_281 68_233 67_977 67_961 67_929 67_909 67_893 67_769 67_704 67_411 67_338 67_310 67_304 66_959 66_858 66_827 66_648 66_556 66_454 66_293 66_063 66_059 66_056 66_013 66_003 65_904 65_468 65_125 65_064 65_037 65_004 64_572 64_429 64_403 64_327 64_202 
64_191 64_106 63_991 63_936 63_899 63_881 63_760 63_736 63_722 63_649 63_440 63_400 63_254 63_085 62_304 62_227 61_883 61_729 61_671 61_639 61_630 61_625 61_529 61_369 61_355 61_211 61_162 60_998 60_948 60_845 60_820 60_791 60_761 60_667 60_479 60_264 60_002 59_943 59_913 59_882 59_595 59_486 59_463 59_356 59_350 59_213 58_714 58_634 58_480 58_408 58_287 58_278 58_151 58_040 57_779 57_549 57_549 57_497 57_477 57_477 57_464 57_383 57_143 57_127 57_112 57_064 57_044 57_043 56_953 56_900 56_872 56_840 56_638 56_616 56_576 56_410 56_380 56_347 56_322 56_286 56_230 56_226 56_180 55_960 55_917 55_895 55_850 55_595 55_554 55_484 55_251 55_240 55_179 55_174 55_136 55_114 55_021 54_996 54_764 54_621 54_394 54_257 54_217 54_198 54_046 54_015 53_893 53_822 53_794 53_792 53_767 53_739 53_682 53_419 53_376 53_265 53_230 53_159 53_095 53_059 52_920 52_817 52_739 52_701 52_651 52_569 52_481 52_457 52_410 52_321 52_211 52_184 52_138 52_070 52_044 52_035 51_889 51_877 51_865 51_671 51_592 51_475 51_351 51_288 51_153 51_081 51_043 50_920 50_837 50_832 50_788 50_786 50_786 50_742 50_686 50_614 50_610 50_584 50_558 50_524 50_465 50_258 50_247 50_245 50_104 50_069 50_028 49_914 49_817 49_776 49_740 49_733 49_549 49_481 49_402 49_395 49_360 49_316 49_238 49_217 49_177 49_126 49_056 49_033 49_028 48_844 48_813 48_781 48_753 48_746 48_720 48_719 48_696 48_599 48_522 48_487 48_444 48_319 48_207 48_165 48_142 48_120 48_051 48_036 48_024 48_013 47_979 47_963 47_742 47_693 47_641 47_528 47_455 47_367 47_324 47_274 47_246 47_184 47_175 47_170 47_168 46_717 46_534 46_454 46_394 46_393 46_244 46_240 46_229 46_147 46_146 46_054 45_852 45_594 45_558 45_528 45_469 45_432 45_390 45_305 45_153 45_019 44_938 44_914 44_808 44_784 44_742 44_740 44_711 44_581 44_500 44_388 44_388 44_373 44_365 44_325 44_320 44_137 44_130 44_040 44_038 43_904 43_851 43_842 43_830 43_821 43_798 43_701 43_648 43_635 43_631 43_483 43_460 43_389 43_329 43_305 43_278 43_261 43_260 43_197 43_180 43_133 43_110 43_027 43_018 
42_983 42_827 42_773 42_693 42_639 42_578 42_577 42_575 42_559 42_469 42_465 42_379 42_265 42_103 42_015 41_802 41_774 41_771 41_750 41_735 41_700 41_667 41_665 41_565 41_553 41_394 41_348 41_300 41_275 41_271 41_163 41_158 41_129 41_063 41_025 41_021 41_000 40_884 40_854 40_736 40_707 40_598 40_590 40_563 40_449 40_410 40_408 40_397 40_395 40_275 40_261 40_250 40_237 40_212 40_193 40_165 40_055 39_986 39_921 39_890 39_879 39_802 39_796 39_787 39_754 39_693 39_670 39_623 39_593 39_580 39_564 39_559 39_555 39_551 39_430 39_411 39_391 39_319 39_277 39_216 39_105 39_097 39_063 38_924 38_835 38_830 38_733 38_681 38_667 38_662 38_528 38_512 38_499 38_374 38_277 38_267 38_265 38_232 38_229 38_147 38_044 38_029 37_932 37_923 37_912 37_903 37_890 37_884 37_870 37_858 37_836 37_754 37_695 37_689 37_672 37_657 37_644 37_578 37_571 37_566 37_502 37_499 37_451 37_368 37_228 37_170 37_053 37_050 37_021 36_973 36_960 36_944 36_922 36_840 36_805 36_765 36_764 36_755 36_743 36_636 36_613 36_585 36_558 36_540 36_466 36_460 36_429 36_423 36_318 36_312 36_269 36_250 36_236 36_194 36_179 36_150 36_129 36_125 36_072 36_043 35_997 35_958 35_877 35_830 35_781 35_770 35_749 35_725 35_642 35_636 35_628 35_606 35_461 35_446 35_438 35_408 35_408 35_350 35_312 35_291 35_266 35_228 35_225 35_194 35_132 35_121 35_118 35_053 35_020 34_987 34_985 34_961 34_949"
namecounts = pd.DataFrame({
  "names": names.split(),
  "counts":  [int(x) for x in counts.split()]
})
namecounts = namecounts.sort_values(by="names").reset_index(drop=True)

ExploreNames(namecounts, 4)

Running it produces the following results:

k 1
Max count: 26760957
         A-         C 22603070
         D-         H 26468447
         I-         M 26292942
         N-         R 18722725
         S-         Z 26760957
k 2
Max count: 24879502
        AB-        DA 24470220
        DE-        HO 23293726
        HU-        MI 23537487
        MO-        SA 24667206
        SC-        ZU 24879502
k 3
Max count: 24291136
       ABB-       DAV 24281947
       DAW-       HUG 24186818
       HUL-       MOO 24055053
       MOR-       SCH 24033187
       SCO-       ZUN 24291136
k 4
Max count: 24291136
      ABBO-      DAVI 24281947
      DAWS-      HUGH 24186818
      HULL-      MOOR 24055053
      MORA-      SCHW 24033187
      SCOT-      ZUNI 24291136

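This shows that, for the top 1000 US surnames, three-letter prefixes are enough to assign surnames to supermarket shopping days optimally: going to four letters does not reduce the maximum count any further.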
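Since the question stresses communicating the result simply, one possible way to apply the k = 3 boundaries is a sorted lookup on the first three letters of a surname. This is a sketch only. The weekday labels are an assumption, because the source says only that shops open 5 days a week. The names `GROUP_STARTS`, `DAYS`, and `shopping_day` are hypothetical. Surnames outside the top 1000 simply fall into whichever range contains their prefix alphabetically.

```python
from bisect import bisect_right

# First prefix of groups 2..5, read off the k = 3 output above
GROUP_STARTS = ["DAW", "HUL", "MOR", "SCO"]
# Hypothetical day labels, one per group
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def shopping_day(surname):
    """Map a surname to a day by comparing its 3-letter prefix to the cut points."""
    prefix = surname.strip().upper()[:3]
    # bisect_right counts how many group starts are <= prefix
    return DAYS[bisect_right(GROUP_STARTS, prefix)]

for name in ["Davis", "Dawson", "Garcia", "Moore", "Smith"]:
    print(name, shopping_day(name))
```

A published notice could then list just the five ranges, for example "ABB to DAV" for the first day.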

