Python dict: list the values of some keys for each unique combination of other keys

Problem description

I have a series of files, each containing key-value pairs. For example, filename1 contains:

A : U
B : 10
C : checksum1
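A minimal sketch of reading one such file into a dict, assuming each line has the plain-text form "key : value" as in the example above. The asker's own attempt reads FITS headers through get_fits_header instead, so this only applies to plain-text files; parse_keyvalue_text is a hypothetical helper name. Values are kept as strings.

```python
def parse_keyvalue_text(text):
    """Parse lines of the form 'key : value' into a dict (values stay strings)."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first ':' only, so values may themselves contain ':'
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that have no ':' separator
        record[key.strip()] = value.strip()
    return record


record = parse_keyvalue_text("A : U\nB : 10\nC : checksum1\n")
print(record)  # {'A': 'U', 'B': '10', 'C': 'checksum1'}
```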

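I want to get groups of values selected by the unique values of other keys. For example, if the keys and values in my files can be laid out as follows (column D being the filename):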

A  B     C         D
-------------------------
U 10 checksum1 filename1
U 10 checksum2 filename2
U 20 checksum3 filename3
V 20 checksum4 filename4
V 20 checksum5 filename5

I want to get:

t = table.unique_values_for(["A","B"])
# [("U",10), ("U",20), ("V",20)]

t.result_for_unique(["C","D"])
# [
# [(checksum1, filename1), (checksum2, filename2)], <- result for ("U",10)
# [(checksum3, filename3)], <- result for ("U",20)
# [(checksum4, filename4), (checksum5, filename5)] <- result for ("V",20)
# ]
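For reference, the grouping described above can be sketched with plain dicts, assuming each file has already been read into a dict with the keys A to D. group_rows is a hypothetical helper, not an existing API. Groups come out in the order each key combination is first seen (dicts keep insertion order in Python 3.7+), whereas pandas groupby sorts the keys by default.

```python
from collections import defaultdict


def group_rows(rows, key_cols, value_cols):
    """Map each unique tuple of key_cols values to a list of value_cols tuples."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        groups[key].append(tuple(row[v] for v in value_cols))
    return dict(groups)


# One dict per file, matching the table in the question
rows = [
    {"A": "U", "B": 10, "C": "checksum1", "D": "filename1"},
    {"A": "U", "B": 10, "C": "checksum2", "D": "filename2"},
    {"A": "U", "B": 20, "C": "checksum3", "D": "filename3"},
    {"A": "V", "B": 20, "C": "checksum4", "D": "filename4"},
    {"A": "V", "B": 20, "C": "checksum5", "D": "filename5"},
]

grouped = group_rows(rows, ["A", "B"], ["C", "D"])
unique_keys = list(grouped)       # unique (A, B) combinations
results = list(grouped.values())  # (C, D) tuples for each combination, same order
print(unique_keys)  # [('U', 10), ('U', 20), ('V', 20)]
```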

I have tried using plain dicts, pandas, and astropy.table.

Here is what I have tried so far:

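import numpy as np
from astropy.table import Table

# get_fits_header(filename, fast=True) is the asker's own helper (not shown);
# it is assumed to return a dict-like header with the same keys for every file.

class minidb():

    def __init__(self, pattern):
        # Accept a single filename or a list of filenames
        if isinstance(pattern, str):
            pattern = [pattern]
        self.pattern = pattern
        self.heads = [ get_fits_header(f, fast=True) for f in pattern ]
        # One column per header key, using the keys of the first file
        keys = self.heads[0].keys()
        values = [ [ h.get(k) for h in self.heads ] for k in keys ]
        dic = dict(zip(keys, values))
        dic["ARP FILENAME"] = pattern # adding filename
        self.dic = dic
        self.table = Table(dic) # original
        self.data = self.table
        self.unique = None
        self.names = None

    def unique_for(self, keys):
        # Group the rows by the given key column(s); return the unique key combinations
        # if isinstance(keys, str):
        #     keys = [keys]
        self.data = self.table.group_by(keys)
        self.unique = self.data.groups.keys.as_array().tolist()
        return self.unique

    def names_for(self, keys):
        # For each group built by unique_for(), return the rows of the requested column(s)
        if isinstance(keys, str):
            keys = [keys]
        self.names = [ np.array(g[keys]).tolist() for g in self.data.groups]
        # Note: this drops the grouping, so call unique_for() again before the next names_for()
        self.data = self.table[keys]
        return self.names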

Tags: python, python-3.x, dictionary, astropy

Solution


Pandas can do this easily with groupby:

In [1]: df = pd.DataFrame([
   ...: dict(A='U', B=10, C=1, D=1),
   ...: dict(A='U', B=10, C=2, D=2),
   ...: dict(A='U', B=20, C=3, D=3),
   ...: dict(A='V', B=20, C=4, D=4),
   ...: dict(A='V', B=20, C=5, D=5)
   ...: ])

In [2]: list(df.groupby(['A', 'B']))
Out[2]:
[(('U', 10),
     A   B  C  D
  0  U  10  1  1
  1  U  10  2  2),
 (('U', 20),
     A   B  C  D
  2  U  20  3  3),
 (('V', 20),
     A   B  C  D
  3  V  20  4  4
  4  V  20  5  5)]

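Each element of that list is a tuple of the key (the values of "A" and "B") and a DataFrame containing only the rows that have those "A" and "B" values. You can loop over the grouped result and pull whatever you need out of "C" and "D", just as you would from any other DataFrame.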
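A minimal sketch of that loop, producing the two lists asked for in the question. It assumes C and D hold the checksum and filename strings from the question's table rather than the integers used in the session above. Passing name=None to itertuples yields plain tuples instead of namedtuples.

```python
import pandas as pd

# Same layout as the question's table; C and D hold checksums and filenames
df = pd.DataFrame({
    "A": ["U", "U", "U", "V", "V"],
    "B": [10, 10, 20, 20, 20],
    "C": ["checksum1", "checksum2", "checksum3", "checksum4", "checksum5"],
    "D": ["filename1", "filename2", "filename3", "filename4", "filename5"],
})

unique_keys = []  # one (A, B) tuple per group, in groupby's sorted order
results = []      # the matching list of (C, D) tuples for each group
for key, group in df.groupby(["A", "B"]):
    unique_keys.append(key)
    results.append(list(group[["C", "D"]].itertuples(index=False, name=None)))

print(results)
# [[('checksum1', 'filename1'), ('checksum2', 'filename2')], [('checksum3', 'filename3')], [('checksum4', 'filename4'), ('checksum5', 'filename5')]]
```

The keys compare equal to ("U", 10), ("U", 20) and ("V", 20), although B may come back as a NumPy integer rather than a Python int.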

