首页 > 解决方案 > 数一数树的叶子数?

问题描述

我正在尝试以有效的方式实现树(不一定是二叉树)中的叶子计数。

我有一个像这样的边缘输入文件(有两个分支,因此有两个叶子):

# nodeID     treeID    parentID level M
8915218131 11994463802 9019194630 84 1152
8915218132 11994463802 9019194631 84 1018
9019194630 11994463802 9122013305 85 1152
9122013305 11994463802 9224080882 86 1152
9122013306 11994463802 9224080883 86 1009
9224080882 11994463802 9325374898 87 1153
9325374898 11994463802 9426419957 88 1153
9426419957 11994463802 9526231677 89 1153
9526231677 11994463802 9625289594 90 1153
9625289594 11994463802 9723827035 91 1154
9723827035 11994463802 9822081910 92 1154
9822081910 11994463802 9919855221 93 1154
9919855221 11994463802 10017373825 94 1154
10017373825 11994463802 10114414643 95 1154
10114414643 11994463802 10211212821 96 1154
10114414644 11994463802 10211212822 96 1000
10211212821 11994463802 10307531761 97 1154
10307531761 11994463802 10403632357 98 1154
10403632357 11994463802 10499245188 99 1154
10499245188 11994463802 10594618828 100 1154
10594618828 11994463802 10689537668 101 1154
10689537668 11994463802 10784230758 102 1154
10784230758 11994463802 10878482733 103 1154
10878482733 11994463802 10972591178 104 1154
10972591178 11994463802 11066270407 105 1154
11066270407 11994463802 11159769310 106 1154
11159769310 11994463802 11252883435 107 1154
11252883435 11994463802 11345851308 108 1154
11345851308 11994463802 11438596137 109 1153
11438596137 11994463802 11531494001 110 1153
11531494001 11994463802 11624604030 111 1153
11624604030 11994463802 11718806817 112 1153
11718806817 11994463802 11811844824 113 1152
11811844824 11994463802 11904133551 114 1141
11904133551 11994463802 11994463802 115 1139
11994463802 11994463802 -1          116 1140

如您所见,最后一行是根(始终为 116 级)。在这种情况下,有两片叶子。

对于数百万棵这样的树,我需要过滤树M > 1100(这里的所有人都可以)并计算叶子的数量。这使得像 NetworkX 这样的图形库有点问题,因为构建树需要太长时间。

我目前计算叶子的方法是只计算没有任何边缘指向它们的节点。以下是我当前的代码。有没有我可以使用的更好的工具或技术?

我当前的 Python 代码:

import sys
from operator import itemgetter
Mmin = 1100

def handle(tree, treeid, Mfinal):
    # sort by levelid, if input not already presorted
    #tree.sort(itemgetter(3))
    nleafs = 0
    nonleaf = set()
    for id, rootid, parentid, layerid, M in tree:
        if M >= Mmin:
            nonleaf.add(parentid)
            if id not in nonleaf:
                nleafs += 1
    print(treeid, Mfinal, nleafs)
    sys.stdout.flush()

Mfinal = None
lasttreeid = None
tree = []
for l in sys.stdin:
    parts = l.strip().split()
    id, treeid, parentid, levelid, M = parts
    if lastrootid != treeid:
        if Mfinal is not None and Mfinal > Mmin and snapnum == "116":
            handle(tree, lasttreeid, Mfinal)
        tree = []
        lasttreeid = treeid
        Mfinal = None
    if parentid == '-1':
        Mfinal = int(M)
    tree.append((id, treeid, parentid, int(levelid), int(M)))

标签: pythontreegraph-theory

解决方案


收集所有节点并从中减去作为父节点的节点就足够了。用集合和集合差异操作来实现它是最容易的。

def leaves(filename):
    lasttreeid = None
    nodes = set()
    parents = set()
    for l in open(filename):
        nodeid, treeid, parentid, levelid, M = l.strip().split()
        if lasttreeid is None:
            lasttreeid = treeid
        if lasttreeid != treeid:
            print 'Tree %s, number of leaves %d' % (lasttreeid, len(nodes - parents))
            lasttreeid = treeid
            nodes = set()
            parents = set()
        nodes.add(nodeid)
        parents.add(parentid)
    print 'Tree %s, number of leaves %d' % (lasttreeid, len(nodes - parents))

推荐阅读