Transform AST nodes into vectors/numbers

Problem description

Good day. I have a collection of Python 3 programs parsed into abstract syntax trees (ASTs), and for several days now I have been trying to figure out the best way to convert the nodes into usable vector/numeric representations.

For example, here is one of the ASTs, dumped without field names (annotate_fields=False):

Module([Import([alias('time', None), alias('sys', None), alias('pygame', None)]), Import([alias('random', None)]), Import([alias('sequence', None)]), Assign([Name('S_SIZE', Store()), Tuple([Name('S_WID', Store()), Name('S_HGT', Store())], Store())], Tuple([Num(600), Num(400)], Load())), Assign([Name('screen', Store())], Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'set_mode', Load()), [Name('S_SIZE', Load())], [])), Assign([Name('NUMB_COUNT', Store())], Num(200)), Assign([Name('nlist', Store())], ListComp(BinOp(Call(Attribute(Name('random', Load()), 'random', Load()), [], []), Mult(), Name('S_HGT', Load())), [comprehension(Name('_', Store()), Call(Name('range', Load()), [Name('NUMB_COUNT', Load())], []), [], 0)])), Assign([Name('num', Store())], Call(Attribute(Name('sequence', Load()), 'NumGroup', Load()), [Name('nlist', Load()), Name('S_WID', Load()), Name('S_HGT', Load())], [])), FunctionDef('draw_all', arguments([], None, [], [], None, []), [Expr(Call(Attribute(Name('screen', Load()), 'fill', Load()), [Tuple([Num(0), Num(0), Num(0)], Load())], [])), Expr(Call(Attribute(Name('num', Load()), 'draw', Load()), [Name('screen', Load())], [])), Expr(Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'flip', Load()), [], []))], [], None), FunctionDef('bubble_sort', arguments([arg('nlist', None), arg('i', None), arg('end_ind', None)], None, [], [], None, []), [If(Compare(Name('i', Load()), [Eq()], [Name('end_ind', Load())]), [Return(Tuple([Num(0), BinOp(Name('end_ind', Load()), Sub(), Num(1))], Load()))], []), If(Compare(Attribute(Subscript(Name('nlist', Load()), Index(Name('i', Load())), Load()), 'val', Load()), [Gt()], [Attribute(Subscript(Name('nlist', Load()), Index(BinOp(Name('i', Load()), Add(), Num(1))), Load()), 'val', Load())]), [Expr(Call(Attribute(Name('nlist', Load()), 'swap', Load()), [Name('i', Load()), BinOp(Name('i', Load()), Add(), Num(1))], []))], []), Return(Tuple([BinOp(Name('i', Load()), Add(), Num(1)), 
Name('end_ind', Load())], Load()))], [], None), If(Compare(Name('__name__', Load()), [Eq()], [Str('__main__')]), [Expr(Call(Attribute(Name('pygame', Load()), 'init', Load()), [], [])), Assign([Name('i', Store())], Num(0)), Assign([Name('end_ind', Store())], BinOp(Call(Name('len', Load()), [Name('num', Load())], []), Sub(), Num(1))), While(Num(1), [For(Name('event', Store()), Call(Attribute(Attribute(Name('pygame', Load()), 'event', Load()), 'get', Load()), [], []), [If(Compare(Attribute(Name('event', Load()), 'type', Load()), [Eq()], [Attribute(Name('pygame', Load()), 'QUIT', Load())]), [Expr(Call(Attribute(Name('sys', Load()), 'exit', Load()), [], []))], [])], []), For(Name('n', Store()), Name('num', Load()), [Expr(Call(Attribute(Name('n', Load()), 'set_color', Load()), [Tuple([Num(255), Num(255), Num(255)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Expr(Call(Attribute(Subscript(Name('num', Load()), Index(Name('i', Load())), Load()), 'set_color', Load()), [Tuple([Num(0), Num(255), Num(0)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Assign([Tuple([Name('i', Store()), Name('end_ind', Store())], Store())], Call(Name('bubble_sort', Load()), [Name('num', Load()), Name('i', Load()), Name('end_ind', Load())], []))], []), Expr(Call(Attribute(Name('num', Load()), 'update', Load()), [], [])), Expr(Call(Name('draw_all', Load()), [], []))], [])], [])])

I want to turn it into a sequence of integers like this, so I can feed it into TensorFlow:

[1,2,3,1,2,13,56,12,53,41,31...etc]

I found a list of all the node types and turned it into a dictionary:

NODE_LIST = [
'Module','Interactive','Expression','FunctionDef','ClassDef','Return',
'Delete','Assign','AugAssign','Print','For','While','If','With','Raise',
'TryExcept','TryFinally','Assert','Import','ImportFrom','Exec','Global',
'Expr','Pass','Break','Continue','attributes','BoolOp','BinOp','UnaryOp',
'Lambda','IfExp','Dict','Set','ListComp','SetComp','DictComp',
'GeneratorExp','Yield','Compare','Call','Repr','Num','Str','Attribute',
'Subscript','Name','List','Tuple','Load','Store','Del',
'AugLoad','AugStore','Param','Ellipsis','Slice','ExtSlice','Index','And','Or',
'Add','Sub','Mult','Div','Mod','Pow','LShift','RShift','BitOr','BitXor',
'BitAnd','FloorDiv','Invert','Not','UAdd','USub','Eq','NotEq','Lt',
'LtE','Gt','GtE','Is','IsNot','In','NotIn','comprehension','ExceptHandler',
'arguments','keyword','alias']

NODE_MAP = {x: i for (i, x) in enumerate(NODE_LIST)}

For example,

{'Module': 0, 'Interactive': 1, ...etc}
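A caveat about the list above: it looks like a Python 2 node list. It contains node types that Python 3 no longer has (Print, Exec, Repr, TryExcept, TryFinally), includes 'attributes', which is not a node type, and lacks Python 3 nodes such as Try, Nonlocal, Starred and arg. arg appears in the dump above (arg('nlist', None)), so NODE_MAP['arg'] would raise a KeyError. A minimal sketch, assuming you are happy to let the running interpreter define the vocabulary, is to build the list from the ast module itself. Starting the numbering at 1 (to keep 0 free for padding) is my own choice, not something the question requires.

```python
import ast

# Every class exported by this interpreter's ast module that derives from
# ast.AST: concrete nodes such as Module, arg and alias, plus abstract bases
# such as stmt and expr (harmless extra entries for a lookup table).
# Sorting keeps the numbering stable for a given Python version.
NODE_LIST = sorted(
    name
    for name, obj in vars(ast).items()
    if isinstance(obj, type) and issubclass(obj, ast.AST)
)

# Number from 1 so that 0 stays free, e.g. as a padding value when
# sequences of different lengths are batched together.
NODE_MAP = {name: i for i, name in enumerate(NODE_LIST, start=1)}

print(NODE_MAP["arg"], NODE_MAP["Module"])
```

Because the set of node classes changes between Python versions (for example, ast.parse produces Constant instead of Num and Str from Python 3.8 on), it is safer to save the mapping together with the model than to rebuild it on a different interpreter.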

I experimented with ASTWalkers and generators, but I still can't find a good way to accomplish this. Any help is appreciated :)

EDIT: I think I may be looking for ast.NodeVisitor and its visit_Name method:

class ToInteger(ast.NodeVisitor):

    def visit_Name(self, node):
        print(node.id)
        print(NODE_MAP[node.id])

This gives exactly what I'm looking for (excerpt of the output):

Module
0
Import
18
alias
91
alias
91
alias
91
Import
18
alias

My main issue now is collecting the NODE_MAP[node.id] values, since return seems to be usable only for returning a modified tree.
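There is no need to return anything from the visitor: NodeVisitor.generic_visit calls self.visit on every child and ignores the results, so the usual pattern is to collect values in an attribute of the visitor. A minimal sketch, assuming a NODE_MAP like the one above; NodeEncoder, encode and the unknown_id fallback (used for node types missing from the map) are hypothetical names of my own. It overrides visit rather than visit_Name so that every node is recorded, not only Name nodes, and it looks up type(node).__name__ because node.id on a Name is the identifier (such as 'pygame'), not the node type.

```python
import ast


class NodeEncoder(ast.NodeVisitor):
    """Depth-first (pre-order) encoder: records one integer per AST node."""

    def __init__(self, node_map, unknown_id=-1):
        self.node_map = node_map      # node type name -> integer id
        self.unknown_id = unknown_id  # id for node types missing from node_map
        self.ids = []                 # accumulated output, in visiting order

    def visit(self, node):
        name = type(node).__name__
        self.ids.append(self.node_map.get(name, self.unknown_id))
        # Continue the normal dispatch; generic_visit calls self.visit on
        # every child node, so the whole tree gets recorded.
        return super().visit(node)


def encode(source, node_map, unknown_id=-1):
    """Parse source code and return the list of node ids."""
    encoder = NodeEncoder(node_map, unknown_id)
    encoder.visit(ast.parse(source))
    return encoder.ids


# Demo map using the numbers from the question's output.
demo_map = {"Module": 0, "Import": 18, "alias": 91}
print(encode("import time, sys", demo_map))  # [0, 18, 91, 91]
```

If the order of the ids does not matter much, [NODE_MAP[type(n).__name__] for n in ast.walk(tree)] is a shorter alternative, but ast.walk visits nodes breadth-first rather than in the depth-first order shown above.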

Tags: python, python-3.x, tensorflow, abstract-syntax-tree, tensorflow2.0

Solution


ast.dump shows node.__class__.__name__, so I guess that is the string you want to map to a number, with the numbers given by NODE_LIST.

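import ast

class CustomNodeVisitor(ast.NodeVisitor):
    def visit(self, node):
        # Print the node type name, e.g. "Module", "Import", "alias"
        print(node.__class__.__name__)
        # Normal dispatch; generic_visit then calls self.visit on each child
        return ast.NodeVisitor.visit(self, node)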
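Once every tree is a list of integers, the lists still have different lengths, while a TensorFlow tensor needs a fixed shape. One common approach, sketched here in plain Python as an assumption rather than as part of the answer, is to truncate or right-pad each list to the same length with a reserved padding id. pad_to_length is a hypothetical helper, and pad_id=0 only works if no node type is numbered 0, which is not the case with enumerate(NODE_LIST) as written in the question (Module gets 0).

```python
def pad_to_length(seqs, maxlen, pad_id=0):
    """Truncate or right-pad each integer sequence to exactly maxlen items.

    Hypothetical helper: pad_id must be an id that no node type uses.
    """
    batch = []
    for seq in seqs:
        row = list(seq)[:maxlen]               # keep at most maxlen node ids
        row += [pad_id] * (maxlen - len(row))  # fill the remainder with pad_id
        batch.append(row)
    return batch


print(pad_to_length([[1, 5, 9, 9, 2], [1, 7]], maxlen=4))
# [[1, 5, 9, 9], [1, 7, 0, 0]]
```

The resulting list of lists can then be turned into a tensor with tf.constant. Keras also ships a pad_sequences utility; note that it pads and truncates at the front of each sequence by default.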
