python - Transform AST nodes into vectors/numbers
Problem description
Good day, I have a collection of Python 3 code in abstract syntax trees (ASTs). I have been trying for multiple days now to figure out the best way to convert the nodes into usable vector/number representations.
For example, here is one AST dumped (without annotated fields):
Module([Import([alias('time', None), alias('sys', None), alias('pygame', None)]), Import([alias('random', None)]), Import([alias('sequence', None)]), Assign([Name('S_SIZE', Store()), Tuple([Name('S_WID', Store()), Name('S_HGT', Store())], Store())], Tuple([Num(600), Num(400)], Load())), Assign([Name('screen', Store())], Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'set_mode', Load()), [Name('S_SIZE', Load())], [])), Assign([Name('NUMB_COUNT', Store())], Num(200)), Assign([Name('nlist', Store())], ListComp(BinOp(Call(Attribute(Name('random', Load()), 'random', Load()), [], []), Mult(), Name('S_HGT', Load())), [comprehension(Name('_', Store()), Call(Name('range', Load()), [Name('NUMB_COUNT', Load())], []), [], 0)])), Assign([Name('num', Store())], Call(Attribute(Name('sequence', Load()), 'NumGroup', Load()), [Name('nlist', Load()), Name('S_WID', Load()), Name('S_HGT', Load())], [])), FunctionDef('draw_all', arguments([], None, [], [], None, []), [Expr(Call(Attribute(Name('screen', Load()), 'fill', Load()), [Tuple([Num(0), Num(0), Num(0)], Load())], [])), Expr(Call(Attribute(Name('num', Load()), 'draw', Load()), [Name('screen', Load())], [])), Expr(Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'flip', Load()), [], []))], [], None), FunctionDef('bubble_sort', arguments([arg('nlist', None), arg('i', None), arg('end_ind', None)], None, [], [], None, []), [If(Compare(Name('i', Load()), [Eq()], [Name('end_ind', Load())]), [Return(Tuple([Num(0), BinOp(Name('end_ind', Load()), Sub(), Num(1))], Load()))], []), If(Compare(Attribute(Subscript(Name('nlist', Load()), Index(Name('i', Load())), Load()), 'val', Load()), [Gt()], [Attribute(Subscript(Name('nlist', Load()), Index(BinOp(Name('i', Load()), Add(), Num(1))), Load()), 'val', Load())]), [Expr(Call(Attribute(Name('nlist', Load()), 'swap', Load()), [Name('i', Load()), BinOp(Name('i', Load()), Add(), Num(1))], []))], []), Return(Tuple([BinOp(Name('i', Load()), Add(), Num(1)), 
Name('end_ind', Load())], Load()))], [], None), If(Compare(Name('__name__', Load()), [Eq()], [Str('__main__')]), [Expr(Call(Attribute(Name('pygame', Load()), 'init', Load()), [], [])), Assign([Name('i', Store())], Num(0)), Assign([Name('end_ind', Store())], BinOp(Call(Name('len', Load()), [Name('num', Load())], []), Sub(), Num(1))), While(Num(1), [For(Name('event', Store()), Call(Attribute(Attribute(Name('pygame', Load()), 'event', Load()), 'get', Load()), [], []), [If(Compare(Attribute(Name('event', Load()), 'type', Load()), [Eq()], [Attribute(Name('pygame', Load()), 'QUIT', Load())]), [Expr(Call(Attribute(Name('sys', Load()), 'exit', Load()), [], []))], [])], []), For(Name('n', Store()), Name('num', Load()), [Expr(Call(Attribute(Name('n', Load()), 'set_color', Load()), [Tuple([Num(255), Num(255), Num(255)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Expr(Call(Attribute(Subscript(Name('num', Load()), Index(Name('i', Load())), Load()), 'set_color', Load()), [Tuple([Num(0), Num(255), Num(0)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Assign([Tuple([Name('i', Store()), Name('end_ind', Store())], Store())], Call(Name('bubble_sort', Load()), [Name('num', Load()), Name('i', Load()), Name('end_ind', Load())], []))], []), Expr(Call(Attribute(Name('num', Load()), 'update', Load()), [], [])), Expr(Call(Name('draw_all', Load()), [], []))], [])], [])])
I want to make it into something like this so I can feed it into TensorFlow:
[1,2,3,1,2,13,56,12,53,41,31...etc]
I found a copy of all the nodes (transformed into a dictionary):
NODE_LIST = [
'Module','Interactive','Expression','FunctionDef','ClassDef','Return',
'Delete','Assign','AugAssign','Print','For','While','If','With','Raise',
'TryExcept','TryFinally','Assert','Import','ImportFrom','Exec','Global',
'Expr','Pass','Break','Continue','attributes','BoolOp','BinOp','UnaryOp',
'Lambda','IfExp','Dict','Set','ListComp','SetComp','DictComp',
'GeneratorExp','Yield','Compare','Call','Repr','Num','Str','Attribute',
'Subscript','Name','List','Tuple','Load','Store','Del',
'AugLoad','AugStore','Param','Ellipsis','Slice','ExtSlice','Index','And','Or',
'Add','Sub','Mult','Div','Mod','Pow','LShift','RShift','BitOr','BitXor',
'BitAnd','FloorDiv','Invert','Not','UAdd','USub','Eq','NotEq','Lt',
'LtE','Gt','GtE','Is','IsNot','In','NotIn','comprehension','ExceptHandler',
'arguments','keyword','alias']
NODE_MAP = {x: i for (i, x) in enumerate(NODE_LIST)}
For example (enumerate starts counting at 0),
{'Module': 0, 'Interactive': 1, ...etc}
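One caveat about the hand-copied list: it follows the Python 2 grammar (it contains Print, Exec and TryExcept), while the dump above contains Python 3 nodes such as arg that are missing from it, so a lookup can raise KeyError. A minimal sketch of one way around this, assuming it is acceptable for the indices to depend on the Python version that runs the code, is to derive the vocabulary from the ast module itself. The names build_node_map and AUTO_NODE_MAP are hypothetical, not part of the original post.

```python
import ast


def build_node_map():
    """Map every AST node class name known to this interpreter to an integer."""
    names = sorted(
        name
        for name, obj in vars(ast).items()
        # Keep only classes that derive from ast.AST (Module, Name, arg, ...).
        if isinstance(obj, type) and issubclass(obj, ast.AST)
    )
    # Sorting makes the numbering reproducible for a given Python version.
    return {name: index for index, name in enumerate(names)}


AUTO_NODE_MAP = build_node_map()
print(len(AUTO_NODE_MAP), AUTO_NODE_MAP["Module"], AUTO_NODE_MAP["arg"])
```

Because the set of node classes changes between Python releases, the resulting map should be saved alongside any trained model rather than rebuilt on a different interpreter.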
I experimented with ASTWalkers and generators, but I still can't find a good way to accomplish this. Any help is appreciated :)
EDIT:
I think I may be looking for ast.NodeVisitor's def visit_Name:
class ToInteger(ast.NodeVisitor):
    def visit_Name(self, node):
        print(node.id)
        print(NODE_MAP[node.id])
This gives exactly what I'm looking for (slice of output):
Module
0
Import
18
alias
91
alias
91
alias
91
Import
18
alias
My main issue now is extracting NODE_MAP[node.id], since return can only be used for returning a modified tree.
Solution
ast.dump shows node.__class__.__name__, so I guess that is the string you want to map to a number, with the number in turn given by its position in NODE_LIST.
import ast

class CustomNodeVisitor(ast.NodeVisitor):
    def visit(self, node):
        # Print the class name of every node, then continue the default traversal.
        print(node.__class__.__name__)
        return ast.NodeVisitor.visit(self, node)
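To get the numbers out of the visitor rather than printing them, one option is to append them to a list stored on the visitor instance. The following is a minimal sketch, not the answerer's code: the class NodeIdCollector and the helper to_ids are hypothetical names, and it assumes that a class name missing from the map should be given the next free index instead of raising KeyError.

```python
import ast


class NodeIdCollector(ast.NodeVisitor):
    """Collect one integer per visited node, in depth-first pre-order."""

    def __init__(self, node_map):
        self.node_map = node_map  # class name -> integer, extended in place
        self.ids = []

    def visit(self, node):
        name = node.__class__.__name__
        # Assumption: unknown class names are assigned the next free index.
        index = self.node_map.setdefault(name, len(self.node_map))
        self.ids.append(index)
        # The default visit dispatches to generic_visit, which recurses
        # into the child nodes and calls this method again for each one.
        return super().visit(node)


def to_ids(source, node_map):
    """Parse Python source and return its node ids as a flat list."""
    collector = NodeIdCollector(node_map)
    collector.visit(ast.parse(source))
    return collector.ids


node_map = {}
ids = to_ids("import time\nx = 1\n", node_map)
print(ids)
print(node_map)
```

The result is a plain Python list of integers, which can be padded or truncated to a fixed length before being handed to TensorFlow. Passing the same node_map to every call keeps the numbering consistent across files.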