Dynamically using join conditions for multiple DataFrames in Python

Problem description

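I receive 1 to n tables via JSON, and they can be joined together on multiple conditions. The JOIN column names are not always the same in the two tables. In the JSON snippet the JOIN condition is repeated, but that is just an example: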

"relationships" : [   
        {
            "order" : 1,
            "tableName" : "Production.Product",
            "destination" : "Production.ProductSubcategory",
            "joinDirection" : "LEFT",
            "conditions" : [
                {
                    "sourceColumnName" : "ProductSubcategoryID",
                    "destinationColumnName" : "ProductSubcategoryID",
                    "operator" : "=",
                    "combineCondition" : null
                },
                {
                    "sourceColumnName" : "ProductSubcategoryID",
                    "destinationColumnName" : "ProductSubcategoryID",
                    "operator" : "=",
                    "combineCondition" : "AND"
                }
            ]
        }
    ],
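To turn this JSON into something a join loop can iterate over, the relationships can be read with the standard `json` module. This is a minimal sketch, assuming `relationships` sits at the top level of the payload (the snippet above is a fragment, so it is wrapped in braces here) and that `joinDirection` only needs lower-casing to match the `how` strings that PySpark's `DataFrame.join` accepts (`"left"`, `"inner"`, ...). `load_join_specs`, `PAYLOAD` and the output keys are hypothetical names.

```python
import json

# The fragment from the question, wrapped in braces so it parses as a whole
# JSON document (assumption: "relationships" is a top-level key).
PAYLOAD = """
{"relationships": [
  {"order": 1, "tableName": "Production.Product",
   "destination": "Production.ProductSubcategory", "joinDirection": "LEFT",
   "conditions": [
     {"sourceColumnName": "ProductSubcategoryID",
      "destinationColumnName": "ProductSubcategoryID",
      "operator": "=", "combineCondition": null}]}]}
"""


def load_join_specs(text):
    """Return the relationships sorted by "order", with joinDirection
    lower-cased to the form DataFrame.join(how=...) expects."""
    rels = json.loads(text)["relationships"]
    specs = []
    for rel in sorted(rels, key=lambda r: r["order"]):
        specs.append({
            "left": rel["tableName"],
            "right": rel["destination"],
            "how": rel["joinDirection"].lower(),
            "conditions": rel["conditions"],  # JSON null becomes None
        })
    return specs


for spec in load_join_specs(PAYLOAD):
    print(spec["left"], "->", spec["right"], spec["how"], len(spec["conditions"]))
```

Sorting on `order` means the joins are applied in the sequence the JSON specifies, independent of how the list happens to be serialized.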

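Now I want to build the join conditions into my expression dynamically. I can assemble the expression correctly, but what I get back is text (a string), and the join does not evaluate it as an expression. Can anyone tell me how to turn the expression from text into an actual expression?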
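One way to avoid the text-to-expression step entirely is to build the condition as objects from the start. PySpark's `Column` overloads the comparison operators and `&` / `|`, so mapping each JSON `operator` string to a function from Python's `operator` module and folding the terms with `functools.reduce` yields a `Column` when given two DataFrames. The sketch below is written against anything indexable by column name, so with plain dicts of values it returns a `bool`, which makes it easy to check. Assumptions: `OPERATORS` and `build_join_condition` are hypothetical names, a null `combineCondition` means `AND`, and `AND` binds tighter than `OR` as in SQL.

```python
import operator
from functools import reduce

# JSON "operator" strings -> Python operator functions. PySpark's Column
# overloads ==, !=, <, <=, >, >= (and & / |), so for df["col"] these return
# Column expressions; for plain scalars they return bools.
OPERATORS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def build_join_condition(left, right, conditions):
    """Combine the JSON join conditions into a single expression object.

    left / right: anything indexable by column name, e.g. two PySpark DataFrames.
    combineCondition links a condition to the previous one; null is treated as
    "AND". AND binds tighter than OR: a OR b AND c -> a | (b & c).
    """
    groups = []  # OR-separated groups; the terms inside each group are ANDed
    for c in conditions:
        compare = OPERATORS[c["operator"]]
        term = compare(left[c["sourceColumnName"]], right[c["destinationColumnName"]])
        combine = (c["combineCondition"] or "AND").upper()
        if combine not in ("AND", "OR"):
            raise ValueError("unsupported combineCondition: %r" % combine)
        if groups and combine == "AND":
            groups[-1].append(term)
        else:
            groups.append([term])  # first condition, or OR starts a new group
    if not groups:
        raise ValueError("a relationship needs at least one condition")
    # & inside each group, then | across the groups
    return reduce(operator.or_, [reduce(operator.and_, g) for g in groups])
```

In the join loop this would take the place of the string concatenation, roughly `cond = build_join_condition(sourceDf, df_next, rel['conditions'])` followed by `df_result = df_result.join(df_next, cond, how=rel['joinDirection'].lower())`, where `sourceDf` is a hypothetical name for the DataFrame named by `rel['tableName']`. Indexing that original DataFrame rather than `df_result` may help avoid ambiguous-column errors, because after earlier joins `df_result` can contain two `ProductSubcategoryID` columns. If you prefer to keep building a string, `pyspark.sql.functions.expr` parses a SQL expression string into a `Column`, but the string then has to use SQL syntax and the DataFrame aliases, not Python variable names such as `df_result`.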

sourceTableListN = []
for tableName in sourceTableList:
    if tableName['sqlQuery'] is None:
        sourceTableListN.append([tableName['sourceTableName'],
                                 DataFrameReaderSourceClass.readSourceFiles(tableName['sourceTableName'])])

df_result = sourceTableListN[0][1].alias(sourceTableListN[0][0])
for name, df in sourceTableListN[1:]:
    df_next = df.alias(name)  # alias each table with its own name
    cond = ''  # condition text assembled from the JSON
    for rel in relationships:
        sourceTableName = rel['tableName']
        destinationTableName = rel['destination']
        for c in rel['conditions']:
            cond = (cond + str(c['combineCondition'] or '')
                    + ' (df_result.' + c['sourceColumnName'] + ' ' + c['operator']
                    + ' df_next.' + c['destinationColumnName'] + ') ')
    print(cond)

    # What I want join() to receive is the expression
    # (df_result.ProductSubcategoryID == df_next.ProductSubcategoryID) & (df_result.ProductSubcategoryID == df_next.ProductSubcategoryID)
    # but cond is a plain str, which join() treats as a column name, not as a condition.
    df_result = df_result.join(df_next, cond, how='inner')

df_result.show()

Thanks, Thomas

Tags: python, dataframe, join
