首页 > 解决方案 > BigQuery Update where WHERE condition has multiple rows (Limit Update)

问题描述

I have a problem when trying to map data from a table with data from another table in Google BigQuery. To make it simple: I have the dataTable with column data_modelname, data_version, data_date. The mappingTable contains mapping_modelname, mapping_version, mapping_date.

When I try this code,

UPDATE dataTable
SET
    dataTable.data_date = mappingTable.data_date,
    dataTable.data_version = mappingTable.data_version

FROM mappingTable
WHERE 
    LOWER(data_modelname) IN (
    SELECT DISTINCT LOWER(mapping_modelname) AS distinctModel
    FROM mappingTable 
    WHERE mappingTable.data_version IS NOT NULL 
    )

I am getting the following error message from BigQuery:

UPDATE/MERGE must match at most one source row for each target row

The select distinct statement returns the list of ~300 distinct models. Thus, when considering the WHERE-condition, there will be only one result where e.g., LOWER(data_modelname) is LOWER(ABC123).

Unfortunately my mappingTable contains more than one entry per model (duplicates), but I only want the first found entry. Updating with LIMIT 1 did not work. I also tried this code, with a mapping table with less entries and thus no duplicates. This worked well. However, it returns the same error when there is more than one modelname.

WHERE LOWER(data_modelname) = LOWER(mapping_modelname)

Could you please help me to insert (update) the from the mappingTable.data_date and mappingTable.data_version from the mapping table into the data table where the data_modelname is mapping_modelname. Further it would be great to be able to use some wildcards or regular expressions for mapping the modelnames, e.g. WHERE LOWER(data_modelname) IS LIKE '% mapping_modelname %'.

Thanks a lot.


Edit, new problem:

I now came across another problem. Same situation and tables, but what I now need to do is to update the field data_date_Installed where model and data_version_Installed are the same. So in my data-table there is a version-field "3.4" and data_model "ABC123". In the mapping-table there is also mapping_model "ABC123" and many different versions in the mapping_version_Installed, and of course also "3.4" and the corresponding date of that version in the field mapping_date_installed. I have tried this code, however it only matches 88 values out of 1m.

UPDATE dataTable dt
    SET dt.data_date_installed = mapping_date_installed,
    FROM (SELECT ARRAY_AGG(mt LIMIT 1)[ORDINAL(1)].*
          FROM mappingTable mt
          WHERE mt.data_installed_version IS NOT NULL
          GROUP BY LOWER(mt.mapping_modelname)
         ) mt
    WHERE LOWER(mt.mapping_modelname) = LOWER(dt.data_modelname)
    AND
    data_version_Installed = mapping_version_Installed

Actually I can exclude an error in the tables: mapping values are correct and data-type is also correct (date = DATE and version = STRING) in both tables. Also comparing the strings with LIKE instead of = did not result in more matching values. Maybe the GROUP BY statement omits many possible installed_versions.

I am very happy and thankful, if you have some advice.

Edit, SOLVED:

It was possible to use the problem with simple SQL WHERE:

WHERE LOWER(mt.mapping_model) = LOWER(FINAL_modelName)
AND
data_date_installed = mt.mapping_date_installed

标签: sqlgoogle-bigquerysql-update

解决方案


Use aggregation to get one row per LOWER(model_name):

UPDATE dataTable dt
    SET dt.data_date = mt.data_date,
        dt.data_version = mt.data_version
    FROM (SELECT ARRAY_AGG(mt LIMIT 1)[ORDINAL(1)].*
          FROM mappingTable mt
          WHERE mt.data_version IS NOT NULL
          GROUP BY LOWER(mt.mapping_modelname)
         ) mt
    WHERE LOWER(mt.mapping_modelname) = LOWER(dt.data_modelname)

推荐阅读