首页 > 解决方案 > BigQuery 向定义的 STRING 添加不必要的小数

问题描述

我在 BigQuery 中定义了一个架构,如下所示:

+------------------+----------+----------+
|       name       |   type   |   mode   |
+------------------+----------+----------+
| warehouse        | INTEGER  | NULLABLE |
| transaction_date | DATETIME | NULLABLE |
| style            | STRING   | NULLABLE |
| piece            | STRING   | NULLABLE |
| fabric_1         | STRING   | NULLABLE |
| fabric_2         | STRING   | NULLABLE |
| serial           | STRING   | NULLABLE |
| customer_po      | STRING   | NULLABLE |
| order_number     | STRING   | NULLABLE |
+------------------+----------+----------+

我关注的两个字段是 serial 和 order_number,在 R 中预览时,如下所示:

+-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+
| warehouse | transaction_date | style  | piece | fabric_1  | fabric_2 |   serial   | customer_po  | order_number |
+-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+
|        80 | 4/3/19           | K28300 | ARMH  | ALL CHAR  | NA       | 8040418253 | 1486838165   |       464374 |
|        80 | 4/3/19           | K28300 | ARMH  | ALL CHAR  | NA       | 9040542252 | 1485798731-P |       464069 |
|        80 | 4/3/19           | K28300 | ARMH  | ELEG NAVY | NA       | 8040355550 | 1486826068   |       464369 |
|        80 | 4/3/19           | K28300 | ARMH  | ELEG NAVY | NA       | 8040532364 | 1485366411-R |       464071 |
+-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+

在 R 中,这两个字段似乎被读取为我正在上传的数据框中的字符,这正是我正在寻找的。然而,当我将数据推送到 BigQuery 时,这两个字段最终会变成这样:

   +-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+
    | warehouse | transaction_date | style  | piece | fabric_1  | fabric_2 |   serial   | customer_po  | order_number |
    +-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+
    |        80 | 4/3/19           | K28300 | ARMH  | ALL CHAR  | NA       | 8040418253.0 | 1486838165   |       464374.0 |
    |        80 | 4/3/19           | K28300 | ARMH  | ALL CHAR  | NA       | 9040542252.0 | 1485798731-P |       464069.0 |
    |        80 | 4/3/19           | K28300 | ARMH  | ELEG NAVY | NA       | 8040355550.0 | 1486826068   |       464369.0 |
    |        80 | 4/3/19           | K28300 | ARMH  | ELEG NAVY | NA       | 8040532364.0 | 1485366411-R |       464071.0 |
    +-----------+------------------+--------+-------+-----------+----------+------------+--------------+--------------+

为什么会发生这种情况,我该如何改变它?作为参考,我上传它的代码:

bqr_upload_data(projectId = "project-test", 
                datasetId = "orders", 
                tableId = "daily_orders", 
                upload_data = df_daily_orders,
                maxBadRecords = 1000,
                overwrite = TRUE)

标签: rgoogle-bigquery

解决方案


现在我的答案并不完全确定,因为我仍然是初学者,但它可能会对你有所帮助。我会将其添加为评论,但我还没有足够的声誉。

如果我理解正确,您实际上是在进行隐式转换 - 从数值到字符串值,BigQuery 正在捕捉小数点,以确保它正确捕捉到整个值

在此处查看BigQuery 的转换规则 - 第二个表,FLOAT64 到字符串。

在您的位置上,根据您需要对桌子做什么 - 我会:

  1. 重新创建表,但将 serial 和 order_number 列的架构更改为整数类型

或者

  1. 尝试使用更新查询更新已创建的表 - 并在每个字符串值的末尾修改“.0”

推荐阅读