首页 > 解决方案 > 错误:右双引号 (") 和字段分隔符之间的数据

问题描述

我正在尝试使用 Google Apps 脚本从 Google Drive 中获取 CSV 并将其放入 Big Query。当我上传时,我收到此错误:

“读取数据时出错,错误消息:解析从位置开始的行时检测到错误:560550。错误:关闭双引号(”)和字段分隔符之间的数据。”

我试过查看文件的那个字节位置及其在 CSV 范围之外的方式(它只达到 ~501500 字节)。

这是我正在使用的 CSV 的链接,它是一个网站的抓取:https ://drive.google.com/file/d/1k3cGlTSA_zPQCtUkt20vn6XKiLPJ7mFB/view?usp=sharing

这是我的相关代码:

 function csvToBigQuery(exportFolder, csvName, bqDatasetId){
  try{
    //get most recent export from Screaming Frog
    var mostRecentFolder = [];    
    while(exportFolder.hasNext()){
      var folder = exportFolder.next();
      var lastUpdated = folder.getLastUpdated();
      if(mostRecentFolder.length == 0)
        mostRecentFolder = [folder.getLastUpdated(),folder.getId()];
      else if(lastUpdated > mostRecentFolder[0])
        mostRecentFolder = [lastUpdated, folder.getId()];
    }    
    var folderId = mostRecentFolder[1];
    var file = DriveApp.getFolderById(folderId).getFilesByName(csvName + '.csv').next();
    
    if(!file)
      throw "File doesn't exist";
    
    //get csv and add date column.
    //getBlob().getDataAsString().replace(/(["'])(?:(?=(\\?))\2[\s\S])*?\1/g, function(e){return e.replace(/\r?\n|\r/g, ' ')})
    var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
    Logger.log(rows);
    var numColumns = rows[0].length;    
    
    rows.forEach(function(row){
      row[numColumns] = date;
    });
    rows[0][numColumns] = 'Date';
    
    let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
    let csvData = csvRows.map(values => values.join(',')).join('\n');
    //log(csvData)
    var blob = Utilities.newBlob(csvData, 'application/octet-stream');
    
    
    //create job for inserting to BQ.
    var loadJob = {
      configuration: {
        load: {
          destinationTable: {
            projectId: bqProjectId,
            datasetId: bqDatasetId,
            tableId: csvName
          },
          autodetect: true,  // Infer schema from contents.
          writeDisposition: 'WRITE_APPEND',
        }
      }
    };
    
    //append to table in BQ.
    BigQuery.Jobs.insert(loadJob, bqProjectId, blob);
    
    
  }catch(e){
    Logger.log(e); 
  }
}

标签: javascriptcsvgoogle-apps-scriptgoogle-bigquery

解决方案


修改点:

从您的错误消息中,我认为可能存在未包含在双重配额中的部分。因此,我搜索了当我看到您的 CSV 数据并将您的 CSV 数据替换\"(|.+?)\"""使用以下脚本时,发现第 711 行具有该值。

function sample() {
  var id = "###";  // File ID of your CSV file.

  // This is your script.
  var file = DriveApp.getFileById(id);
  var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
  var numColumns = rows[0].length;
  var date = "sample";
  rows.forEach(function(row){
    row[numColumns] = date;
  });
  rows[0][numColumns] = 'Date';
  let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
  let csvData = csvRows.map(values => values.join(',')).join('\n');
  
  // I added below script for checking your CSV data.
  var res = csvData.replace(/\"(|.+?)\"/g, "");
  DriveApp.createFile("sample.txt", res);
}

行711如下。

"https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\","text/html; charset=utf-8","200","OK","Non-Indexable","Canonicalised","Lip Shield Trio - Restores, Protects + Water-resistant – Supergoop!","67","595","Moisturizing lip protection made from antioxidant-rich coconut, avocado, and grape seed oil.","92","576","","0","Lip Shield Trio","15","Lip Shield Trio","15","Why We Love It","14","Ingredients","11","","","","https://supergoop.com/products/lip-shield-trio","","","","","451488","754","1.686","5","","12","4","0.590","205","80","8","5","","","","","f6d1476960d22b1c5964581e161bdd49","0.064","","","","","HTTP/1.1","https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello%5C"

从这个值,我发现它\"被用于"https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\"。我认为您的问题的原因可能是由于这个。

那么为了避免这个问题,下面的修改呢?

修改后的脚本:

从:
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
到:
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""').replace(/\\"/g, '')));

或者

从:
var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
到:
var rows = Utilities.parseCsv(file.getBlob().getDataAsString().replace(/\\/g, ''));
  • 通过这种修改,我可以确认文件大小在您的脚本和修改后的脚本之间减少了 2 个字节。而且,当使用修改后的脚本将上述检查脚本用于 CSV 数据时,我可以确认所有行都没有值。

推荐阅读