javascript - 错误:右双引号 (") 和字段分隔符之间的数据
问题描述
我正在尝试使用 Google Apps 脚本从 Google Drive 中获取 CSV 并将其放入 Big Query。当我上传时,我收到此错误:
“读取数据时出错,错误消息:解析从位置开始的行时检测到错误:560550。错误:关闭双引号(”)和字段分隔符之间的数据。”
我试过查看文件的那个字节位置及其在 CSV 范围之外的方式(它只达到 ~501500 字节)。
这是我正在使用的 CSV 的链接,它是一个网站的抓取:https ://drive.google.com/file/d/1k3cGlTSA_zPQCtUkt20vn6XKiLPJ7mFB/view?usp=sharing
这是我的相关代码:
function csvToBigQuery(exportFolder, csvName, bqDatasetId){
try{
//get most recent export from Screaming Frog
var mostRecentFolder = [];
while(exportFolder.hasNext()){
var folder = exportFolder.next();
var lastUpdated = folder.getLastUpdated();
if(mostRecentFolder.length == 0)
mostRecentFolder = [folder.getLastUpdated(),folder.getId()];
else if(lastUpdated > mostRecentFolder[0])
mostRecentFolder = [lastUpdated, folder.getId()];
}
var folderId = mostRecentFolder[1];
var file = DriveApp.getFolderById(folderId).getFilesByName(csvName + '.csv').next();
if(!file)
throw "File doesn't exist";
//get csv and add date column.
//getBlob().getDataAsString().replace(/(["'])(?:(?=(\\?))\2[\s\S])*?\1/g, function(e){return e.replace(/\r?\n|\r/g, ' ')})
var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
Logger.log(rows);
var numColumns = rows[0].length;
rows.forEach(function(row){
row[numColumns] = date;
});
rows[0][numColumns] = 'Date';
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
let csvData = csvRows.map(values => values.join(',')).join('\n');
//log(csvData)
var blob = Utilities.newBlob(csvData, 'application/octet-stream');
//create job for inserting to BQ.
var loadJob = {
configuration: {
load: {
destinationTable: {
projectId: bqProjectId,
datasetId: bqDatasetId,
tableId: csvName
},
autodetect: true, // Infer schema from contents.
writeDisposition: 'WRITE_APPEND',
}
}
};
//append to table in BQ.
BigQuery.Jobs.insert(loadJob, bqProjectId, blob);
}catch(e){
Logger.log(e);
}
}
解决方案
修改点:
从您的错误消息中,我认为可能存在未包含在双重配额中的部分。因此,我搜索了当我看到您的 CSV 数据并将您的 CSV 数据替换\"(|.+?)\"
为""
使用以下脚本时,发现第 711 行具有该值。
function sample() {
var id = "###"; // File ID of your CSV file.
// This is your script.
var file = DriveApp.getFileById(id);
var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
var numColumns = rows[0].length;
var date = "sample";
rows.forEach(function(row){
row[numColumns] = date;
});
rows[0][numColumns] = 'Date';
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
let csvData = csvRows.map(values => values.join(',')).join('\n');
// I added below script for checking your CSV data.
var res = csvData.replace(/\"(|.+?)\"/g, "");
DriveApp.createFile("sample.txt", res);
}
行711如下。
"https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\","text/html; charset=utf-8","200","OK","Non-Indexable","Canonicalised","Lip Shield Trio - Restores, Protects + Water-resistant – Supergoop!","67","595","Moisturizing lip protection made from antioxidant-rich coconut, avocado, and grape seed oil.","92","576","","0","Lip Shield Trio","15","Lip Shield Trio","15","Why We Love It","14","Ingredients","11","","","","https://supergoop.com/products/lip-shield-trio","","","","","451488","754","1.686","5","","12","4","0.590","205","80","8","5","","","","","f6d1476960d22b1c5964581e161bdd49","0.064","","","","","HTTP/1.1","https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello%5C"
从这个值,我发现它\"
被用于"https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\"
。我认为您的问题的原因可能是由于这个。
那么为了避免这个问题,下面的修改呢?
修改后的脚本:
从:let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
到:
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""').replace(/\\"/g, '')));
或者
从:var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
到:
var rows = Utilities.parseCsv(file.getBlob().getDataAsString().replace(/\\/g, ''));
- 通过这种修改,我可以确认文件大小在您的脚本和修改后的脚本之间减少了 2 个字节。而且,当使用修改后的脚本将上述检查脚本用于 CSV 数据时,我可以确认所有行都没有值。
推荐阅读
- swift - Swift 中同一个文件中的 `private` 和 `fileprivate` 类有什么区别?
- javascript - 如何将 chess.js 导入 Create React App
- python - Matplotlib 注释更改字体
- javascript - 如果输入值为空或 null,如何防止函数运行?香草 JavaScript
- c# - 如何让两个物体在碰撞时粘在一起?(统一 3d) C#
- java - 快速 Java 匿名函数/类问题
- ios - 带有链接的 UITextView:VoiceOver 如何选择要点击的链接?
- html - 为什么子元素不相对于其父元素移动?
- java - Jetty如何处理Threads和ThreadPools,它占用了太多内存
- uwp - 打包一套 UWP 应用