
Question

How can I parse an XML file that is stored in my Google Drive but comes back as HTML?

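I saved a copy of a source XML in my Google Drive. The source is: http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621. I can parse the source, but I cannot parse the copy, which comes back looking like HTML. I get parse errors such as: element type "meta" must be terminated by the matching end tag, or: Element type "a.length" must be followed by either attribute specifications, ">" or "/>".

I shared the copy at https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing so that you can access it and test my script. I know I could use CacheService, and that works, but I would like to try this approach to get more control over the buffering.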

function xmlParsingXmlStoreOnGoogleDrive(){
     // The original XML, which parses correctly
 var fetched=UrlFetchApp.fetch("http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621");
 var blob=fetched.getBlob();
 var getAs=blob.getAs("text/xml");
 var data=getAs.getDataAsString("UTF-8");
 Logger.log(data.substring(0,350)); // truncated to keep the log short; this logs the expected XML:
 /*
    <?xml version="1.0" encoding="utf-8"?>
    <!-- Copyright © 2019 AlloCiné -->
    <movie code="265621" xmlns="http://www.allocine.net/v6/ns/">
    <movieType code="4002">Long-métrage</movieType>
    <originalTitle>Mise à jour sur Google play</originalTitle>
    <title>Mise à jour sur Google play</title>
    <keywords>Portrait of a Lady on Fire </keywords>
 */
 var xmlDocument=XmlService.parse(data);
 var root=xmlDocument.getRootElement();
 var keywords=root.getChild("keywords",root.getNamespace()).getText();
 Logger.log(keywords);  // Logs the expected result: "Portrait of a Lady on Fire "

 // My copy of the original XML, which I cannot parse
 var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/1K3-9dHy-h0UoOOY5jYfiSoYPezSi55h1/view?usp=sharing");
 var blob=fetched.getBlob();
 var getAs=blob.getAs("text/xml");
 var data=getAs.getDataAsString("UTF-8");
 Logger.log(data.substring(0,350)); // truncated to keep the log short; this logs unexpected HTML:
 /*
   <!DOCTYPE html><html><head><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;">
   <style>@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:local('Roboto Italic'),local('Roboto-Italic'),
   url(//fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format('truetype');}@font-face{font-fam......
 */
 var xmlDocument=XmlService.parse(data); // Fails with the error: Element type "a.length" must be followed by either attribute specifications, ">" or "/>"
 var root=xmlDocument.getRootElement();
 var keywords=root.getChild("keywords",root.getNamespace()).getText();
 Logger.log(keywords);
}

I have read this similar question: Parse XML file (which is stored on GoogleDrive) with Google app script

It says "unfortunately we cannot get the XML file in Google Drive directly". Is that correct, and does it mean I cannot make my script work?

Tags: google-apps-script, xml-parsing, google-drive-api

Answer


  • You want to retrieve the data from a file on Google Drive and parse it as XML using XmlService.
  • You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer?

Modification points:

  • About var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing"): the file content cannot be retrieved from this endpoint, which is why your log shows an HTML page instead of XML. To retrieve the file content with UrlFetchApp, use the endpoint https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download. This is the webContentLink.
  • When the file is in your Google Drive and/or shared publicly, you can retrieve the data with DriveApp.getFileById(fileId).getBlob().getDataAsString().
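Because the failure in the question only shows up as a confusing parser error, a small guard before XmlService.parse() can make it easier to diagnose. The following is a minimal sketch, not part of the original answer: looksLikeHtml and parseXmlOrExplain are hypothetical helper names, and the check simply assumes that an HTML page starts with a doctype or an html tag.

```javascript
// Hypothetical helper: returns true when the text starts like an HTML page
// (as the /view?usp=sharing response in the question does) rather than XML.
function looksLikeHtml(text) {
  var head = String(text).trim().slice(0, 200).toLowerCase();
  return head.indexOf("<!doctype html") === 0 || head.indexOf("<html") === 0;
}

// Hypothetical wrapper; XmlService only exists inside Google Apps Script.
function parseXmlOrExplain(text) {
  if (looksLikeHtml(text)) {
    throw new Error("Received an HTML page instead of XML; check the URL used to fetch the file.");
  }
  return XmlService.parse(text);
}
```

With this in place, fetching the sharing URL would fail with a message that points at the URL rather than at an element named "a.length".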

Modified script:

For example, when your shared sample file of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing is used, the script becomes as follows.

Sample script 1:

In this pattern, the file content is retrieved from your shared file with UrlFetchApp.fetch().

var data = UrlFetchApp.fetch("https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download").getContentText(); // Modified
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" at log.
  • In this case, the file is required to be shared publicly. If you want to retrieve the file content without sharing it, send an access token with the request.
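One possible way to send an access token is sketched below. It is an assumption on my part rather than the method the answer had in mind: it calls the Drive API v3 files endpoint with alt=media and passes ScriptApp.getOAuthToken() as a Bearer token, which only works if the script is authorized with a Drive scope. The names buildAuthorizedRequest and fetchPrivateXml are hypothetical.

```javascript
// Hypothetical helper: builds the URL and fetch options for an authorized download.
// Assumes the Drive API v3 "files.get" endpoint with alt=media.
function buildAuthorizedRequest(fileId, accessToken) {
  return {
    url: "https://www.googleapis.com/drive/v3/files/" + encodeURIComponent(fileId) + "?alt=media",
    params: {
      method: "get",
      headers: { Authorization: "Bearer " + accessToken },
      muteHttpExceptions: true // inspect the status code ourselves instead of throwing
    }
  };
}

// Runs only inside Google Apps Script; the token must carry a Drive scope (assumption).
function fetchPrivateXml(fileId) {
  var req = buildAuthorizedRequest(fileId, ScriptApp.getOAuthToken());
  var res = UrlFetchApp.fetch(req.url, req.params);
  if (res.getResponseCode() !== 200) {
    throw new Error("Download failed with HTTP " + res.getResponseCode());
  }
  return XmlService.parse(res.getContentText());
}
```

Keeping the request construction in its own function makes it possible to check the URL and header without calling any Apps Script service.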

Sample script 2:

In this pattern, the file content is retrieved from your shared file with DriveApp.getFileById().

var fileId = "16kJ5Nko-waVb8s2T12LaTEKaFY01603n"; // Added
var data = DriveApp.getFileById(fileId).getBlob().getDataAsString(); // Added
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" at log.
  • The file ID is the 16kJ5Nko-waVb8s2T12LaTEKaFY01603n part of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing.
  • In this case, the file does not need to be shared, but it must be in your Google Drive.
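If you only have the sharing link, the file ID can be pulled out of it with a regular expression. This is a minimal sketch with hypothetical helper names (extractFileId, toDownloadUrl); it assumes the two URL shapes that appear in this answer, /file/d/<id>/... and ?id=<id>, and returns the input unchanged when neither matches.

```javascript
// Hypothetical helper: extracts the file ID from a Drive sharing URL,
// a uc?id=... URL, or returns the input as-is if it is already a bare ID.
function extractFileId(urlOrId) {
  var s = String(urlOrId);
  var m = s.match(/\/d\/([A-Za-z0-9_-]+)/) || s.match(/[?&]id=([A-Za-z0-9_-]+)/);
  return m ? m[1] : s;
}

// Hypothetical helper: builds the download endpoint used in sample script 1.
function toDownloadUrl(urlOrId) {
  return "https://drive.google.com/uc?id=" + extractFileId(urlOrId) + "&export=download";
}
```

The result of extractFileId can be passed to DriveApp.getFileById(), and the result of toDownloadUrl to UrlFetchApp.fetch().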

References:

  • Files of Drive API
    • webContentLink: A link for downloading the content of the file in a browser using cookie based authentication. In cases where the content is shared publicly, the content can be downloaded without any credentials.
  • getFileById(id)

If I misunderstood your question and this was not the direction you want, I apologize.

