Error in Simple.pm when I try to parse a database in a MySQL database

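Question

I am using the following script to parse a database in my database.

A few people asked about the input. It is a large file and I can't paste all of it here. Could you check http://www.unimod.org/xml/unimod.xml? If not, could you suggest somewhere I can paste it so I can share it with you? I have tried to paste some of the input here: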

GIST acetyl light PT and GIST acetyl light O-acetyl glyoxal-derived hydroimidazolone AA0048 RESID AA0049 RESID AA0041 RESID AA0052 RESID AA0364 RESID AA0056 RESID AA0046 RESID AA0051 RESID AA0045 RESID AA0354 RESID AA0044 RESID AA0043 RESID 11999733 PubMed PMID Chemical Reagents for Protein Modification 3rd edition, pp 215-221, Roger L. Lundblad, CRC Press, New York, N.Y., 2005 Book IonSource acetylation tutorial Misc. URL http://www.ionsource.com/Card/acetylation/acetylation.htm AA0055 RESID 14730666 PubMed PMID 15350136 PubMed PMID AA0047 RESID 12175151 PubMed PMID 11857757 PubMed PMID AA0042 RESID AA0050 RESID AA0053 RESID AA0054 RESID ACET FindMod PNAS 2006 103: 18574-18579 Journal http://dx.doi.org/10.1073/pnas.0608995103 MS/MS experiments of mass spectrometric c-ions (MS^3) can be used for protein identification by library searching. T3-sequencing is such a technique (see reference). Search engines must recognize this as a virtual modification. Top-Down sequencing c-type fragment ion AA0088 RESID AA0087 RESID AA0086 RESID AA0085 RESID AA0084 RESID AA0083 RESID AA0082 RESID AA0081 RESID AA0089 RESID AA0090 RESID AA0091 RESID AA0092 RESID AA0093 RESID AA0094 RESID AA0095 RESID AA0096 RESID AA0097 RESID AA0098 RESID AA0099 RESID AA0100 RESID AMID FindMod 14588022 PubMed PMID AA0117 RESID BIOT FindMod Carboxyamidomethylation 11510821 PubMed PMID 12422359 PubMed PMID Boja, E. S., Fales, H. M., Anal. Chem. 73 3576-82 (2001) Journal Creasy, D. M., Cottrell, J. S., Proteomics 2 1426-34 (2002) Journal 12203680 PubMed PMID Stark; Modification of proteins with cyanate. Meth Enz 25B, 579-584 (1972) Journal AA0343 RESID 10978403 PubMed PMID AA0332 RESID Smyth; Carbamylation of amino and tyrosine hydroxyl groups. J Biol Chem 242, 1579-1591 (1967) Journal IonSource carbamylation tutorial Misc. 
URL http://www.ionsource.com/Card/carbam/carbam.htm Carbamylation is an irreversible process of non-enzymatic modification of proteins by the breakdown products of urea isocyanic acid reacts with the N-term of a proteine or side chains of lysine and arginine residues Hydroxylethanone Carboxymethylation Protein which is post-translationally modified by the de-imination of one or more arginine residues; Peptidylarginine deiminase (PAD) converts protein bound to citrulline Convertion of glycosylated asparagine residues upon deglycosylation with PNGase F in H2O phenyllactyl from N-term Phe Citrullination FLAC FindMod AA0128 RESID CITR FindMod IonSource

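I get this error:

mismatched tag at line 13, column 3, byte 569 at /srv/myscr/script/../extern/cpan/lib/perl5/XML/Simple.pm line 391

The code I use to parse the data is below. I would be grateful if someone could tell me why I get this error and how to fix it.

After adding the code, I get the following error: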

Fetching unimod.xml from unimod web site
Connecting to pipeline database
Emptying modifications table
Parsing XML
mismatched tag at line 13, column 3, byte 569 at /srv/myscr/script/../extern/cpan/lib/perl5/XML/Simple.pm line 39

Tags: xml, perl

Answer


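For future reference, here is a stripped-down version of your code that is enough to demonstrate the problem. This is what you should have shown us in the original question.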

use strict;
use warnings;

use XML::Simple;
use LWP::UserAgent;

print "Fetching unimod.xml from unimod web site\n";

# Retrieve latest xml version of Unimod from the website
my $ua = LWP::UserAgent->new();
$ua->env_proxy();
my $response = $ua->get( "http://www.unimod.org/xml/unimod.xml" );

my $xml = $response->content;

print "Parsing XML\n";

# Use XML::Simple DOM parser - Okay as unimod.xml is small
# Force specificity and neutral losses into an array to simplify code
my $xs = XML::Simple->new(
    KeyAttr    => { "umod:mod" => "+title" },
    ForceArray => [ "umod:specificity", "umod:NeutralLoss" ]
);
my $ref = $xs->XMLin( $xml );

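See how I have removed all the distractions about configuration files and updating the database. The program just fetches the XML from the website and parses it.

The bad news is that, for me, this works fine. It parses the XML without throwing any errors. For reference, I am using XML::Simple version 2.25 and Perl 5.26.2.

It would be useful to know whether this program gives you the same error as your original code when you run it.

As mentioned in the comments, it would also be interesting to see what XML you are actually getting from the website. You can get that by taking the $xml variable and writing its contents to a file: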

open my $xml_fh, '>', 'test.xml' or die $!;
print $xml_fh $xml;

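Then, once you have run the code, you will have a file called test.xml containing whatever the website sent you. You can check line 13 of that file to work out what the error is.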
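To make that inspection easier, something like the following could print the lines around the position the parser reported. This is only a sketch: `context_lines` is a hypothetical helper, not part of XML::Simple, and the sample payload is made up to stand in for whatever ends up in test.xml.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines around $line_no (1-based) from $text,
# each prefixed with its line number, so the reported position is easy to spot.
sub context_lines {
    my ($text, $line_no, $radius) = @_;
    my @lines = split /\n/, $text;
    my $first = $line_no - $radius;
    $first = 1 if $first < 1;
    my $last = $line_no + $radius;
    $last = scalar @lines if $last > @lines;
    return map { sprintf "%4d: %s", $_, $lines[ $_ - 1 ] } $first .. $last;
}

# Made-up stand-in for the saved payload: an HTML page rather than XML.
my $sample = join "\n",
    '<html>',
    '<head><title>503</title></head>',
    '<body>',
    '<p>Service unavailable<br>',
    '</body>',
    '</html>';

# Show one line either side of line 4.
print "$_\n" for context_lines( $sample, 4, 1 );
```

With the real data you would read test.xml into a string and call `context_lines( $text, 13, 2 )`, since line 13 is where the error message points.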

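For what it's worth, I suspect that for some reason you are not getting XML back. My guess is that either a proxy on your network or the website itself is blocking your attempt to fetch the data automatically and is returning a 404 or 503 HTML page instead. That is only a guess, and we can't be sure unless you run the test I suggested above.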
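If that guess is right, a quick sanity check before calling XMLin would catch it early. The sketch below is an illustration only: `xml_fetch_problem` is a hypothetical helper, and its heuristics (status code, content type, a leading HTML tag) are assumptions about what an error page might look like, not something the site is known to return.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a fetched payload is plausibly XML.
# Returns an empty string when nothing looks wrong, otherwise a short reason.
sub xml_fetch_problem {
    my ( $status, $content_type, $body ) = @_;
    $content_type //= '';
    $body         //= '';

    # Anything outside 2xx means the server did not hand over the document.
    return "HTTP status $status" unless $status >= 200 && $status < 300;

    # An HTML content type suggests an error or block page.
    return "unexpected content type '$content_type'"
        if $content_type =~ m{^text/html}i;

    # Last resort: look at how the body starts.
    return "body looks like HTML"
        if $body =~ /\A\s*(?:<!DOCTYPE\s+html|<html\b)/i;

    return '';
}

# Made-up values standing in for a real response.
my $problem = xml_fetch_problem( 503, 'text/html', '<html><body>Busy</body></html>' );
print $problem ? "Not parsing: $problem\n" : "Looks like XML\n";
```

In the program above, the three arguments could come from `$response->code`, `$response->content_type` and `$response->decoded_content`, and you would only pass the content to XMLin when the helper returns an empty string.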

