How do I parse JSON in Nim (long integer problem)?

Question

I'm working on a piece of Nim code that pulls a JSON object from the Shodan API. This is the complete JSON string from Shodan:

{"city": "Alverca", "region_code": "14", "os": null, "tags": ["self-signed"], "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "area_code": null, "dma_code": null, "last_update": "2019-11-01T17:56:18.470438", "country_code3": "PRT", "country_name": "Portugal", "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "postal_code": "2619-510", "longitude": -9.038600000000002, "country_code": "PT", "ip_str": "85.139.242.90", "latitude": 38.899, "org": "ZON Tv Cabo", "data": [{"_shodan": {"id": null, "options": {}, "ptr": true, "module": "telnet", "crawler": "82488cbcb7dd25da13f728d04775390417d9ee4e"}, "hash": 1329569225, "os": null, "opts": {"telnet": {"will": ["SGA", "STATUS", "ECHO"], "do": ["TTYPE", "TSPEED", "XDISPLOC", "NEW_ENVIRON", "ECHO", "NAWS", "LFLOW"], "dont": [], "wont": []}}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 23, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-11-01T17:56:18.470438", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "\r\nBODET PUNCHING BOARD\r\nLinux/ppc 2.4.20_mvl31-BODET_V1.1B2\r\n\r\nWelcome to 172.17.30.99\r\nFri Nov  1 17:53:58 2019\r\nTech-code: ", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "7afc2cf1-2b4a-4074-9343-cd576d240364", "options": {}, "ptr": true, "module": "https", "crawler": "0636e1e6dd371760aeaf808ed839236e73a9e74d"}, "hash": 1484578305, "os": null, "tags": ["self-signed"], "opts": {"vulns": [], "heartbleed": "2019/10/31 06:28:03 85.139.242.90:443 - SAFE\n"}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "http": {"html_hash": -468632088, "robots_hash": null, "redirects": [], "securitytxt": null, "title": "", "sitemap_hash": null, "robots": null, "favicon": null, "host": 
"85.139.242.90", "html": "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n<title></title>\n</head>\n<body>\n<script>location.href = \"./home/index.html\";</script>\n</body>\n</html>", "location": "/", "components": {}, "server": null, "sitemap": null, "securitytxt_hash": null}, "port": 443, "ssl": {"dhparams": null, "tlsext": [{"id": 65281, "name": "renegotiation_info"}], "versions": ["TLSv1", "-SSLv2", "-SSLv3", "TLSv1.1", "TLSv1.2", "-TLSv1.3"], "acceptable_cas": [], "cert": {"sig_alg": "sha256WithRSAEncryption", "issued": "20170520021607Z", "expires": "20250806021607Z", "expired": false, "version": 2, "extensions": [{"data": "\\x03\\x02\\x01\\xa6", "name": "keyUsage"}, {"critical": true, "data": "0\\x03\\x01\\x01\\xff", "name": "basicConstraints"}, {"data": "\\x16\\x1fSelf Signed Certificate(System)", "name": "nsComment"}, {"data": "0\\x10\\x82\\x0eXC8f45bb.local", "name": "subjectAltName"}], "fingerprint": {"sha256": "317aadb5fb5ddaf97232cdfb8c4a8da23d2f3f11f7229f028235f6545d08ef1f", "sha1": "3d2a2dcdb25b76b3ddddc740c2e4660ff07009d5"}, "serial": 46474876880932987910930945182556062189, "subject": {"CN": "XC-8F45BB"}, "pubkey": {"type": "rsa", "bits": 2048}, "issuer": {"CN": "XC-8F45BB"}}, "cipher": {"version": "TLSv1/SSLv3", "bits": 256, "name": "AES256-SHA256"}, "chain": ["-----BEGIN 
CERTIFICATE-----\nMIIDHTCCAgWgAwIBAgIQIva8VyQosXPBl/OnC/WB7TANBgkqhkiG9w0BAQsFADAU\nMRIwEAYDVQQDEwlYQy04RjQ1QkIwHhcNMTcwNTIwMDIxNjA3WhcNMjUwODA2MDIx\nNjA3WjAUMRIwEAYDVQQDEwlYQy04RjQ1QkIwggEiMA0GCSqGSIb3DQEBAQUAA4IB\nDwAwggEKAoIBAQC7ypFTTvDMJ0wYR0LGFOOJf/g6CyRFqJvAmtY0SZKw8EOXC365\n+ajGtJQ0qcsOqmFEFUmC5J0dUsuljbkqECx9cnVtXLtUUQ8pPfTz7Tphz+0zB/KS\nbG7NdrjWbHhVikPLCMrna6cxbI+d1vWA9NoLty02x1fpR8MH9SEqHlO89KbPaDwo\nmw6gjwNS+ImBnF6yzfslUQkcR3J3KGfCrNWsP+mYl7yx4+Awk3wI6vwkUpWmJX+T\nTEUV8rrTSyrHocc7hDYTN/bg5FgUsMLwuuHkEg+JzBTEmdVp0mI0Liq9B/hoVpKz\niX1si/yYkdqKQgNP4SALOqFdmB0+nkqN7rYzAgMBAAGjazBpMAsGA1UdDwQEAwIB\npjAPBgNVHRMBAf8EBTADAQH/MC4GCWCGSAGG+EIBDQQhFh9TZWxmIFNpZ25lZCBD\nZXJ0aWZpY2F0ZShTeXN0ZW0pMBkGA1UdEQQSMBCCDlhDOGY0NWJiLmxvY2FsMA0G\nCSqGSIb3DQEBCwUAA4IBAQCC8CGt0dtiRn6e79Rtjpr383RJdk2d8VfFbQSWj0Ct\nzZUdgktJiQR9+cNKYCoHvJ8E4mm1sb+Wgz2/CrP+7J8ZNRsb8UOabwrREeBvz0wl\nwiIwmrnuCYKZ8AMIEI4f3BmXVSz5baIFTHWWCuS22np5jz8bpYYKLIK4Pc6r+sEf\nfhd7H6YAPEPqAMlC/UTicDmXHKqKbLFDTHNyKiouO3DGFqpNDd4zOWsyDrHkbl91\nVAk6xEPha5Y0QyIlpkfcIAG0e/VxgzMxfiGPSV2kxgaVq+wbNq9T61GsXZ4ZD00L\nj8Q+YW28opH0OZe1h1V8uTytGnKnt295Z1X7hEae04XQ\n-----END CERTIFICATE-----\n"], "alpn": []}, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-31T05:27:57.891394", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 153\r\nX-Frame-Options: SAMEORIGIN\r\nX-Content-Type-Options: nosniff\r\nX-XSS-Protection: 1; mode=block\r\n\r\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "921aea7c-4258-40f4-90b0-73088269f39b", "options": {}, "ptr": true, "module": "rsync", "crawler": "339d3eded941e01ca426596e93f3fdf4c9346ccd"}, "product": "rsyncd", "hash": 
1601166835, "version": "26", "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "os": null, "rsync": {"authentication": false, "modules": {"punching": "Punching home", "root": "Root filesystem"}}, "port": 873, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-30T12:11:50.048579", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "@RSYNCD: 26\nroot           \tRoot filesystem\npunching       \tPunching home\n@RSYNCD: EXIT\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": null, "options": {}, "ptr": true, "module": "whois", "crawler": "122dd688b363c3b45b0e7582622da1e725444808"}, "hash": -1288910848, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 43, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-28T18:52:53.093633", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "676478697\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "99dd6dfe-b491-4691-8b62-c8957bb045e2", "options": {}, "ptr": true, "module": "http-simple-new", "crawler": "122dd688b363c3b45b0e7582622da1e725444808"}, "hash": 1240885964, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "http": {"html_hash": 1670855880, "robots_hash": null, "redirects": [], "securitytxt": null, "title": "Identification", "sitemap_hash": null, "robots": null, "favicon": null, "host": "85.139.242.90", "html": 
"<html>\r<head>\r<title>Identification</title>\r<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>\r<script>\rvar clicable = false;\rdocument.oncontextmenu = menuContextuelHandler;\rfunction menuContextuelHandler(){event.srcElement.click();return false;}\rfunction valider(arg){\rif(clicable){\rdocument.getElementById('nomMethode').value=arg;\rdocument.forms[0].submit();}clicable=false;}\rfunction loadBody(){clicable = true;\rtry{init();}catch(e){};}\rfunction doBlink(elt){\rwindow.setInterval(function(){showHide(elt);}, 1000)}\rfunction showHide(elt){if (elt){\relt.style.visibility = (elt.style.visibility == \"hidden\") ? \"visible\" : \"hidden\";}}\r</script>\r</head>\r<body onload='loadBody();'  onclick='return clicable;' id='corps'>\r<form action=Login.do method=post name='formulaire'>\r<input type='hidden' id='nomMethode' name='nomMethode' value='MainPage'>\r<input type='hidden' id='sessionId' name='sessionId' value='1571047569808'>\r<table style='border:1px solid #000000;width:100%;text-align:center'>\r<tr><td style='width:20%;text-align:left'><b>&nbsp;&nbsp;19/10/2019 10:36:05</b>\r</td><td style='width:60%;text-align:center'><i>\rKelio visio : <b><font color=blue>Kelio Visio Lavradio</font></b><font color=green> 85.139.242.90</font></i></td><td style='width:20%;text-align:right'><img src='bodet.png' align=right></td></tr><tr><td colspan=3 style='width:100%;text-align:center'><h2>Identification\r</h2></td></tr></table>\r<table style=\"width:50%\"><tr><td style=\"width:50%;text-align:center\">\r</td><td style=\"width:50%;text-align:center\">\r</td><td style=\"width:50%;text-align:center\">\r</td></tr></table><br>\r<br><br><br><br><br><br>\r<div style=\"width:60%;text-align:right\">\r<h2><img src=\"password.png\">\r&nbsp;&nbsp;Login:&nbsp;&nbsp;&nbsp;&nbsp;<input type=\"password\" name=\"password\"/>\r<script type='text/javascript'>document.formulaire.password.focus();</script>\r<input type=submit name=\"OK\" value=\"OK\" 
onClick=javascript:valider(\"MainPage\"); style=\"color:#000000;background-color:#CCCCCC\">\r</h2></div>\r</form>\r<br><br><br><table width=100% border=0><tr><td><h6>\r</h6></td></tr></table>\r</body>\r</html>\r", "location": "/", "components": {}, "server": null, "sitemap": null, "securitytxt_hash": null}, "port": 8008, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-19T09:36:09.751093", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 2117\r\n\r\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": null, "options": {}, "ptr": true, "module": "line-printer-daemon", "crawler": "f7946cbe2dc20c40fcbcb81ad90aa01731b690ab"}, "hash": -372273874, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 515, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-13T12:06:45.139731", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "no entries\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}], "asn": "AS2860", "ports": [23, 443, 873, 43, 8008, 515]}

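All of the code that talks to the API works fine, but I can't parse the resulting JSON object. Nim's parser works when the object is simple, but when I try to parse the JSON above I get an error. The Nim code used to parse the JSON is: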

let jsonRsp = parseJson(rspJson)

And running it produces this error:

/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(870) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(862) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(829) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(797) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/strutils.nim(1107) parseBiggestInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(447) parseBiggestInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(423) rawParseInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(401) integerOutOfRangeError
Error: unhandled exception: Parsed integer outside of valid range [ValueError]

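I understand what the error means: one of the integers is too long for the parser. Since I can't change the data (it is whatever the API spits out), I'm looking for a strategy for parsing this kind of JSON in Nim. Nim's complaint aside, every other JSON validator I tried reports the string as valid JSON.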

Tags: json, nim-lang

Answer


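This is the offending field:

"serial": 46474876880932987910930945182556062189

It is greater than 2^64 (it needs 126 bits), so it cannot fit in the 64-bit integer that parseBiggestInt, the last parsing call in your traceback, produces. This is tricky; see "JSON integers: limit on size".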

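I ran your sample JSON through three different JSON formatters/validators. It passed validation in all of them, but the validators also converted the integer above to a floating-point value, losing significant digits in the process. In other words, the formatted/validated result is not the same as the original.

In the Safari and Firefox JS consoles: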

JSON.parse("{\"serial\": 46474876880932987910930945182556062189}")
{serial: 4.647487688093299e+37}

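So some parsers silently convert that big integer into a different number. My immediate reaction is that silently losing precision is worse than reporting an error. I see three problems here:

  1. Parsers that silently lose precision.
  2. JSON being emitted without regard for the fact that even the parsers in popular web browsers cannot read it without losing precision.
  3. Nim's JSON parser may not support arbitrarily large integers.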
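
The first problem can be made concrete with a short sketch. Python is used here only because its integers are arbitrary precision, which makes the loss easy to measure; the variable names are illustrative and not part of any API.

```python
# The serial number from the Shodan response, held exactly
# (Python ints are arbitrary precision).
serial = 46474876880932987910930945182556062189

# What a double-based parser (for example JavaScript's JSON.parse) stores instead.
as_double = float(serial)

# Converting the double back to an integer shows which value was really kept.
round_tripped = int(as_double)

print(serial.bit_length())        # bits needed to hold the exact value
print(as_double)                  # the value a double-based parser reports
print(round_tripped == serial)    # whether the double preserved the integer exactly
print(abs(round_tripped - serial))  # size of the error introduced by the conversion
```

A double carries 53 bits of significand, so any integer above 2^53 is at risk, and an odd integer of this size cannot be represented exactly.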

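The first is the worst of the three IMO, but it is not going away. The Shodan API would interoperate better if it emitted the serial number as a string rather than as a big integer. You could also report the problem on Nim's issue tracker for consideration. Python's JSON parser, for example, parses that particular integer without any loss of precision.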
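
The Python claim is easy to check, and the same script can sketch what "serial as a string" would look like from the consumer's side. This is a minimal sketch under stated assumptions: `raw` is a trimmed-down stand-in for the Shodan response, and `int_or_string` is a hypothetical helper name; only `json.loads`, its `parse_int` parameter, and `json.dumps` come from the standard library.

```python
import json

# Trimmed-down stand-in for the Shodan response; only two fields are kept.
raw = '{"port": 443, "serial": 46474876880932987910930945182556062189}'

# Python's json module maps JSON integers to arbitrary-precision ints,
# so the serial survives unchanged.
exact = json.loads(raw)

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def int_or_string(digits: str):
    """Hypothetical hook: keep integers that fit in int64, return larger ones as text."""
    value = int(digits)
    return value if INT64_MIN <= value <= INT64_MAX else digits

# parse_int is called with the literal text of every JSON integer.
portable = json.loads(raw, parse_int=int_or_string)

# Re-serialised, the oversized serial is a quoted string and the port is still a number.
requoted = json.dumps(portable)

print(exact["serial"])
print(requoted)
```

A pre-processing step along these lines is one possible workaround when the producer cannot be changed: the rewritten document contains no integer outside the 64-bit range, at the cost of the consumer having to treat the serial as text.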

