Using a regular expression to find matches in a JS object

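Problem description

I am scraping a site, and the data I want is inside a script tag in the HTML page. I wrote some code using Python's re module to find the match, but it seems I am doing something wrong.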

    Hub = {};
    Hub.config = {
        config: {},
        get: function(key) {
            if (key in this.config) {
                return this.config[key];
            } else {
                return null;
            }
        },
        set: function(key, val) {
            this.config[key] = val;
        }
    };

    Hub.config.set('sku', {
        valCartInfo      : {
            itemId : '576938415361',
            cartUrl: '//cart.mangolane.com/cart.htm'
        },
        apiRelateMarket  : '//tui.mangolane.com/recommend?appid=16&count=4&itemid=576938415361',
        apiAddCart       : '//cart.mangolane.com/add_cart_item.htm?item_id=576938415361',
        apiInsurance     : '',
        wholeSibUrl      : '//detailskip.mangolane.com/service/getData/1/p1/item/detail/sib.htm?itemId=576938415361&sellerId=499095250&modules=dynStock,qrcode,viewer,price,duty,xmpPromotion,delivery,upp,activity,fqg,zjys,amountRestriction,couponActivity,soldQuantity,page,originalPrice,tradeContract',
        areaLimit        : '',
        bigGroupUrl      : '',
        valPostFee       : '',
        coupon           : {
            couponApi         : '//detailskip.mangolane.com/json/activity.htm?itemId=576938415361&sellerId=499095250',
            couponWidgetDomain: '//assets.mgcdn.com',
            cbUrl             : '/cross.htm?type=weibo'
        },
        valItemInfo      : {

            defSelected: -1,
            skuMap     : {";20549:103189693;1627207:811754571;":{"price":"528.00","stock":"2","skuId":"4301611864655","oversold":false},
                          ";20549:59280855;1627207:412796441;":{"price":"528.00","stock":"2","skuId":"4432149803707","oversold":false},
                          ";20549:59280855;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863100","oversold":false},
                          ";20549:72380707;1627207:28341;":{"price":"528.00","stock":"2","skuId":"4166690818570","oversold":false},
                          ";20549:418624880;1627207:28341;":{"price":"528.00","stock":"2","skuId":"4166690818566","oversold":false},
                          ";20549:418624880;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863098","oversold":false},
                          ";20549:72380707;1627207:3224419;":{"price":"528.00","stock":"2","skuId":"4166690818571","oversold":false},
                          ";20549:147478970;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863094","oversold":false},
                          ";20549:72380707;1627207:384366805;":{"price":"528.00","stock":"2","skuId":"4432149803708","oversold":false},
                          ";20549:296172561;1627207:811754571;":{"price":"528.00","stock":"2","skuId":"4301611864659","oversold":false},
                          ";20549:72380707;1627207:1150336209;":{"price":"528.00","stock":"2","skuId":"4301611864664","oversold":false},
                          ";20549:147478970;1627207:93586002;":{"price":"528.00","stock":"2","skuId":"4018119863095","oversold":false}}
            ,propertyMemoMap: {"1627207:811754571":"黑色单里(预售) 年后2.29发货","1627207:93586002":"黑色加绒 现货","1627207:412796441":"黑色(兔毛) 现货","1627207:384366805":"米白色(兔毛) 现货","1627207:3224419":"驼色 现货","1627207:1150336209":"驼色单里(预售) 年后2.29发货","1627207:28341":"黑色 现货","1627207:196576508":"驼色加绒 现货"}


        }
    });

I only need the data in the Hub.config.set('sku', ...) call.

I tried this, but it did not work:

    config_base_str = re.findall("Hub.config.set ({[\s\S]*?});", config)

where config is the string containing the data.

Tags: python

Solution


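Periods and parentheses have special meanings in regular expressions. To match them as literal characters, you need to escape them with a backslash.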
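To make the failure concrete, here is a minimal sketch that runs the pattern from the question against a shortened stand-in string (the short `config` value below is an assumption for illustration, not the full page). It also shows `re.escape`, which adds the backslashes for a literal prefix so they do not have to be typed by hand.

```python
import re

# Shortened stand-in for the scraped script text (illustrative only).
config = "Hub.config.set('sku', {\n    apiInsurance: ''\n});"

# The attempt from the question: each "." matches any character, "set "
# requires a literal space, and the unescaped "(" opens a capture group.
# The pattern therefore looks for "set {" and never matches "set('sku', {".
broken = re.findall(r"Hub.config.set ({[\s\S]*?});", config)
print(broken)  # []

# re.escape() backslash-escapes the special characters in a literal string.
prefix = re.escape("Hub.config.set(")
print(prefix)  # Hub\.config\.set\(

# Escaped prefix, then the quoted key, then the captured object literal.
fixed = re.findall(prefix + r"'\w*',\s*(\{[\s\S]*\})\);", config)
print(fixed)
```

The raw-string prefix (`r"..."`) keeps Python from interpreting the backslashes before the regex engine sees them.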

For example, given the string:

import re

config = """
    Hub.config.set('sku', {
    valCartInfo      : {
        itemId : '576938415361',
        cartUrl: '//cart.mangolane.com/cart.htm'
    },
.........
});
"""

If you only want the key, you can do the following (a raw string keeps Python from interpreting the backslashes):

config_base_str = re.findall(r"Hub\.config\.set\('(\w*)", config)  # ['sku']

If you want everything that follows the key inside the parentheses, you can do:

config_base_str = re.findall(r"Hub\.config\.set\('\w*',\s*({[\s\S]*})", config)  #  ["{\n valCartInfo : {} ...}"]

https://regex101.com/r/QHdaG2/3/
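One caveat with the greedy `[\s\S]*` in the last pattern: it runs to the final `}` in the whole input, so if the scraped script contains more code after the call, that code is captured too. The sketch below compares it with a lazy variant anchored on the closing `});`. The `page` string and the `Hub.other` line are hypothetical stand-ins, and the lazy pattern assumes `});` does not appear inside the object itself.

```python
import re

# Hypothetical page fragment: the call of interest followed by unrelated code.
page = """
Hub.config.set('sku', {
    valCartInfo: {
        itemId: '576938415361'
    },
    apiInsurance: ''
});
Hub.other = { later: true };
"""

# Greedy: the capture extends to the LAST "}" anywhere in the input.
greedy = re.findall(r"Hub\.config\.set\('\w*',\s*(\{[\s\S]*\})", page)

# Lazy: the capture stops at the first "}" that is followed by ");".
lazy = re.findall(r"Hub\.config\.set\('\w*',\s*(\{[\s\S]*?\})\);", page)

print(greedy[0].endswith("later: true }"))  # True: unrelated code was captured
print(lazy[0])
```

A regular expression cannot balance nested braces in general, so this only works when a reliable terminator such as `});` follows the object.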

