首页 > 解决方案 > 从网站抓取库存可用性数据

问题描述

我想从以下网站抓取某种产品的库存情况。

[{"@type":"Offer","availability":"https://schema.org/InStock","price":"479.00","priceCurrency":"EUR","url":"https: //www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2756&spec[]=285"}, {"@type":"Offer","availability":"http://schema.org /OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2768&spec[] =285"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url" :"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2811&spec[]=285"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00", "priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2757&spec[]=285"}],"aggregateRating":{" @type":"AggregateRating","ratingValue":"9.0","ratingCount":"6","bestRating":"10"}}spec[]=285"}],"aggregateRating":{"@type":"AggregateRating","ratingValue":"9.0","ratingCount":"6","bestRating":"10"}}spec[]=285"}],"aggregateRating":{"@type":"AggregateRating","ratingValue":"9.0","ratingCount":"6","bestRating":"10"}}

我需要 schema.org/Instock 或 schema.org/OutOfStock 最终在产品有货时收到通知,以便我可以购买它。这对我来说是个人的,因为目前山地自行车的可用性非常有限。所以我想构建一个快速程序来在库存挂载 MTB 大小时收到通知字段是“库存”。我熟悉 SSIS 和 SQL 服务器。有人可以帮我获取从网站获取的数据吗?

标签: sqlssis

解决方案


可以在SSIS中直接做json,也可以用SQL Server

使用 ssis 将 Json 插入表中,然后使用 Openjson 对其进行解析:

在这里,我将您的示例 json 插入到临时表中,并使用 tsql 进行查询:

DECLARE @json NVARCHAR(MAX) =
N'
[{"@type":"Offer","availability":"https://schema.org/InStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2756&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2768&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2811&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2757&spec[]=285"}]
,"aggregateRating":{"@type":"AggregateRating","ratingValue":"9.0","ratingCount":"6","bestRating":"10"}}'


CREATE TABLE #tmp (
      id INT IDENTITY (1, 1) NOT NULL
    , json NVARCHAR(MAX) NOT NULL
)

INSERT INTO #tmp (json)
VALUES (@json)

SELECT [AdType]
     , [availability]
     , [price]
     , [priceCurrency]
     , [url]
FROM (
    SELECT TOP 1 json
    FROM #tmp
    ORDER BY id DESC
) a
    OUTER APPLY OPENJSON(a.json)
    WITH
    (
    AdType VARCHAR(100) '$."@type"'
    , availability NVARCHAR(256)
    , price DECIMAL(19, 2)
    , priceCurrency NVARCHAR(3)
    , url NVARCHAR(512)
    )

你的标签中有python。如果您使用 Python 来获取数据,您可以直接将 json 解析为 python 对象,而无需使用 SSIS 或 SQL 服务器


推荐阅读