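sql - Scraping stock availability data from a website
Question
I want to scrape the stock availability of a particular product from the following website.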
[{"@type":"Offer","availability":"https://schema.org/InStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2756&spec[]=285"},
{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2768&spec[]=285"},
{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2811&spec[]=285"},
{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2757&spec[]=285"}],"aggregateRating":{"@type":"AggregateRating","ratingValue":"9.0","ratingCount":"6","bestRating":"10"}}
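I need the schema.org/InStock or schema.org/OutOfStock value so that I can be notified when the product is in stock and buy it. This is a personal project, because the availability of mountain bikes is very limited at the moment. So I want to build a quick program that notifies me when the availability field for my MTB size is "InStock". I am familiar with SSIS and SQL Server. Can someone help me get the data from the website?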
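Solution
You can handle the JSON directly in SSIS, or you can use SQL Server.
Use SSIS to insert the JSON into a table, then parse it with OPENJSON:
Here I insert your sample JSON into a temp table and query it with T-SQL: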
DECLARE @json NVARCHAR(MAX) =
N'
[{"@type":"Offer","availability":"https://schema.org/InStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2756&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2768&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2811&spec[]=285"}
,{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2757&spec[]=285"}]
'
-- Only the offers array is kept: the page's trailing "aggregateRating" fragment would make the text invalid JSON.
-- Stage the raw JSON in a temp table, as an SSIS load would.
CREATE TABLE #tmp (
    id INT IDENTITY (1, 1) NOT NULL
    , json NVARCHAR(MAX) NOT NULL
);

INSERT INTO #tmp (json)
VALUES (@json);

-- Shred the most recently loaded JSON array into one row per offer.
SELECT [AdType]
    , [availability]
    , [price]
    , [priceCurrency]
    , [url]
FROM (
    SELECT TOP 1 json
    FROM #tmp
    ORDER BY id DESC
) a
OUTER APPLY OPENJSON(a.json)
WITH
(
    AdType VARCHAR(100) '$."@type"'
    , availability NVARCHAR(256)
    , price DECIMAL(19, 2)
    , priceCurrency NVARCHAR(3)
    , url NVARCHAR(512)
);
Your tags include python. If you use Python to fetch the data, you can parse the JSON directly into Python objects, without needing SSIS or SQL Server.
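As a rough illustration of the Python route, here is a minimal sketch that parses the sample offers from the question with the standard `json` module and keeps only the ones marked as in stock. The names `SAMPLE_OFFERS` and `in_stock_offers` are made up for this example, and it assumes you have already extracted the offers array from the page as valid JSON. Fetching the page and sending the notification are left out.

```python
import json

# Sample offers copied from the question: a JSON array of schema.org Offer objects.
SAMPLE_OFFERS = """
[{"@type":"Offer","availability":"https://schema.org/InStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2756&spec[]=285"},
 {"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2768&spec[]=285"},
 {"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2811&spec[]=285"},
 {"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"479.00","priceCurrency":"EUR","url":"https://www.mantel.com/cube-aim-pro&spec[]=9470&spec[]=2757&spec[]=285"}]
"""


def in_stock_offers(offers_json):
    """Return the offers whose availability ends in 'InStock'.

    Hypothetical helper for this example. It compares only the last path
    segment, because the sample mixes http:// and https:// schema.org URLs.
    """
    offers = json.loads(offers_json)
    return [
        offer
        for offer in offers
        if offer.get("availability", "").rsplit("/", 1)[-1] == "InStock"
    ]


if __name__ == "__main__":
    for offer in in_stock_offers(SAMPLE_OFFERS):
        print(offer["price"], offer["priceCurrency"], offer["url"])
```

From here you could run the script on a schedule and trigger whatever notification you prefer whenever the returned list contains the URL for your frame size.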