sparql - Preferred rank in Wikidata not working properly for population in some cases?
Problem description
I'm currently working on a project that uses data from Wikidata, and I noticed a lot of duplicate elements in my database. The reason is that I'm receiving population numbers for different points in time.
I've read that Wikidata ranks statements when a property has multiple values, and that for the population property the preferred value appears to be the most recent one, which is true for about 99.9% of the entries. What I don't understand is why it doesn't work for the other 0.1%.
One example would be: Wikidata query
The same happens for example with the elements
and I have no idea why.
I've already tried the solution from this topic but it didn't change the result.
Any ideas?
Edit based on the filter option from the thread: wikidata query 2
Edit 2: Full query
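Solution
Some Wikidata properties are handled by PreferentialBot (source code).
In short, the bot gives preferred rank to the most recent statements, which makes them the truthy ones.
Sometimes the bot does not process a property's statements. For example, it skips items that have statements without the corresponding qualifier.
In your particular case: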
SELECT DISTINCT ?city ?cityLabel ?population ?date ?rank WHERE {
  VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
  VALUES (?city) {(wd:Q1658752)}
  ?city wdt:P31/wdt:P279* ?settlement .
  ?city p:P1082 ?statement .
  ?statement ps:P1082 ?population .
  ?statement wikibase:rank ?rank .
  OPTIONAL { ?statement pq:P585 ?date }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} ORDER BY ?date
Result:
+-------------+-----------+------------+----------------------+---------------------+
| city | cityLabel | population | date | rank |
+-------------+-----------+------------+----------------------+---------------------+
| wd:Q1658752 | Kagan | 86745 | | wikibase:NormalRank |
| wd:Q1658752 | Kagan | 17656 | 1939-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan | 21103 | 1959-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan | 34117 | 1970-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan | 41565 | 1979-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan | 48054 | 1989-01-01T00:00:00Z | wikibase:NormalRank |
+-------------+-----------+------------+----------------------+---------------------+
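Do you prefer the most recent statement or the "timeless" one (the statement with no point-in-time qualifier)?
This is how you can find the latest population: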
SELECT DISTINCT ?city ?cityLabel ?population WHERE {
VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
VALUES (?city) {(wd:Q1658752)}
?city wdt:P31/wdt:P279* ?settlement .
?city p:P1082 [ ps:P1082 ?population; pq:P585 ?date1 ]
FILTER NOT EXISTS {
?city p:P1082 [ pq:P585 ?date2 ]
FILTER (?date2 > ?date1) }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
And this is how you find the "timeless" one:
SELECT DISTINCT ?city ?cityLabel ?population WHERE {
VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
VALUES (?city) {(wd:Q1658752)}
?city wdt:P31/wdt:P279* ?settlement .
?city p:P1082 ?statement .
?statement ps:P1082 ?population .
FILTER NOT EXISTS {?statement pq:P585 []}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
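If you would rather deduplicate on the client side after fetching every statement, the same two choices can be sketched in Python. This is only a minimal illustration, not part of the original answer: the function name `pick_population` and the fallback order (latest dated value first, undated value otherwise) are assumptions, and the rows are copied from the Kagan result table above.

```python
from datetime import date

# (population, point in time) pairs copied from the Kagan result table above.
# None marks the statement that has no P585 qualifier.
KAGAN = [
    (86745, None),
    (17656, date(1939, 1, 1)),
    (21103, date(1959, 1, 1)),
    (34117, date(1970, 1, 1)),
    (41565, date(1979, 1, 1)),
    (48054, date(1989, 1, 1)),
]


def pick_population(rows):
    """Hypothetical helper: return one population value per item.

    Prefers the value with the latest date; falls back to the first
    undated value when no row carries a date; returns None for no rows.
    """
    dated = [(when, population) for population, when in rows if when is not None]
    if dated:
        # Compare on the date only and return the matching population.
        return max(dated, key=lambda pair: pair[0])[1]
    undated = [population for population, when in rows if when is None]
    return undated[0] if undated else None


print(pick_population(KAGAN))  # 48054
```

Swapping the two branches would give the "timeless" value priority instead, which for Kagan is 86745.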
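In fact, almost 70% (not 0.1%) of the items that have the P1082 property have no preferred statement for it. The figure you are after is rather the share of items that have more than one truthy P1082 statement. To recap: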
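Truthy statements represent statements that have the best non-deprecated rank for a given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements for P2 are considered truthy.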
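The rule quoted above can be made concrete with a small Python sketch. It works on hypothetical in-memory `(value, rank)` tuples rather than on Wikidata itself, and the function name `truthy` and the rank strings are assumptions chosen for illustration. The set it returns corresponds to what a `wdt:P1082` triple pattern matches.

```python
def truthy(statements):
    """Return the truthy subset of (value, rank) tuples for one property.

    rank is assumed to be one of "preferred", "normal" or "deprecated".
    """
    preferred = [s for s in statements if s[1] == "preferred"]
    if preferred:
        # At least one preferred statement exists: only those are truthy.
        return preferred
    # No preferred statement: every normal-rank statement is truthy.
    return [s for s in statements if s[1] == "normal"]


# Kagan as in the result table: six statements, all with normal rank.
kagan = [(86745, "normal"), (17656, "normal"), (21103, "normal"),
         (34117, "normal"), (41565, "normal"), (48054, "normal")]
print(len(truthy(kagan)))  # 6

# Hypothetical variant in which one statement had been marked preferred.
ranked = [(48054, "preferred"), (41565, "normal"), (100, "deprecated")]
print(truthy(ranked))  # [(48054, 'preferred')]
```

With all six statements at normal rank, every one of them is truthy, which is why the item shows up six times in a query that relies on rank alone.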
So yes, about 0.5% of the items that have P1082 statements have two or more truthy P1082 statements.