Preferred rank in Wikidata not working properly for population in some cases?

Question

I'm currently working on a project that uses data from Wikidata, and I noticed a lot of duplicate entries in my database. The reason is that I'm receiving population numbers for different points in time.

I've read that Wikidata has ranks for statements with multiple values, and that for the population property the preferred value seems to be the most recent one, which is true for about 99.9% of the entries. What I don't understand is why it doesn't work for the other 0.1%.

One example would be: Wikidata query

The same happens for example with the elements

and I have no idea why.

I've already tried the solution from this topic but it didn't change the result.

Any ideas?


Edit based on the filter option from the thread: wikidata query 2

Edit 2: Full query

Tags: sparql, wikidata

Answer


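Some Wikidata properties are processed by PreferentialBot (source code).

In short, the bot gives the most recent statements preferred rank, which makes them truthy.

Sometimes the bot does not process a property's statements. For example, it does not process items that have statements without the corresponding qualifiers.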
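
The behavior described above can be modeled roughly as follows. This is only a sketch of the rule as stated in this answer, not the actual PreferentialBot code; the `Statement` class, the `promote_latest` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    value: int
    date: Optional[str]  # ISO date from the P585 qualifier, or None if missing
    rank: str = "normal"


def promote_latest(statements: List[Statement]) -> bool:
    """Mark the most recent statement as preferred.

    Returns False and changes nothing if any statement lacks a date,
    mirroring the "no corresponding qualifier" case described above.
    """
    if not statements or any(s.date is None for s in statements):
        return False
    # ISO date strings in the same format sort chronologically.
    latest = max(statements, key=lambda s: s.date)
    latest.rank = "preferred"
    return True


# Hypothetical item where every population statement is dated.
dated = [Statement(1000, "2001-01-01"), Statement(1200, "2011-01-01")]
# Hypothetical item with one undated statement: left untouched.
mixed = [Statement(1000, "2001-01-01"), Statement(1500, None)]

print(promote_latest(dated), [s.rank for s in dated])
print(promote_latest(mixed), [s.rank for s in mixed])
```

Under this assumed rule, a single undated statement is enough to leave every statement of the item at normal rank.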

In your particular case:

SELECT DISTINCT ?city ?cityLabel ?population ?date ?rank WHERE {
  VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
  VALUES (?city) {(wd:Q1658752)}
  ?city wdt:P31/wdt:P279* ?settlement . 
  ?city p:P1082 ?statement .
  ?statement ps:P1082 ?population .
  ?statement wikibase:rank ?rank
  OPTIONAL { ?statement pq:P585 ?date }  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }   
} ORDER by ?date

Try it

Result:

+-------------+-----------+------------+----------------------+---------------------+
|    city     | cityLabel | population |        date          |         rank        |
+-------------+-----------+------------+----------------------+---------------------+
| wd:Q1658752 | Kagan     |      86745 |                      | wikibase:NormalRank |
| wd:Q1658752 | Kagan     |      17656 | 1939-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan     |      21103 | 1959-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan     |      34117 | 1970-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan     |      41565 | 1979-01-01T00:00:00Z | wikibase:NormalRank |
| wd:Q1658752 | Kagan     |      48054 | 1989-01-01T00:00:00Z | wikibase:NormalRank |
+-------------+-----------+------------+----------------------+---------------------+

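Which statement do you prefer: the most recent one or the "eternal" one (the one without a date)?

This is how you can find the most recent population: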

SELECT DISTINCT ?city ?cityLabel ?population WHERE {
  VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
  VALUES (?city) {(wd:Q1658752)}
  ?city wdt:P31/wdt:P279* ?settlement . 
  ?city p:P1082 [ ps:P1082 ?population; pq:P585 ?date1 ]  
  FILTER NOT EXISTS {
    ?city p:P1082 [ pq:P585 ?date2 ]
    FILTER (?date2 > ?date1) }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

Try it
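
If the statements have already been fetched, the same selection can be done client-side. The sketch below mirrors the `FILTER NOT EXISTS` logic of the query above in Python, using the rows from the result table; the `latest_population` function name is hypothetical, and the dates are shortened to plain ISO dates for readability.

```python
from typing import List, Optional, Tuple

# (population, point in time) pairs copied from the result table above;
# None stands for a statement without a P585 qualifier.
kagan: List[Tuple[int, Optional[str]]] = [
    (86745, None),
    (17656, "1939-01-01"),
    (21103, "1959-01-01"),
    (34117, "1970-01-01"),
    (41565, "1979-01-01"),
    (48054, "1989-01-01"),
]


def latest_population(rows: List[Tuple[int, Optional[str]]]) -> Optional[int]:
    """Return the population of the dated row that has no later dated row."""
    # Undated rows are skipped, just as the query requires pq:P585 to be present.
    dated = [(pop, date) for pop, date in rows if date is not None]
    if not dated:
        return None
    # ISO date strings in the same format sort chronologically.
    return max(dated, key=lambda row: row[1])[0]


print(latest_population(kagan))  # 48054
```

Like the query, this ignores the undated value 86745 entirely.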

This is how you can find the "eternal" one:

SELECT DISTINCT ?city ?cityLabel ?population WHERE {
  VALUES (?settlement) {(wd:Q515) (wd:Q15284)}
  VALUES (?city) {(wd:Q1658752)}
  ?city wdt:P31/wdt:P279* ?settlement . 
  ?city p:P1082 ?statement .
  ?statement ps:P1082 ?population .
  FILTER NOT EXISTS {?statement pq:P585 []}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }   
}

Try it


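In fact, almost 70% (not 0.1%) of the items that have the P1082 property have no preferred statement for it. What you presumably mean are the items that have multiple truthy P1082 statements. Recall:

"Truthy statements represent statements that have the best non-deprecated rank for a given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements for P2 are considered truthy."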
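
The rule quoted above can be written as a small function. This is a minimal sketch of the definition, not code from Wikibase; the `truthy` function name and the sample values are hypothetical.

```python
from typing import List, Tuple


def truthy(statements: List[Tuple[str, str]]) -> List[str]:
    """Return the values that have the best non-deprecated rank.

    Each statement is a (value, rank) pair, where rank is one of
    "preferred", "normal" or "deprecated".
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        # At least one preferred statement: only preferred ones are truthy.
        return preferred
    # No preferred statement: every normal-rank statement is truthy.
    return [value for value, rank in statements if rank == "normal"]


# Hypothetical statements for a single property of a single item.
print(truthy([("A", "normal"), ("B", "preferred"), ("C", "deprecated")]))  # ['B']
print(truthy([("A", "normal"), ("B", "normal"), ("C", "deprecated")]))     # ['A', 'B']
```

The second call corresponds to the Kagan case: all six population statements have normal rank, so all six are truthy and `wdt:P1082` returns every one of them.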

Yes, about 0.5% of the items with P1082 statements have two or more truthy P1082 statements.

