Which CSS element should I select for a player's nationality when scraping SoFifa.com with R?

Question

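I have been trying to scrape player details from SoFifa.com with the rvest package, working through the table column by column. This is where I am stuck: I cannot get the players' nationality. Maybe I am selecting the wrong CSS element. I tried the SelectorGadget tool, but still no luck. The code is below. Any help would be appreciated!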

# Website to be scraped, one column at a time.
library(rvest)

link <- "https://sofifa.com/"

# Name of each player. This works fine: all names are retrieved.
Name <- link %>% read_html() %>%
        html_nodes(".nowrap") %>%
        html_text()

# Nationality is not retrieved. While inspecting this section, I noticed that the
# title of the <a rel="nofollow"> element under <div class="bp3-text-overflow-ellipsis">
# needs to be selected. I need help with how to do that!
Nationality <- link %>% read_html() %>%
        html_nodes(".flag") %>%
        html_text()

# I tried .flag because SelectorGadget suggested it, but it still doesn't
# retrieve the nationality for a player.





Tags: r, web-scraping, rvest

Answer


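You can combine two attributes in the selector to get what you are after, then read the title attribute instead of the element text. Try: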

# Target markup: <a rel="nofollow" href="/players?na=14" title="England">...</a>
# In a CSS selector, *= means the attribute value contains the given text.
# The resulting CSS selector is:
# .bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]

library(rvest)

# `link` is the URL defined in the question.
page <- read_html(link)
Nationality <- page %>%
  html_nodes('.bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]') %>%
  html_attr('title')
print(Nationality)

Output:

 [1] "Italy"          "England"        "Togo"           "France"
 [5] "Ghana"          "Brazil"         "Norway"         "Spain"
 [9] "Nigeria"        "Argentina"      "Spain"          "England"
[13] "Portugal"       "England"        "Denmark"        "England"
[17] "Italy"          "Argentina"      "England"        "Portugal"
[21] "Argentina"      "Norway"         "Brazil"         "Norway"
[25] "Netherlands"    "Germany"        "England"        "Uruguay"
[29] "United States"  "Argentina"      "Netherlands"    "Czech Republic"
[33] "Brazil"         "France"         "Argentina"      "Brazil"
[37] "Poland"         "Brazil"         "Italy"          "Portugal"
[41] "Netherlands"    "Netherlands"    "Netherlands"    "Morocco"
[45] "Argentina"      "Spain"          "Argentina"      "France"
[49] "Netherlands"    "Brazil"         "Argentina"      "France"
[53] "Canada"         "Canada"         "Switzerland"    "Brazil"
[57] "Germany"        "Netherlands"    "Jamaica"        "France"
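
To see why the .flag attempt came back empty while the title attribute works, the difference can be reproduced offline. The following is a minimal sketch, not the actual SoFifa page: the HTML fragment is a hypothetical one modelled on the anchor tag quoted in the answer, and the `<img class="flag">` child and the second player link are assumptions made for illustration.

```r
library(rvest)

# Hypothetical fragment modelled on the markup quoted in the answer.
# The real page structure may differ.
snippet <- '
<div class="bp3-text-overflow-ellipsis">
  <a rel="nofollow" href="/players?na=14" title="England"><img class="flag" alt=""></a>
  <a href="/player/1/example-player/">E. Player</a>
</div>'

page <- read_html(snippet)

# html_text() returns the text inside an element; an <img> contains none.
flag_text <- page %>% html_nodes(".flag") %>% html_text()

# The nationality lives in the title attribute of the enclosing link,
# so it has to be read with html_attr() rather than html_text().
nationality <- page %>%
  html_nodes('.bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]') %>%
  html_attr("title")

print(flag_text)
print(nationality)
```

The second link in the fragment is not matched because it has no rel="nofollow" attribute and its href does not contain "players?", which is why combining the two attribute conditions narrows the match to the nationality link.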
