python - Scraping reviews that contain "More" text
Problem description
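As the title says, I need help scraping reviews from a website called TripAdvisor. The specific link I am using is https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html
The problem is that some reviews are truncated and show a "More" link that reveals the rest of the review (for example, the second review at the link above). How can I scrape the reviews that contain this "More" text?
Is there a way to open them as if I had clicked the link, or is it a matter of finding the right tag that contains the whole review?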
Solution
Use Selenium and Beautiful Soup. Check whether a "More" button is present; if it is, click it, then read page_source.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html')

# Look for the "More" links that expand truncated reviews
more_links = driver.find_elements(By.XPATH, "//span[@class='taLnk ulBlueLinks'][contains(.,'More')]")
if more_links:
    # Click the first link, then give the page time to load the full text
    more_links[0].click()
    time.sleep(3)

# Parse the rendered page and release the browser
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Each review body is a <p class="partial_entry"> element
items = [item.text for item in soup.select("p.partial_entry")]
print(items)
Output:
['Stopped by to get some chicken strips to go. They were out of soft drinks, but I was getting coffee. Restrooms were clean.', "We live in page Arizona and go to McDonald's on the occasion that we don't want to cook but almost every time that we stop in the service is horrible. There has been times where the drive thru would not say anything to us until we decided to drive back around to really let them know we were ready to order food. The manager whom i have talked to on multiple occasions acts like it's bo big deal that their restaurant shows no respect for the customers. Finally i decided to write a review before calling corporate. I understand not wanting or liking your job at McDonald's but you made the life decisions to be where you are the least you could do is show some respect for your customers especially the locals of this tourist town.", 'The location was newer, clean and kept up very well. The hot fudge sundaes were great . Stopped by for a snack', 'We stopped in to grab a little snack before heading to Horseshoe Bend. My husband got a double cheeseburger, I ordered an apple pie. His burger was fine. The apples in the pie were all shriveled up. It looked old. I looked at the time on the box and it had expired 4 hours before. I walked back in and asked for a new one, explaining the one they just gave me was quite old. Then he handed me one and said try this one. I looked at the date and it expired 2 hours before. I asked if the had any fresh ones. He went into the back for awhile and came out with a new one.', 'I like the coffee, there was few times they messed up coffee 3x in a row. but its okay i had patience for them to get it right. I only like their fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. 
clean tables but rude managers', 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go ask for a refund or compensation but the manager did not want He said if I refund you ,you will not have your mealI find that not acceptable to wait that long and Big Macs were coldI am a big traveller and never saw a Manager like that Don’t go there Go to Taco Bell ...', "the employees were very fast and efficient at the service they provided whilst giving me my food. McDonald's is always reliable whenever you want a quick snack.", "It is a newer looking location with a huge amount of parking. The dining area was very large and quite clean. The service was very good. The food was just like any other McD's.", 'win i eat at the best restaurant the meals are the best i love the fries it gives me taste of joy . i like to eat their again i like to eat their win im on the road and i like to never stop eating its my great place to eat', "This is a new facility in what looks like a newer area of Page. Typical McDonald's but great service and new building makes this a good stop if you are looking for a quick fill up."]
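One review in the output above contains a line break inside its text. If single-line strings are preferred, whitespace can be collapsed after extraction. This is a minimal sketch: the helper name normalize_review is hypothetical, and the sample string is abbreviated from the output above.

```python
def normalize_review(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


raw = "plus the nuggets. clean restrooms. \nclean tables but rude managers"
print(normalize_review(raw))
# plus the nuggets. clean restrooms. clean tables but rude managers
```

Applied to the scraped list, this would be items = [normalize_review(t) for t in items].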