首页 > 解决方案 > 过滤网站上的特定评论

问题描述

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
#import re
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})


for complaint in complaints:
   if complaint.text.find("genç") is not -1:
      print complaint.text

我想在网站上过滤某些包含特定单词的投诉,但我无法搜索其中包含非 ASCII 字符的单词。我正在使用 python 2.7 和 beautifulsoup。知道为什么会这样吗?

标签: pythonbeautifulsoupweb

解决方案


如果您的测试在 p 标签内,YouTube 应该将 od 语句更改为

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})

for complaint in complaints:
    if b"genç".decode("utf-8") in complaint.text:
        print(complaint.text)

推荐阅读