python-3.x - How to remove unwanted characters from a UTF-8 list
Problem description
I have the following snippet:
import requests
import bs4

# burp0_headers and burp0_cookies are defined elsewhere in the original script
def profile_details():  # function to fetch people
    payload = 'grab'
    global result_people
    result_people = []
    for i in range(0, 5):
        git_url = "https://github.com/search?p=" + str(i) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        # iterate over the matched <p> elements directly instead of reusing i as an index
        for item in page_parse:
            test = item.text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                # store the description as UTF-8 bytes
                result_people.append(test.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)
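The matching step can be checked offline, without hitting GitHub. A minimal sketch, assuming the descriptions have already been extracted as `str` (for example from each element's `.text`); `GRAB_MARKERS` and `filter_grab_descriptions` are hypothetical names, and the four substrings are the same case-sensitive ones the `if` above checks. Keeping the values as `str` rather than calling `.encode("utf-8")` means they print as normal text.

```python
# Same substrings as the original if-statement (case-sensitive, as in the original)
GRAB_MARKERS = ('@ Grab', 'at Grab', '@Grab', '@grab')


def filter_grab_descriptions(descriptions):
    """Return the descriptions that mention Grab, kept as str (no .encode)."""
    return [text for text in descriptions
            if any(marker in text for marker in GRAB_MARKERS)]


if __name__ == '__main__':
    sample = [
        '\n Coding at Amazon, previously @Grab\n',
        '\n UX Engineer @ Grab\n',
        '\n Backend developer at Example Corp\n',  # hypothetical non-matching entry
    ]
    for text in filter_grab_descriptions(sample):
        print(text.strip())
```

`any()` stops at the first marker that matches, so this behaves exactly like the chain of `or` conditions.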
The output looks like this:
[b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n UX Engineer @ Grab\n', b'\n Designer at @Grab. Design Systems. Emerging tech (AR).\n ', b'\n Mobile Developer (iOS) @Grab. Previously Flipkart.\n ', b'\n Data science and engineering at Grab\n', b'\n Software Engineer @ Grab.\n ', b"\n Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n ", b'\n Frontend Software Engineer at Grab\n', b'\n Developer @Grab(GrabTaxi)\n ', b'\n Full Stack - Software Engineer @ Grab | AI Enthusiast\n ', b'\n Software Engineer at Grab\n', b'\n Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n ', b'\n Ex-Engineering Lead @grab, Ex-DoE @90seconds\n ', b'\n Software Engineer/ Gopher. Worked @grab, @microsoft\n ']
I want to remove characters such as \xf0\x9f\x8c\x9d from the list.
The output looks like a mess:
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
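Those escapes are not random junk: each element is a `bytes` object (created by `.encode("utf-8")`), and printing `bytes` shows every non-ASCII byte as a `\xNN` escape. Decoding the fragments from the first item shows what they are: a "full moon with face" emoji, a middle dot (·), and a pair of regional-indicator letters S and G that renders as the Singapore flag. A quick check using only the standard library:

```python
import unicodedata

# Byte fragments copied from the first item of the output above
fragments = [b'\xf0\x9f\x8c\x9d', b'\xc2\xb7', b'\xf0\x9f\x87\xb8\xf0\x9f\x87\xac']

for raw in fragments:
    text = raw.decode('utf-8')                          # bytes -> str
    names = [unicodedata.name(ch) for ch in text]       # official Unicode name of each character
    print(raw, '->', names)
```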
What is the simplest, most convenient way to do this?
Thanks in advance.
Solution
Welcome to StackOverflow!
You can do this by removing all non-ASCII characters from each string:
for i in result_people:
    # decode the UTF-8 bytes, then re-encode as ASCII, silently dropping every non-ASCII character
    print(i.decode('utf8').encode('ascii', errors='ignore'))
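The one-liner above still prints `bytes`, so the `b'...'` wrapper and the `\n` / `\r\n` escapes remain visible. A minimal sketch that also decodes the result back to `str` and collapses whitespace; `clean_description` is a hypothetical helper name. Note that dropping every non-ASCII character, as the answer does, also deletes legitimate accented or CJK letters, not only emoji.

```python
def clean_description(raw: bytes) -> str:
    """Decode UTF-8 bytes, drop non-ASCII characters and normalize whitespace."""
    text = raw.decode('utf-8')
    # Same idea as the answer: encode to ASCII, silently dropping anything that does not fit,
    # then decode again so the result is a str rather than bytes
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    # split() with no argument splits on any run of whitespace (\n, \r, spaces)
    return ' '.join(ascii_only.split())


if __name__ == '__main__':
    result_people = [
        b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ',
        b'\n Software Engineer @grab \r\nPreviously @shopback \n ',
    ]
    for item in result_people:
        print(clean_description(item))
```

If the `.encode("utf-8")` call in `profile_details` is removed so the list holds `str` values, the first `decode` step is unnecessary.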