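python - Trouble searching a CSV file for specific strings
Problem description
I have YouTube comments in a .csv file, and I want to search those comments for specific words. I have a list of terms that I believe I am comparing each comment row against, but when a comment contains one of those terms, slurCount is not incremented, and noSlurCount ends up counting every comment.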
import csv
slurCount = 0
noSlurCount = 0
with open('target_file.csv', encoding="utf8") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    slurList = ["slurX", "slurY", "SlurZ", "slurETC"]
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            commentText = {row[2]}
            if commentText in {row[2]}:
                slurCount += 1
                print ("\t Comment contained a slur")
            else:
                noSlurCount += 1
                print ("\t Comment didn't contain a slur")
            print(f'\t The comment ID is: {row[0]}')
            print(f'\t Their comment was: {row[2]}')
            print(f'\t The comment received: {row[3]} likes.')
            line_count += 1
    print(f'Processed {line_count} lines.')
    print(f'Found {slurCount} comments with slurs.')
    print(f'Found {noSlurCount} comments without slurs.')
Any help would be great.
Solution
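You should at least test against your slur list; the code above never uses slurList. This is wrong:
    commentText = {row[2]}
    if commentText in {row[2]}:
It is never true, because what you are testing is:
    if {"something"} in {"something"}:
This is False, because `in` checks whether the left-hand side is an element of the set on the right. The right-hand set contains the string "something", not a set, so the set on the left is not in it.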
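To make the difference concrete, here is a minimal sketch that contrasts the set-in-set test with a check against the term list. The sample comment string and the variable names `set_in_set` and `contains_term` are illustrative assumptions, not part of the original code.

```python
comment = "bla SlurZ bla"  # hypothetical sample comment
slur_list = ["slurX", "slurY", "SlurZ", "slurETC"]

# The original test: is a set an element of a set that only holds a string?
set_in_set = {comment} in {comment}

# One possible check: does any term from the list occur in the comment text?
contains_term = any(term in comment for term in slur_list)

print(set_in_set)     # False
print(contains_term)  # True
```

Note that `term in comment` is a substring test, so it would also match a term embedded in a longer word; matching whole words needs the comment to be split first.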
A better approach is to use a set and set.intersection():
Create a demo file:
with open('target_file.csv', "w", encoding="utf8") as f:
    f.write("id,no idea,comment,likes, what columns,you,have\n")
    f.write("1,,bla SlurZ bla,10,,,\n")
    f.write("2,,bla SlurZ bla,20,,,\n")
    f.write("3,,bla SlurZ. bla,30,,,\n")
    f.write("4,,bla no bla,40,,,\n")
    f.write("5,,bla no bla,50,,,\n")
    f.write("6,,bla no bla,60,,,\n")
    f.write("7,,bla no bla,70,,,\n")
    f.write("8,,bla slurX- bla,80,,,\n")
    f.write("9,,bla SlurZ bla,90,,,\n")
    f.write("10,,bla SlurZ bla,100,,,\n")
    f.write("11,,bla SlurZ bla,110,,,\n")
Program:
import csv
slurCount = 0
noSlurCount = 0
line_count = 0
with open('target_file.csv', encoding="utf8") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    # use a set
    slurs = {"slurX", "slurY", "SlurZ", "slurETC"}
    # get the header
    header = ", ".join(next(csv_reader))
    print(f'Column names are {header}')
    for row in csv_reader:
        line_count += 1
        # you need to clean the comment-words from punctuation marks
        # so it detects slurY. or slurY- as slur as well
        if slurs.intersection(x.strip(",.-!?: ") for x in row[2].split()):
            slurCount += 1
            print("\t Comment contained a slur:")
            print(f"\t\t{row[2]}")
        else:
            noSlurCount += 1
            print("\t Comment didn't contain a slur")
            print(f'\t\t The comment ID is: {row[0]}')
            print(f'\t\t Their comment was: {row[2]}')
            print(f'\t\t The comment received: {row[3]} likes.')
print(f'Processed {line_count} lines.')
print(f'Found {slurCount} comments with slurs.')
print(f'Found {noSlurCount} comments without slurs.')
Output:
Column names are id, no idea, comment, likes, what columns, you, have
     Comment contained a slur:
        bla SlurZ bla
     Comment contained a slur:
        bla SlurZ bla
     Comment contained a slur:
        bla SlurZ. bla
     Comment didn't contain a slur
         The comment ID is: 4
         Their comment was: bla no bla
         The comment received: 40 likes.
     Comment didn't contain a slur
         The comment ID is: 5
         Their comment was: bla no bla
         The comment received: 50 likes.
     Comment didn't contain a slur
         The comment ID is: 6
         Their comment was: bla no bla
         The comment received: 60 likes.
     Comment didn't contain a slur
         The comment ID is: 7
         Their comment was: bla no bla
         The comment received: 70 likes.
     Comment contained a slur:
        bla slurX- bla
     Comment contained a slur:
        bla SlurZ bla
     Comment contained a slur:
        bla SlurZ bla
     Comment contained a slur:
        bla SlurZ bla
Processed 11 lines.
Found 7 comments with slurs.
Found 4 comments without slurs.
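The set comparison is case-sensitive, so a comment containing "slurz" or "SLURZ" would not match "SlurZ". If that matters for your data, one possible variation is sketched below. The helper name `contains_any`, the use of `re.findall(r"\w+", ...)` to split words, and the sample strings are assumptions made for illustration; they are not part of the program above.

```python
import re

def contains_any(comment, terms):
    """Return True if any term occurs as a whole word in comment, ignoring case."""
    # Normalize both sides so that "SlurZ", "slurz" and "SLURZ" compare equal
    wanted = {term.casefold() for term in terms}
    # \w+ keeps only runs of word characters, which drops punctuation like "." or "-"
    words = {word.casefold() for word in re.findall(r"\w+", comment)}
    return not wanted.isdisjoint(words)

slurs = {"slurX", "slurY", "SlurZ", "slurETC"}
print(contains_any("bla SLURZ. bla", slurs))  # True
print(contains_any("bla slurX- bla", slurs))  # True
print(contains_any("bla no bla", slurs))      # False
```

Inside the loop, the condition would then become `if contains_any(row[2], slurs):`. Splitting on `\w+` also removes the need to list punctuation characters by hand, at the cost of treating every non-word character as a separator.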