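r - Filtering out the desired tweets using a dictionary
Problem description
I am trying to filter tweets but I am stuck. I want a dictionary that contains all English words plus the words I add myself, and I want to use it to filter my tweets. I have a data frame containing tweets, and its text column is: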
text
1 | @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan
2 | @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 | Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 | @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 | @x_anp <U+304A><U+3081><U+3067><U+3068><U+3046>!!
6 | Tourism & Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan
7 | @Palwasha_Abbas Kuch samjhayein in ko... Articles likhnay say character thek nhe hotay... ANP is history. @BushraGohar
8 | <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 | <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 | @MianIftikharHus Pti ki govt bilkul bhi ideal nhi ti magar mazrat k sath anp or mma ki pechli govt se pti ki govt kafi behtr ti .hospitals or schools me tabdeeli ayi hy koi mane ya na mane.
11 | <U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP ki koi history nahi hai
The desired data frame should look like this:
text
1 | @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan
2 | @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 | Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 | @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 | NA
6 | Tourism & Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan
7 | NA
8 | <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 | <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 | NA
11 | NA
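After that, I can easily do the removal by converting the data frame to a corpus. This is all I want. How can I get such a dictionary? Is using a dictionary the right approach, or should I use a classifier or something else? Please explain in your answer what I should do. Thanks for the help!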
Solution
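One possible dictionary-based approach, as a minimal base R sketch rather than a definitive answer. It assumes one reading of the goal: a tweet is set to NA when, after mentions, hashtags, and `<U+XXXX>` escapes are ignored, it still contains a Latin-script word that is not in the dictionary. The names `english_words`, `custom_words`, `tweet_tokens`, `all_words_known`, and `filter_tweets` are hypothetical, and the tiny `english_words` vector only stands in for a full English word list.

```r
# Hypothetical stand-in for a full English word list; in practice this could be
# read from a word-list file (one word per line) with readLines().
english_words <- c("easy", "load", "was", "started", "by", "and", "not",
                   "is", "history", "it's", "much", "better", "than", "time")

# Words added on top of the English list (assumed examples: party names).
custom_words <- c("pti", "anp", "mma")

dictionary <- unique(tolower(c(english_words, custom_words)))

# Return the lower-cased Latin-script word tokens of one tweet.
tweet_tokens <- function(text) {
  text <- gsub("<U\\+[0-9A-Fa-f]{4,8}>", " ", text)  # drop <U+XXXX> escapes
  text <- gsub("[@#][A-Za-z0-9_]+", " ", text)       # drop mentions and hashtags
  # A token is a run of letters, optionally with inner apostrophes ("it's").
  matches <- gregexpr("[A-Za-z]+(?:'[A-Za-z]+)*", text, perl = TRUE)
  tolower(unlist(regmatches(text, matches)))
}

# TRUE when every word token is in the dictionary. A tweet with no word
# tokens at all (only mentions, hashtags, or non-Latin text) also gives TRUE.
all_words_known <- function(text, dictionary) {
  all(tweet_tokens(text) %in% dictionary)
}

# Replace the text of every tweet that contains an unknown word with NA.
filter_tweets <- function(df, dictionary) {
  df$text <- as.character(df$text)
  keep <- vapply(df$text, all_words_known, logical(1),
                 dictionary = dictionary, USE.NAMES = FALSE)
  df$text[!keep] <- NA_character_
  df
}

# Small example built from tweets in the question.
tweets <- data.frame(
  text = c("Easy load was started by PTI and not ANP #KaptaanFailedInKP",
           "@MianIftikharHus Pti ki govt bilkul bhi ideal nhi ti",
           "@a_siab @sardarbabak999 @BushraGohar",
           "<U+0648> <U+0644><U+06CC> @anp_mohmand ANP is history"),
  stringsAsFactors = FALSE
)

filtered <- filter_tweets(tweets, dictionary)
print(filtered)
```

In this example only the second tweet becomes NA, because words such as "ki" and "govt" are in neither list. Under this rule a tweet with no Latin-script words is kept, so tweet 5 in the question would not become NA; reproducing that row would need an extra rule that the desired output does not make clear. Whether a dictionary is enough or a language classifier works better depends on how much romanized non-English text shares spellings with English words.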