Filtering tweets with a dictionary to keep only the desired ones

Problem description

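I'm trying to filter tweets but I'm stuck. I want a dictionary that contains all English words plus words I add myself, and I want to use it to filter my tweets. I have a data frame of tweets whose text column looks like this: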

        text
1 |  @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan 
2 |  @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 |  Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 |  @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 |  @x_anp <U+304A><U+3081><U+3067><U+3068><U+3046>!!
6 |  Tourism &amp; Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan 
7 |  @Palwasha_Abbas Kuch samjhayein in ko... Articles likhnay say character thek nhe hotay... ANP is history. @BushraGohar
8 |  <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 |  <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 |  @MianIftikharHus Pti ki govt bilkul bhi ideal nhi ti magar mazrat k sath anp or mma ki pechli govt se pti ki govt kafi behtr ti .hospitals or schools me tabdeeli ayi hy koi mane ya na mane.
11 |  <U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP ki koi history nahi hai

The desired data frame should look like this:

        text
1 |  @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan 
2 |  @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 |  Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 |  @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 |  NA
6 |  Tourism &amp; Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan 
7 |  NA
8 |  <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 |  <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 |  NA
11 |  NA

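After that, I can easily do the removal by converting the data frame to a corpus. This is all I want. How can I get such a dictionary? Is a dictionary the right tool, or should I use a classifier or something else? Please explain in your answer what I should do. Thanks for the help!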

Tags: r, dictionary, filter, tweets

Solution
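One way to approach the dictionary idea from the question, offered as a minimal sketch rather than a definitive answer: strip the tokens that carry no language signal (@mentions, #hashtags, `<U+XXXX>` escapes, HTML entities, URLs), look up the remaining words in a vocabulary made of English words plus your own additions, and keep a tweet only when a large enough share of its words is known. The question is tagged r; the sketch below uses Python with only the standard library to show the logic, and the same token-by-token check should be portable to an R data frame. Everything named here is an assumption made for illustration: the functions `words_in`, `is_known`, and `filter_tweet`, the parameters `min_known` and `keep_if_no_words`, the 0.8 threshold, and the tiny placeholder word sets, which stand in for a full English word list.

```python
import re

# Tokens that carry no language signal and are skipped before the lookup.
NOISE = re.compile(
    r"<U\+[0-9A-Fa-f]{4,6}>"   # escaped non-ASCII characters, e.g. <U+304A>
    r"|https?://\S+"           # URLs
    r"|&\w+;"                  # HTML entities, e.g. &amp;
    r"|[@#]\w+"                # @mentions and #hashtags
)
# A run of letters, optionally followed by one apostrophe part ("it's").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def words_in(tweet):
    """Return the lowercased alphabetic tokens left after removing noise."""
    cleaned = NOISE.sub(" ", tweet).replace("\u2019", "'").lower()
    return WORD.findall(cleaned)


def is_known(word, vocabulary):
    """Accept "it's" when "it" is listed, so contractions need no extra entries."""
    return word in vocabulary or word.split("'")[0] in vocabulary


def filter_tweet(tweet, vocabulary, min_known=0.8, keep_if_no_words=True):
    """Return the tweet if enough of its words are in vocabulary, else None."""
    words = words_in(tweet)
    if not words:
        # Only mentions, hashtags, or escapes: nothing to check.
        return tweet if keep_if_no_words else None
    known = sum(is_known(w, vocabulary) for w in words)
    return tweet if known / len(words) >= min_known else None


# Placeholder vocabulary (assumption): replace ENGLISH with a full word list.
ENGLISH = {"easy", "load", "was", "started", "by", "and", "not", "in",
           "articles", "say", "character", "is", "history"}
CUSTOM = {"pti", "anp", "mma", "easyload"}  # words added by hand
VOCABULARY = ENGLISH | CUSTOM

tweets = [
    "Easy load was started by PTI and not ANP #KaptaanFailedInKP",
    "@Palwasha_Abbas Kuch samjhayein in ko... Articles likhnay say "
    "character thek nhe hotay... ANP is history. @BushraGohar",
    "@x_anp <U+304A><U+3081><U+3067><U+3068><U+3046>!!",
]
filtered = [filter_tweet(t, VOCABULARY) for t in tweets]
```

With this placeholder vocabulary, the first tweet is kept because all of its words are known, and the second becomes `None` (the analogue of NA) because only half of its words are. The third tweet contains no checkable words, so it is kept by default; passing `keep_if_no_words=False` drops it instead, at the cost of also dropping tweets that consist only of mentions. A full English word list will contain short words that also occur in romanized Urdu, so the threshold would likely need tuning on real data, and a language-identification classifier may be worth comparing against if the dictionary proves too coarse.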

