How to search for specific text in a CSV with pandas (Python)

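Problem description

Hello, I want to find the account text (the part marked with @) in the title column and save it in a new CSV. pandas should be able to do this; I tried, but it didn't work. Here is my CSV: http://www.sharecsv.com/s/c1ed9790f481a8d452049be439f4e3d8/Newnormal.csv

Here is my code: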

import pandas as pd

data = pd.read_csv("Newnormal.csv")
data.dropna(inplace=True)

sub = "@"

data["Indexes"] = data["title"].str.find(sub)
print(data)
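
One likely reason this attempt does not give the desired result: `Series.str.find` returns the position of the first match in each string (or -1 when there is none), so the `Indexes` column holds integers rather than account names, and no rows are filtered out. A minimal sketch with made-up titles (the sample strings are assumptions, not rows from the real CSV):

```python
import pandas as pd

# Hypothetical sample titles, used only for illustration
titles = pd.Series(["RT @alice: hello", "no mention here", "hi @bob"])

# str.find gives the character offset of the first "@", or -1 if absent
positions = titles.str.find("@")
print(positions.tolist())

# str.contains gives a boolean mask that can be used to filter rows
mask = titles.str.contains("@", regex=False)
print(titles[mask].tolist())
```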

I want a result like this:

from,to,title
Xavier5501,KudiiThaufeeq,RT @KudiiThaufeeq: Royal Rape, Royal Harassment, Royal Cocktail Party, Royal Pedo, Royal Bidding, Royal Maalee Bayaan, Royal Slavery..et

Thank you.

Tags: python-3.x, pandas, csv

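Solution


  1. Reduce the records to only those with "@" in the title
  2. Define a new column holding the text between "@" and ":"
  3. Some records do not match the pattern, which leaves NaN in the new column. I simply filtered those out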
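import pandas as pd

df = pd.read_csv("Newnormal.csv")
# 1. keep only rows whose title contains "@" (missing titles count as no match)
df = df[df["title"].str.contains("@", na=False)].copy()
# 2. capture the last "@name:" in the title; the "@" and ":" stay in the match
df["to"] = df["title"].str.extract(r".*(@[A-Za-z0-9_]+:)", expand=False)
df = df[["from", "to", "title"]]
# 3. drop rows where no "@name:" was found, then save
df = df[df["to"].notna()]
df.to_csv("ToNewNormal.csv", index=False)
df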

Output

     from             to                title
1    Xavier5501       @KudiiThaufeeq:   RT @KudiiThaufeeq: Royal Rape, Royal Harassmen...
2    Suzane24979006   @USAID_NISHTHA:   RT @USAID_NISHTHA: Don't step outside your hou...
3    sandeep_sprabhu  @USAID_NISHTHA:   RT @USAID_NISHTHA: Don't step outside your hou...
4    oliLince         @Timothy_Hughes:  RT @Timothy_Hughes: How to Get a Salesforce Th...
7    rismadwip        @danielepermana:  RT @danielepermana: Pak kasus covid per hari s...
...  ...              ...               ...
992  Reptoid_Hunter   @sapiofoxy:       RT @sapiofoxy: I literally can't believe we ha...
994  KPCResearch      @sapiofoxy:       RT @sapiofoxy: I literally can't believe we ha...
995  GreySparkUK      @VoxSmartGlobal:  RT @VoxSmartGlobal: The #newnormal will see mo...
997  Gabboa10         @HuShameem:       RT @HuShameem: One of @PGO_MV admin staff test...
999  wanjirunjendu    @ntvkenya:        RT @ntvkenya: AAK's Mugure Njendu shares insig...
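
The desired result in the question shows the bare account name (`KudiiThaufeeq`) in `to`, without the leading "@" or trailing ":". One possible variation, sketched on a small made-up frame rather than the real file, moves the capture group so that only the name is captured. The sample rows and the names `sample`, `hits` and `result` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for Newnormal.csv
sample = pd.DataFrame({
    "from": ["user_a", "user_b", "user_c"],
    "title": ["RT @KudiiThaufeeq: some text", "plain title", None],
})

# Keep rows that mention an account; missing titles count as no match
hits = sample[sample["title"].str.contains("@", na=False, regex=False)].copy()

# Capture only the name between "@" and ":" (the group excludes both symbols)
hits["to"] = hits["title"].str.extract(r"@([A-Za-z0-9_]+):", expand=False)

# Drop rows without an "@name:" pattern and order the columns
result = hits.dropna(subset=["to"])[["from", "to", "title"]]
print(result)
```

Calling `result.to_csv("ToNewNormal.csv", index=False)` would then save the file as in the answer. Note that this pattern takes the first "@name:" in each title, whereas a pattern with a leading greedy `.*` picks the last one.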

