Extracting hashtags from a .txt file in Python

Problem description

I started building a TikTok tool for data analysis, but I can't extract the hashtags from the saved .txt file. This is what I have done so far:

from tiktok_bot import TikTokBot  # TikTok API
import csv
import datetime  # needed by getData() below
import os
import sys
import re  # tried re.findall, but it didn't work

try:
    os.mkdir("./data")  # create the data folder
except OSError as e:
    print("Directory exists")


def getData(): # date in file name
    return datetime.datetime.now().strftime("%Y-%m-%d")

def buildFileName(type): # building .csv name
    return ("./data/") + getData() + (type) + ".csv"

def buildText(type): # building .txt name
    return ("./data/") + getData() + (type) + ".txt"

with open(buildFileName("_shares"), mode='a', newline='') as csv_file:   # writing .csv file
    fieldnames = ['User ID', 'URL', 'Description', 'Comments', 'Likes']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()

    for post in most_shared_posts:
        print(str(post.author_user_id) , str(post.share_url) , str(post.desc) , post.statistics.comment_count , post.statistics.digg_count)
        writer.writerow({'User ID': str(post.author_user_id), 'URL': str(post.share_url), 'Description': str(post.desc), 'Comments': post.statistics.comment_count, 'Likes': post.statistics.digg_count})

with open(buildFileName("_shares"), mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for lines in csv_reader:
        print(lines['Description'])  # print each description read from the .csv
sys.stdout = open(buildText("_shares"), "w")  # redirect output into the .txt file
print(lines['Description'])

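What can I do now to extract the hashtags from the descriptions printed to the .txt file? Note: each description is made up of text and hashtags, so I believe it is basically a string.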

Tags: python, regex, python-3.x

Solution


You can do:


import re
m = re.findall(r'#(\w+)', lines['Description'])
print(m)
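
Since the question asks about reading the hashtags back from the saved .txt file, the same pattern can also be applied line by line to a file. The following is only a minimal sketch under some assumptions: the helper names extract_hashtags and hashtags_from_file are hypothetical (they are not part of the original code or of any library), and the file is assumed to be UTF-8 text with descriptions separated by newlines.

```python
import re

# Same pattern as in the answer: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r'#(\w+)')


def extract_hashtags(text):
    """Return every hashtag found in text, without the leading '#'."""
    return HASHTAG_RE.findall(text)


def hashtags_from_file(path):
    """Collect hashtags from every line of a text file (assumed UTF-8)."""
    tags = []
    with open(path, encoding="utf-8") as txt_file:
        for line in txt_file:
            tags.extend(extract_hashtags(line))
    return tags


if __name__ == "__main__":
    # Illustrative description string, not real data.
    print(extract_hashtags("Dance challenge #fyp #dance"))
```

In the question's code this could be called as hashtags_from_file(buildText("_shares")) once the .txt file has been written and closed. Note that \w matches only letters, digits and underscores, so a tag such as #fyp-dance would be captured as fyp.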
