python - Twitter Data: Is there a way to split based on a condition?
Problem description
Code
#Date Bool
def isDate(string):
    elem = []
    splits = string.split()
    for element in splits:
        elem.append(element)
    if len(elem) > 5:
        return True if elem[2].isdigit() else False
    else:
        return False

#LOAD HANDLER
def loader(file):
    lines = []
    with open(file,encoding='utf8') as f:
        for line in f:
            lines.append(line)
    return lines

class define:
    def __init__(self, date, token, tweet):
        self.date = date
        self.token = token
        self.tweet = tweet
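Data sample
Disclaimer: These tweets are public information. This is purely educational research and does not reflect on the institution or anyone within it.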
Tue Feb 04 12:36:05 EST 2020|@WishYouWereMe__|RT @coriyonmarie: I’ll never forget how somebody did me.
Tue Feb 04 12:36:05 EST 2020|@c1Leonn|RT @nxlimaa: WHY am i incapable of doing natural makeup?????? why does everything always escalate ?????????
Tue Feb 04 12:36:05 EST 2020|@Oootentog|@staydilated13 Thank youuuu! ♥️
Tue Feb 04 12:36:05 EST 2020|@SushreeRonali|@GautamGambhir Jai Hind
Tue Feb 04 12:36:05 EST 2020|@Tank9trACE|4 months old at that
Tue Feb 04 12:36:05 EST 2020|@mathewpoptartm|RT @Flashyasf: Aye be careful who you catch feelings for, Shit don't be real onna other side
Tue Feb 04 12:36:05 EST 2020|@wakemeup0320|RT @NookNickn_r: Good night na~ ❤️ [LINK]
Tue Feb 04 12:36:05 EST 2020|@AkanniTheKing|@KiKardashiann We Got You
Tue Feb 04 12:36:05 EST 2020|@nuggythebear|@MarcusRashford Sheryar is a strong Mancunian name. Heralds back to the Sheryars of the 1700's.
Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT
The Road Marshall speaks ‼️⚠️‼️ [LINK]
Tue Feb 04 12:36:05 EST 2020|@blushkths|how much do i need to pay for jungkook to step on my neck
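Theory
My idea is to split based on whether the first element of a line is a date, and the function isDate() performs that check. What I am not sure about is how to append a line to the previous one so that the pieces get joined. I don't know how clear that is, so here is an illustration: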
Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT
The Road Marshall speaks ‼️⚠️‼️ [LINK]
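In this snippet the tweet spans more than one line, and I need a way to join those two lines so that I can work with the tweet as a single item. Joined, it would look like this: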
['Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT The Road Marshall speaks ‼️⚠️‼️ [LINK]']
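That is, without the \n or anything similar. I am not sure how to proceed. Eventually I will put this into a dictionary, but I need to figure out the basics first.
Solution
I suggest first rewriting your function like this: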
def isDate(string):
    splits = string.split(maxsplit=3)
    return len(splits) > 3 and splits[2].isdigit()
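Note that this check only looks at whether the third whitespace-separated word is numeric, so a continuation line such as "I have 2 dogs now" would also be treated as the start of a new tweet. If that matters for your data, a stricter test could match the whole timestamp. The following is only a sketch: the names RECORD_START and starts_record are made up for illustration, and the pattern assumes every record starts exactly like the sample lines (zero-padded day, an uppercase time zone abbreviation, then a `|`).

```python
import re

# Assumed layout, taken from the sample data:
#   "Tue Feb 04 12:36:05 EST 2020|@user|text"
# RECORD_START and starts_record are hypothetical names, not part of the original code.
RECORD_START = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} [A-Z]{2,5} \d{4}\|"
)

def starts_record(line):
    """Return True if the line begins with a full timestamp followed by '|'."""
    return RECORD_START.match(line) is not None
```

It can be dropped in wherever isDate is called, provided the assumption about the timestamp layout holds for the whole file.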
Then use isDate in the loader like this:
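def loader(file):
    lines = []
    with open(file, encoding='utf8') as f:
        for line_with_newline in f:
            line = line_with_newline.rstrip()
            # Start a new entry on a date line (or when there is nothing to append to yet).
            if isDate(line) or not lines:
                lines.append(line)
            elif line:
                # Continuation of a multi-line tweet: join it with a single space.
                lines[-1] += ' ' + line
    return lines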
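Once each tweet sits on a single line, getting to the dictionary mentioned in the question is mostly a matter of splitting on `|`. A minimal sketch, assuming every joined line has the form date|handle|text as in the sample data; parse_record is a hypothetical helper name, and the keys simply mirror the date, token and tweet attributes of the class in the question:

```python
def parse_record(line):
    """Split one joined line into its date, handle and tweet text."""
    # maxsplit=2 keeps any '|' characters inside the tweet text intact.
    # A line with fewer than two '|' separators raises ValueError here.
    date, token, tweet = line.split("|", 2)
    return {"date": date, "token": token, "tweet": tweet}

# Example using one line from the sample data in the question.
sample = "Tue Feb 04 12:36:05 EST 2020|@AkanniTheKing|@KiKardashiann We Got You"
record = parse_record(sample)
print(record["token"])
```

With the loader above this could be applied as `[parse_record(line) for line in loader(path)]`, where `path` stands for your own file name.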