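r - Extracting all emails with gmailr
Problem description
I am trying to extract all the emails from my Gmail account for some analysis. The end goal is a data frame of emails. I am using the gmailr package.
So far I have extracted all the email threads and "expanded" them by mapping all the thread ids to gm_thread(). Here is the code: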
library(gmailr)
library(purrr)

# Fetch 5 threads, extract their ids, then expand each id into a full thread
threads <- gm_threads(num_results = 5)
thread_ids <- gm_id(threads)
threads_expanded <- map(thread_ids, gm_thread)
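This returns a list of all the threads, structured as a list of gmail_thread objects. Drilling down one level into that list with str(threads_expanded[[1]], max.level = 1) gives a single thread object that looks like this: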
List of 3
$ id : chr "xxxx"
$ historyId: chr "yyyy"
$ messages :List of 3
- attr(*, "class")= chr "gmail_thread"
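Drilling down further into the messages that make up a thread is where the useful information starts to appear. str(threads_expanded[[1]]$messages, max.level = 1) gives the list of gmail_message objects for that thread: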
List of 3
$ :List of 8
..- attr(*, "class")= chr "gmail_message"
$ :List of 8
..- attr(*, "class")= chr "gmail_message"
$ :List of 8
..- attr(*, "class")= chr "gmail_message"
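Where I am stuck is actually extracting all the useful information from every email in all the threads. The end goal is a data frame with columns such as message_id, thread_id, to, from, and so on. I am imagining something like this: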
message_id | thread_id | to | from | ... |
-------------------------------------------------------------------------
1234 | abcd | me@gmail.com | pam@gmail.com | ... |
1235 | abcd | pam@gmail.com | me@gmail.com | ... |
1236 | abcf | me@gmail.com | tim@gmail.com | ... |
Solution
This is not the prettiest answer, but it works. I will work on vectorizing it later:
library(gmailr)
library(purrr)
library(tibble)
library(lubridate)

# Fetch 5 threads, extract their ids, then expand each id into a full thread
threads <- gm_threads(num_results = 5)
thread_ids <- gm_id(threads)
threads_expanded <- map(thread_ids, gm_thread)

# Collect the individual messages from every thread into one flat list
msgs <- list()
for (i in seq_along(threads_expanded)) {
  msgs <- append(msgs, values = threads_expanded[[i]]$messages)
}

# Get the message id for each message
msg_ids <- unlist(map(msgs, gm_id))

# Get the message body for each message, one string per message
msg_body <- character()
for (msg in msgs) {
  body <- gm_body(msg)
  attchmnt <- nrow(gm_attachments(msg))
  if (length(body) != 0 && attchmnt == 0) {
    # gm_body() does not return NULL for a missing body but an empty list
    # (length 0), so check the length. Keep the body only when there is
    # something there and the message has no attachments. Collapsing to one
    # string keeps a single entry per message if several parts come back.
    msg_body <- append(msg_body, paste(unlist(body), collapse = "\n"))
  } else {
    # Empty body, or a message with attachments: store "" as a placeholder
    msg_body <- append(msg_body, "")
  }
}

# Get the datetime for each message
msg_datetime <- msgs %>%
  map(gm_date) %>%
  unlist() %>%
  dmy_hms()

message_df <- tibble(msg_ids, msg_datetime, msg_body)
# All the other possible fields (to, from, cc, subject, etc.) can be
# extracted with a similar for loop or a map() call
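As a rough idea of what a loop-free version could look like, here is a minimal sketch in base R that flattens the threads into one list of messages and builds the data frame column by column. It runs on mock data rather than a live Gmail account: make_msg() and get_header() are hypothetical helpers written for this example, and the field names (id, threadId, payload$headers) are assumed to follow the Gmail API message resource. With real gmailr objects, accessors such as gm_id(), gm_to() and gm_from() would presumably take the place of the raw field access.

```r
# Hypothetical constructor for a mock message; values are made up.
make_msg <- function(id, thread_id, to, from) {
  structure(
    list(
      id = id,
      threadId = thread_id,
      payload = list(headers = list(
        list(name = "To", value = to),
        list(name = "From", value = from)
      ))
    ),
    class = "gmail_message"
  )
}

# Mock stand-in for the list returned by map(thread_ids, gm_thread)
threads_expanded <- list(
  structure(list(id = "abcd", messages = list(
    make_msg("1234", "abcd", "me@gmail.com", "pam@gmail.com"),
    make_msg("1235", "abcd", "pam@gmail.com", "me@gmail.com")
  )), class = "gmail_thread"),
  structure(list(id = "abcf", messages = list(
    make_msg("1236", "abcf", "me@gmail.com", "tim@gmail.com")
  )), class = "gmail_thread")
)

# Hypothetical helper: value of one header, or NA if the header is absent
get_header <- function(msg, name) {
  for (h in msg$payload$headers) {
    if (tolower(h$name) == tolower(name)) return(h$value)
  }
  NA_character_
}

# Flatten threads -> messages by removing one level of nesting
msgs <- unlist(
  lapply(threads_expanded, function(th) th$messages),
  recursive = FALSE
)

# One vapply() per column; every call returns exactly one string per
# message, so the columns always have the same length
message_df <- data.frame(
  message_id = vapply(msgs, function(m) m$id, character(1)),
  thread_id  = vapply(msgs, function(m) m$threadId, character(1)),
  to         = vapply(msgs, get_header, character(1), name = "To"),
  from       = vapply(msgs, get_header, character(1), name = "From"),
  stringsAsFactors = FALSE
)

print(message_df)
```

Returning NA for a missing header, instead of dropping the entry, is what keeps every column aligned with msg_ids; unlist() on its own silently drops NULL results and can leave columns with different lengths.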