How to find tweets containing popular emoji

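Problem description

I am working on a sentiment analysis project in R. I am trying to collect tweets that use some of the most popular emoji. How can I collect tweets by emoji?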

#devtools::install_github("dill/emoGG")
library(emoGG)   # source of the "emoji_search" function
library(twitteR) # source of the "searchTwitter" and "twListToDF" functions

emoji_search("BALLOON")

emoji <- searchTwitter("BALLOON")
emoji
emojidf <- twListToDF(emoji)

Tags: r

Solution


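After some googling and experimenting, I learned that emoji are encoded in tweets in a confusing way (at least to me).

One shortcut is to look emoji up in an emoji dictionary such as Kate Lyons's. She has also written up more background on how she compiled it.

This gives us a more direct way to search for tweets with emoji. For example, the dictionary shows that we can find the "balloon" emoji by searching for the following string: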

<ed><a0><bc><ed><be><88>
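
To make the encoding less mysterious: these six bytes appear to be the balloon's UTF-16 surrogate pair (U+D83C, U+DF88), with each surrogate written out as a three-byte UTF-8-style sequence. Below is a minimal base-R sketch that turns such a string back into a Unicode code point. The helper name `decode_surrogate_bytes` is hypothetical, not part of emoGG, twitteR, or rtweet, and it assumes the input holds exactly one surrogate pair.

```r
# Hypothetical helper: decode a "<ed><a0><bc><ed><be><88>"-style string
# (one UTF-16 surrogate pair, each half stored as 3 bytes) into a code point.
decode_surrogate_bytes <- function(x) {
  # Extract the two-digit hex values between the angle brackets.
  hex <- regmatches(x, gregexpr("(?<=<)[0-9a-fA-F]{2}(?=>)", x, perl = TRUE))[[1]]
  bytes <- strtoi(hex, base = 16L)
  stopifnot(length(bytes) == 6L)

  # Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx.
  three_byte <- function(b) {
    bitwAnd(b[1], 0x0FL) * 4096L +
      bitwAnd(b[2], 0x3FL) * 64L +
      bitwAnd(b[3], 0x3FL)
  }

  hi <- three_byte(bytes[1:3])  # high surrogate, expected in D800..DBFF
  lo <- three_byte(bytes[4:6])  # low surrogate, expected in DC00..DFFF
  stopifnot(hi >= 0xD800L, hi <= 0xDBFFL, lo >= 0xDC00L, lo <= 0xDFFFL)

  # Combine the surrogate pair into a single supplementary code point.
  65536L + (hi - 0xD800L) * 1024L + (lo - 0xDC00L)
}

cp <- decode_surrogate_bytes("<ed><a0><bc><ed><be><88>")
sprintf("U+%X", cp)       # "U+1F388"
balloon <- intToUtf8(cp)  # the balloon emoji as a one-character string
```

U+1F388 is the Unicode code point named BALLOON, so the dictionary entry and the actual character can be converted into each other this way.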

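I am more familiar with rtweet, so here is what searching for the balloon emoji looks like:

[Edit: I am not sure this is working correctly. These all look like non-English tweets, and they may not contain the balloon emoji...] :-(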

> rtweet::search_tweets("<ed><a0><bc><ed><be><88>")
# A tibble: 16 x 90
   user_id status_id created_at          screen_name text  source display_text_wi… reply_to_status… reply_to_user_id
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr>             <dbl> <chr>            <chr>           
 1 111373… 11429734… 2019-06-24 01:51:30 SPR1NGD4Y_  " 엠… Twitt…              154 NA               NA              
 2 100224… 11428523… 2019-06-23 17:50:11 quark_kim   "탐라에… Twitt…              140 NA               NA              
 3 109648… 11428194… 2019-06-23 15:39:14 _4CC1D3N7_… "부장 … Twitt…              127 114281934863914… 109648624150199…
 4 113448… 11428090… 2019-06-23 14:58:01 MAX_commu   "자캐앤… Twitt…              140 NA               NA              
 5 819116… 11428062… 2019-06-23 14:46:46 jinimwoo    "자캐앤… Twitt…              140 NA               NA              
 6 103612… 11428013… 2019-06-23 14:27:27 00gY0       "자캐앤… Twitt…              140 NA               NA              
 7 107972… 11428003… 2019-06-23 14:23:32 YN_DGY      "탐라에… Twitt…              140 NA               NA              
 8 111199… 11427952… 2019-06-23 14:03:19 coffee_101… "탐라에… Twitt…              140 NA               NA              
 9 967054… 11427941… 2019-06-23 13:58:57 mphp0001    "탐라에… Twitt…              140 NA               NA              
10 928447… 11426751… 2019-06-23 06:06:06 yangE___    "탐라에… Twitt…              140 NA               NA              
11 836222… 11426745… 2019-06-23 06:03:32 sunseul_ma… "탐라에… Twitt…              140 NA               NA              
12 110802… 11426637… 2019-06-23 05:20:51 4th_month__ "탐라에… Twitt…              140 NA               NA              
13 113990… 11413476… 2019-06-19 14:10:47 Dream_Merr… "공지 … Twitt…               62 NA               NA              
14 777381… 11409418… 2019-06-18 11:18:24 mi_se2      "@Me… Twitt…              140 NA               NA              
15 330242… 11408761… 2019-06-18 06:57:35 lip_ran     "@Me… Twitt…              140 NA               NA              
16 113519… 11408687… 2019-06-18 06:27:56 barruwach   "@Me… Twitt…              140 NA               NA              
# … with 81 more variables: reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>, favorite_count <int>,
#   retweet_count <int>, quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>, media_t.co <list>, media_expanded_url <list>,
#   media_type <list>, ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>, ext_media_type <chr>,
#   mentions_user_id <list>, mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>, quoted_retweet_count <int>,
#   quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>, quoted_location <chr>, quoted_description <chr>,
#   quoted_verified <lgl>, retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>, retweet_source <chr>,
#   retweet_favorite_count <int>, retweet_retweet_count <int>, retweet_user_id <chr>, retweet_screen_name <chr>,
#   retweet_name <chr>, retweet_followers_count <int>, retweet_friends_count <int>, retweet_statuses_count <int>,
#   retweet_location <chr>, retweet_description <chr>, retweet_verified <lgl>, place_url <chr>, place_name <chr>,
#   place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>, description <chr>, url <chr>, protected <lgl>,
#   followers_count <int>, friends_count <int>, listed_count <int>, statuses_count <int>, favourites_count <int>,
#   account_created_at <dttm>, verified <lgl>, profile_url <chr>, profile_expanded_url <chr>, account_lang <chr>,
#   profile_banner_url <chr>, profile_background_url <chr>, profile_image_url <chr>
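
One way to check the doubt raised in the edit above is to test the returned `text` column for the actual balloon character. The sketch below is only an illustration: `keep_tweets_with_emoji` is a hypothetical helper, and searching with the literal emoji character instead of the byte string is an untested assumption, not something the output above confirms. The API call is left commented out because it needs Twitter credentials.

```r
# Hypothetical helper: keep only the rows whose text contains the given emoji.
keep_tweets_with_emoji <- function(tweets, emoji) {
  stopifnot(is.data.frame(tweets), "text" %in% names(tweets))
  # fixed = TRUE matches the emoji literally instead of as a regex.
  tweets[grepl(emoji, tweets$text, fixed = TRUE), , drop = FALSE]
}

balloon <- intToUtf8(0x1F388L)  # U+1F388 BALLOON

# Requires Twitter API credentials, so it is not run here.
# Assumption: the query is the emoji character itself.
# tweets <- rtweet::search_tweets(balloon, n = 100)
# keep_tweets_with_emoji(tweets, balloon)

# Offline check with made-up rows standing in for search results.
demo <- data.frame(
  text = c(paste("party", balloon), "no emoji here"),
  stringsAsFactors = FALSE
)
nrow(keep_tweets_with_emoji(demo, balloon))  # 1
```

If the filtered result is empty for a real search, that would suggest the query matched something other than the balloon emoji.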
