Python dictionary: delete key

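Problem description

I'm just starting out with Python and I'm stuck. I need to get the "text" key from a long list of records in a .txt file, for example: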

{"delete":{"status":{"id":294512601600258048,"id_str":"294512601600258048","user_id":90681582,"user_id_str":"90681582"}, "timestamp_ms":"1410368494083"}}

{
    "created_at": "Wed Sep 10 17:01:33 +0000 2014",
    "id": 509748524897292288,
    "id_str": "509748524897292288",
    "text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE",
    "source": "\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": 509748106015948800,
    "in_reply_to_status_id_str": "509748106015948800",
    "in_reply_to_user_id": 242563886,
    "in_reply_to_user_id_str": "242563886",
    "in_reply_to_screen_name": "Brenamae_",
    "user": {
    "id": 175160659,
    "id_str": "175160659",
    "name": "Butterfly",
    "screen_name": "VanessaLilyWan",
    "location": "Canada, Montreal",
    "url": "http:\/\/instagram.com\/vanessalilywan",
    "description": "British youtubers. 'Nuff said.",
    "protected": false,
    "verified": false,
    "followers_count": 118,
    "friends_count": 180,
    "listed_count": 2,
    "favourites_count": 319,
    "statuses_count": 10221,
    "created_at": "Thu Aug 05 20:03:16 +0000 2010",
    "utc_offset": -36000,
    "time_zone": "Hawaii",
    "geo_enabled": false,
    "lang": "en",
    "contributors_enabled": false,
    "is_translator": false,
    "profile_background_color": "B2DFDA",
    "profile_background_image_url": "http:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
    "profile_background_image_url_https": "https:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
    "profile_background_tile": false,
    "profile_link_color": "93A644",
    "profile_sidebar_border_color": "EEEEEE",
    "profile_sidebar_fill_color": "FFFFFF",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
    "profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
    "profile_banner_url": "https:\/\/pbs.twimg.com\/profile_banners\/175160659\/1404361640",
    "default_profile": false,
    "default_profile_image": false,
    "following": null,
    "follow_request_sent": null,
    "notifications": null
}, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweet_count": 0, "favorite_count": 0, "entities": {
    "hashtags": [],
    "trends": [],
    "urls": [],
    "user_mentions": [{
        "screen_name": "Brenamae_",
        "name": "I-G-G-Bye",
        "id": 242563886,
        "id_str": "242563886",
        "indices": [0, 10]
    }],
    "symbols": []
}, "favorited": false, "retweeted": false, "possibly_sensitive": false, "filter_level": "medium", "lang": "en", "timestamp_ms": "1410368493668"
}

So I have two kinds of records, and this is what I have managed so far:

import json

with open('salida_tweets.txt') as f:
    for line in f:
        texto = json.loads(line)
        objetos = texto.get('text')
        print(objetos)

None

@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE

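But in the printout the first record still shows up as None, and I need clean text so I can combine it with another file.

Can anyone help me?

Edit: Sorry, I didn't make this clear. I need to pull out the "text" entry contained in the second record, and then combine it with a file that contains several words and numbers. For example:

For this

"text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"

I have to combine it with

SLAP -3 LAST -1

to get: 1.Tweet -4

That way I can get a score for each "text".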

Tags: python, json, key

Solution


The .get method returns None when the key it looks up is not in the dictionary, which is the case for the "delete" records in your file. So rather than always printing objetos, check the return value of texto.get('text') first.

For example:

import json

with open('salida_tweets.txt') as f:
    for line in f:
        texto = json.loads(line)
        objetos = texto.get('text')

        # Skip records with no 'text' value (.get returned None)
        if objetos:
            print(objetos)

That way, your code won't print if the text key does not exist.
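
For the scoring step described in the edit to the question, something along these lines might work. It is only a minimal sketch under stated assumptions: the helper names extract_texts and score_text are made up for illustration, the word scores are kept in a plain dictionary because the format of the real score file is not shown, and words are matched case-insensitively after stripping basic punctuation, which may or may not be the matching rule you want.

```python
import json


def extract_texts(lines):
    """Yield the 'text' value of every JSON line that has one."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        text = record.get('text')
        if text is not None:  # delete notices have no 'text' key
            yield text


def score_text(text, scores):
    """Sum the scores of the words in text that appear in scores."""
    total = 0
    for word in text.split():
        # Assumed matching rule: ignore case and surrounding punctuation
        total += scores.get(word.strip('.,:;!?').lower(), 0)
    return total


# Hypothetical word scores, taken from the example in the question
scores = {'slap': -3, 'last': -1}

# Shortened stand-ins for two lines of salida_tweets.txt
sample_lines = [
    '{"delete": {"status": {"id": 1}, "timestamp_ms": "0"}}',
    '',
    '{"text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"}',
]

for number, text in enumerate(extract_texts(sample_lines), start=1):
    print(f"{number}.Tweet {score_text(text, scores)}")  # prints "1.Tweet -4"
```

To run this against the real file, the open file object f from the loop above can be passed to extract_texts in place of sample_lines, since both are iterated line by line.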

