Extracting one URL from a row containing multiple URLs

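Problem description

I am trying to extract a URL from rows that list multiple URLs.

Specifically, I want to select the first instance of twitter.com/dog_rates/xxxxxxx in each row and drop the remaining data.

Example of the text to extract from

Input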

1. twitter.com/dog_rates/status/892420643555336193/photo/1 (desired version)

2. www.gofundme.com/3yd6y1c,twitter.com/dog_rates/status/878281511006478336/photo/1

3. m.facebook.com/story.php?story_fbid=1888712391349242&id=1506300642923754&refsrc=ht.co%2FURVffYPPjY&_rdr,twitter.com/dog_rates/status/812503143955202048/photo/1,twitter.com/dog_rates/status/812503143955202048/photo/1

4. www.gofundme.com/sams-smile,twitter.com/dog_rates/status/810984652412424192/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1

5. twitter.com/dog_rates/status/888804989199671297/photo/1,twitter.com/dog_rates/status/888804989199671297/photo/1

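I tried to extract the URL with slicing, but ran into problems because the URLs differ in length and the separators fall at different positions.

Expected output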

  1. twitter.com/dog_rates/status/892420643555336193/photo/1

  2. twitter.com/dog_rates/status/878281511006478336/photo/1

  3. twitter.com/dog_rates/status/812503143955202048/photo/1

  4. twitter.com/dog_rates/status/810984652412424192/photo/1

  5. twitter.com/dog_rates/status/888804989199671297/photo/1

Tags: python, string, pandas, extract

Solution

Try this:

import pandas as pd

data = [
    'twitter.com/dog_rates/status/892420643555336193/photo/1',         
    'www.gofundme.com/3yd6y1c,twitter.com/dog_rates/status/878281511006478336/photo/1',
    'm.facebook.com/story.php?story_fbid=1888712391349242&id=1506300642923754&refsrc=ht.co%2FURVffYPPjY&_rdr,twitter.com/dog_rates/status/812503143955202048/photo/1,twitter.com/dog_rates/status/812503143955202048/photo/1',
    'www.gofundme.com/sams-smile,twitter.com/dog_rates/status/810984652412424192/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1,twitter.com/dog_rates/status/709901256215666688/photo/1',
    'twitter.com/dog_rates/status/888804989199671297/photo/1,twitter.com/dog_rates/status/888804989199671297/photo/1'
]

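df = pd.DataFrame({'url': data})
# Capture the first twitter.com/dog_rates URL in each row (everything up to the next comma).
df['res'] = df['url'].str.extract(r'(twitter\.com/dog_rates/[^,]+)', expand=False)

This extracts the first twitter.com/dog_rates URL in each row. Splitting on commas and taking the last value, df['url'].str.split(',').str[-1], gives the same result for rows 1, 2, 3 and 5, but for row 4 it returns status 709901256215666688 instead of the expected 810984652412424192, because the last URL in that row differs from the first matching one.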
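Since the goal is the first twitter.com/dog_rates URL in each row, the selection can also be sketched with the standard library's re module, independent of pandas. The helper name first_dog_rates_url and the pattern are assumptions made for illustration and are not part of the original answer. The pattern assumes that URLs within a row are separated by commas and that a dog_rates URL never contains one.

```python
import re

# Assumed pattern: a dog_rates URL runs until the next comma or the end of the string.
DOG_RATES_URL = re.compile(r"twitter\.com/dog_rates/[^,]+")


def first_dog_rates_url(row):
    """Return the first twitter.com/dog_rates URL in row, or None if there is none."""
    match = DOG_RATES_URL.search(row)
    return match.group(0) if match else None


rows = [
    "twitter.com/dog_rates/status/892420643555336193/photo/1",
    "www.gofundme.com/sams-smile,"
    "twitter.com/dog_rates/status/810984652412424192/photo/1,"
    "twitter.com/dog_rates/status/709901256215666688/photo/1",
]

# One result per row; rows without a matching URL would yield None.
results = [first_dog_rates_url(row) for row in rows]
```

On a DataFrame column, the same helper could be applied with df['url'].map(first_dog_rates_url). Returning None for rows without a match keeps missing values explicit instead of silently passing through an unrelated URL.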

