Removing special characters in Python 3.7

Problem description

I have been testing ripping URLs with Python, and the results I get come back as str.

Contents of itdUrlforrip.txt: http://itdmusic.in/category/new-releases/page/4

Full code

#!/usr/bin/python
import requests
import re
import regex
from pyquery import PyQuery

#get each
link1 = open('/Users/R/Downloads/itdUrlforrip.txt','r').read()
list1 = link1.split('\n')
list2 = []
for eachlink1 in list1:
    linkSub1 = requests.get(eachlink1).text
    splitContent = linkSub1.split("Facebook")
    splitContent1 = splitContent[0]
    list2.append(splitContent1)

list2GLStr = ("\n".join(list2))
urlAll = regex.findall('itdmusic\.in\/\d\d\/.+\.html', list2GLStr)
allUrlrmDup1 = list(dict.fromkeys(urlAll))

#get list of url from input
allUrlrmDup1Ah = regex.sub('itdmusic', 'http://itdmusic', str(allUrlrmDup1))
allUrlrmDup1Ah2 = regex.sub('\'', '', str(allUrlrmDup1Ah))
allUrlrmDup1Ah3 = regex.sub('\[', '', str(allUrlrmDup1Ah2))
allUrlrmDup1Ah4 = regex.sub('\]', '', str(allUrlrmDup1Ah3))
allUrlrmDup1AhGL = ("\n".join(list(allUrlrmDup1Ah4.split(', '))))
allUrlrmDup1AhList = allUrlrmDup1AhGL.split('\n')

list3 = []
list4 = []
for eachlink2 in allUrlrmDup1AhList:
    linkSub2 = requests.get(eachlink2).text
    urlGdr = regex.findall('drive\.google\.com\/.{41}', linkSub2)
    urlOth = regex.findall('https\:\/\/www\d\d\d\.zippyshare\.com\/v.{19}|https\:\/\/www\d\d\.zippyshare\.com\/v.{19}|https\:\/\/www\d\.zippyshare\.com\/v.{19}|https?:\/\/douploads\.com\/.{12}|https?:\/\/www\.mirrored\.to\/.{14}|https?:\/\/mir\.cr\/.{8}|https?:\/\/hexupload\.net\/.{12}|https?:\/\/intoupload\.net\/.{12}|https?:\/\/www\.dropbox\.com\/s\/.{15}|https?:\/\/dbree\.org\/v\/.{6}|https?:\/\/dropapk\.to\/.{12}|https?:\/\/www\.sendspace\.com\/file\/.{6}|https?:\/\/gestyy\.com\/.{6}|https?:\/\/ouo\.io\/\w{6}|https?:\/\/mega\.nz.{55}|https?:\/\/bit\.ly.{8}', linkSub2)
    urlska = regex.findall('https?\:\/\/itdmusic\.in\/skipads\/.+\/\'', linkSub2)
    urlskaStr = str(urlska)
    urlska2 = regex.sub('\/\'', '', urlskaStr)
    list3.append(urlGdr)
    list3.append(urlOth)
    list4.append(urlska2)

Then I run

print(list4)

The result is

'[]', '[]', '[]', '[]', '[]', '[]', '[]', '[]', '["http://itdmusic.in/skipads/2020/03/12/luke-bryan-one-margarita-pre-single"]', '["http://itdmusic.in/skipads/2020/03/12/kota-banks-italiana-single"]'


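So is there a way to get rid of the '[]' entries and get just the URLs here? I have tried many things but still cannot figure it out using regex and re. The `for xxx in xxx` construct confuses me a bit.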

Tags: python, regex

Solution


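The thing is that regex.findall() returns a list. Your code turns that list into a string with str() and appends the string to another list, so every page with no match contributes the text '[]'.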
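
A minimal sketch of where the '[]' comes from. It uses the standard-library re module in place of regex (both return a list from findall), and the page text is a made-up stand-in for a fetched page that contains no skipads link.

```python
import re

# Hypothetical page text (not from the real site) without a skipads link.
page_without_link = "<a href='http://itdmusic.in/2020/03/12/some-post.html'>post</a>"

# findall returns an empty list when nothing matches.
matches = re.findall(r"https?://itdmusic\.in/skipads/.+/'", page_without_link)

# str() on an empty list gives the two-character text "[]".
as_text = str(matches)

collected = []
collected.append(as_text)  # the text "[]" is stored as one element
print(collected)
```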

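You should use list4.extend(...) instead of list4.append(urlska2). Pass extend the list that findall returned (urlska), not the stringified urlska2: extending with a string adds its characters one at a time, while extending with an empty list adds nothing.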

That will give you what you want.
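
One way this could look, as a sketch rather than a drop-in replacement. It assumes the standard-library re module instead of regex, uses two made-up strings in place of the pages fetched with requests.get(...).text, and strips the trailing /' by slicing each match instead of calling sub on a stringified list.

```python
import re

# Hypothetical stand-ins for the fetched page texts.
pages = [
    "<p>no link here</p>",
    "<a href='http://itdmusic.in/skipads/2020/03/12/kota-banks-italiana-single/'>skip</a>",
]

# Same idea as the pattern in the question: match up to the closing /'
skipads_pattern = re.compile(r"https?://itdmusic\.in/skipads/.+/'")

list4 = []
for page in pages:
    urlska = skipads_pattern.findall(page)   # always a list, possibly empty
    cleaned = [url[:-2] for url in urlska]   # drop the trailing /' from each match
    list4.extend(cleaned)                    # an empty list adds nothing

print(list4)
```

Because extend adds the elements of the list rather than the list itself, pages without a match leave list4 unchanged and no '[]' entries appear.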

