python - How to avoid concatenation errors in Python 3
Problem description
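I'm running into a concatenation problem.
I'm trying to extract (business name) + (phone number) + (address) + (website URL). The first three elements work fine, but I'm having trouble with the website URL.
When I write the content to a text file, all the website URLs appear right at the top and do not match the correct business. When I print to the command prompt, everything matches the correct business.
It's hard to explain, so I've attached two screenshots (in the links below). In the Excel document, the red underline shows a URL that is not in the right place; it should be further down.
This is how I do the concatenation: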
try:
    print("list if contains websites")
    for i in range(0, min(len(freeNames),len(fullPhones),len(fullStreets),len(fullWebsites))):
        c = ' ~ ' + freeNames[i] + ' ~ ' + fullPhones[i] + ' ~ ' + fullStreets[i] + ' ~ ' + fullWebsites[i] + ' ~ '
        contents.append(c)
        print(c)
        trustedprotxtfile.write(c + '\n')
except Exception as e:
    print(e)
    pass
try:
    print("list if no websites")
    for i in range(min(len(freeNames),len(fullPhones),len(fullStreets),len(fullWebsites)), max(len(freeNames),len(fullPhones),len(fullStreets))):
        c = ' ~ ' + freeNames[i] + ' ~ ' + fullPhones[i] + ' ~ ' + fullStreets[i] + ' ~ '
        contents.append(c)
        print(c)
        trustedprotxtfile.write(c + '\n')
except Exception as e:
    print(e)
    pass
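A minimal sketch of what seems to be going wrong with the index-based concatenation above, using made-up data (the names, phone numbers, addresses, and URLs below are hypothetical, not taken from the site). A business without a website adds nothing to fullWebsites, so that list is shorter than the others and index i no longer refers to the same business in every list:

```python
# Hypothetical sample data: business "B" has no website, so the website
# list is one entry shorter than the other three lists.
freeNames = ["A", "B", "C"]
fullPhones = ["111", "222", "333"]
fullStreets = ["1 Main St", "2 Main St", "3 Main St"]
fullWebsites = ["http://a.example", "http://c.example"]


def concat_by_index(names, phones, streets, websites):
    """Mimic the question's approach: join the fields that share an index."""
    n = min(len(names), len(phones), len(streets), len(websites))
    rows = []
    # First pass: rows that still have a website at the same index.
    for i in range(n):
        rows.append(" ~ ".join([names[i], phones[i], streets[i], websites[i]]))
    # Second pass: remaining rows, written without a website.
    for i in range(n, max(len(names), len(phones), len(streets))):
        rows.append(" ~ ".join([names[i], phones[i], streets[i]]))
    return rows


rows = concat_by_index(freeNames, fullPhones, fullStreets, fullWebsites)
for row in rows:
    print(row)
```

In this sketch the second row pairs business "B" with the URL that belongs to "C", and "C" ends up with no URL at all, which matches the symptom of URLs bunching up at the top of the file.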
Do you know how to solve this?
Thank you very much for your help.
Solution
[In reply to Sam Mason]
Here is the full code I am using.
These are the libraries it imports: re, selenium, lxml, urllib3, numpy, BeautifulSoup.
# Imports the script relies on. The original post only listed library names,
# so the aliases (bs, html) are inferred from how the names are used below.
import time

import requests
from bs4 import BeautifulSoup as bs
from lxml import html
from selenium import webdriver

# Selenium 3 style API: the find_element(s)_by_* helpers used below
# were removed in Selenium 4.
browser = webdriver.Chrome("/Users/gdeange1/dev/chromedriver")
trustedprotxtfile = open("/Users/gdeange1/Dev/trustedpros/test.txt", "w+", encoding='utf-8')
links = ['ns/halifax',]
for l in links:
    link = "https://trustedpros.ca/" + l
    driver = browser.get("https://trustedpros.ca/" + l)
    page0 = requests.get(link)
    soup0 = bs(page0.content, "lxml")
    # Read the pagination block to find out how many result pages exist.
    nextpages = soup0.findAll('div', attrs={'class': 'paging-sec'})
    pagination = []
    if nextpages:
        for ul in nextpages:
            for li in ul.find_all('li'):
                liText = li.text
                if liText != '-':
                    pagination.append(int(liText))
    maxpagination = max(pagination)
    # Four independent lists: nothing ties entry i of one list to entry i of another.
    freeNames = []
    fullPhones = []
    fullStreets = []
    fullWebsites = []
    i = 0
    while i < maxpagination:
        time.sleep(1)
        i += 1
        # Collect the links to every business listed on the current result page.
        try:
            inputElement = browser.find_elements_by_xpath('//*[@id="final-search"]/div/div[1]/div[2]/a')
            allLinksTim = [];
            for url in inputElement:
                allLinksTim.append(url.get_attribute("href"))
        except:
            pass
        # Visit each business page and scrape its fields.
        for eachLink in allLinksTim:
            driver = browser.get(eachLink)
            page = requests.get(eachLink)
            tree = html.fromstring(page.content)
            soup = bs(page.content, "lxml")
            try:
                namess = browser.find_elements_by_class_name('name-alt')
                if len(namess) > 0:
                    for name in namess:
                        freeNames.append(name.text)
                        print(name.text)
                else:
                    names = browser.find_elements_by_class_name('name-altimg')
                    for names1 in names:
                        freeNames.append(names1.text)
                        print(names1.text)
            except:
                print("Error while trying to get the names")
                pass
            try:
                phones = browser.find_elements_by_class_name('taptel')
                if phones:
                    for phone in phones:
                        fullPhones.append(phone.text)
                        print(phone.text)
                else:
                    print("No phones found")
            except:
                print('Error while trying to get the phones')
                pass
            try:
                streets = browser.find_elements_by_class_name('address')
                if streets:
                    for street in streets:
                        fullStreets.append(street.text)
                        print(street.text)
                else:
                    print("No street address found")
            except:
                print('Error while trying to get the streets')
                pass
            try:
                websites = soup.findAll('div', attrs={'class': 'contact-prom'})
                #print('Entered the Div!')
                if websites:
                    for div in websites:
                        for url in div.find_all('a'):
                            if url.has_attr('target'):
                                fullWebsites.append(url['href'])
                                print(url['href'])
                else:
                    # Nothing is appended here, so fullWebsites falls behind the other lists.
                    print("No websites found")
            except:
                print('Error while trying to get the websites')
                pass
            browser.back()
        # Move on to the next result page.
        inputElement = browser.find_element_by_class_name('next-page')
        inputElement.click()
    contents = []
    print("Size of free names: ", len(freeNames))
    print("Size of full phones: ", len(fullPhones))
    print("Size of full streets: ", len(fullStreets))
    print("Size of full websites: ", len(fullWebsites))
    # Rows for which an entry exists at the same index in all four lists.
    try:
        print("list with everything")
        for i in range(min(len(freeNames),len(fullPhones),len(fullStreets),len(fullWebsites))):
            c = ' ~ ' + freeNames[i] + ' ~ ' + fullPhones[i] + ' ~ ' + fullStreets[i] + ' ~ ' + fullWebsites[i] + ' ~ '
            contents.append(c)
            print(c)
            trustedprotxtfile.write(c + '\n')
    except:
        print('not working 1')
        pass
    # Remaining rows, written without a website.
    try:
        print("list without websites")
        for i in range(min(len(freeNames),len(fullPhones),len(fullStreets),len(fullWebsites)), max(len(freeNames),len(fullPhones),len(fullStreets))):
            c = ' ~ ' + freeNames[i] + ' ~ ' + fullPhones[i] + ' ~ ' + fullStreets[i] + ' ~ '
            contents.append(c)
            print(c)
            trustedprotxtfile.write(c + '\n')
    except:
        print('not working')
        pass
print('[Scraping finished, thanks for waiting!]')
trustedprotxtfile.close()
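One possible way to keep each URL with its business, sketched under assumptions rather than offered as a confirmed fix: build one record per business page at the moment that page is scraped, store an empty string when a field is missing, and only then format the rows. The helpers first_or_empty, make_record, and format_row and the sample values are hypothetical. In the script above, the four lists passed to make_record would come from the Selenium and BeautifulSoup lookups inside the `for eachLink` loop.

```python
def first_or_empty(values):
    """Return the first scraped value, or '' when the page had none."""
    return values[0] if values else ""


def make_record(names, phones, streets, websites):
    """Bundle the fields scraped from ONE business page into one record."""
    return {
        "name": first_or_empty(names),
        "phone": first_or_empty(phones),
        "street": first_or_empty(streets),
        "website": first_or_empty(websites),
    }


def format_row(record):
    """Join a record with the ' ~ ' separator used in the question."""
    fields = [record["name"], record["phone"], record["street"], record["website"]]
    return " ~ " + " ~ ".join(fields) + " ~ "


# Hypothetical per-page results; the second business has no website.
pages = [
    (["A"], ["111"], ["1 Main St"], ["http://a.example"]),
    (["B"], ["222"], ["2 Main St"], []),
    (["C"], ["333"], ["3 Main St"], ["http://c.example"]),
]
records = [make_record(*page) for page in pages]
rows = [format_row(record) for record in records]
for row in rows:
    print(row)
```

Because every business contributes exactly one record, a missing website leaves an empty field in that business's own row and does not shift the URLs of the businesses that follow.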