Python: print a CSV column value once before each group of results, without repeating it

Problem description

I have a Python script that imports a list of URLs from a CSV file named list.csv, scrapes each one, and prints every anchor text and href link found on each URL in the CSV:

(For reference, the URLs in the CSV are all in column A.)

from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

contents = []
with open('list.csv', 'r', newline='') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Each row is a list, e.g. ['https://...']

for url in contents:
    page = urlopen(url[0]).read()  # url[0] is the value from column A
    soup = BeautifulSoup(page, "lxml")

    for link in soup.find_all('a'):
        if len(link.text) > 0:
            print(url, link.text, '-', link.get('href'))

The output looks like this, where https://www.example.com/csv-url-one/ and https://www.example.com/csv-url-two/ are the URLs in column A of the CSV:

['https://www.example.com/csv-url-one/'] Creative - https://www.example.com/creative/
['https://www.example.com/csv-url-one/'] Web Design - https://www.example.com/web-design/
['https://www.example.com/csv-url-two/'] PPC - https://www.example.com/ppc/
['https://www.example.com/csv-url-two/'] SEO - https://www.example.com/seo/

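The problem is that I would like the output to look more like this: the URL from the CSV is printed once instead of before every result, and there is a blank line after the results for each CSV row: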

['https://www.example.com/csv-url-one/'] 
Creative - https://www.example.com/creative/
Web Design - https://www.example.com/web-design/

['https://www.example.com/csv-url-two/'] 
PPC - https://www.example.com/ppc/
SEO - https://www.example.com/seo/

Is this possible?

Thanks

Tags: python, beautifulsoup

Solution


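Does the following solve your problem? It moves the URL out of the inner loop, so it is printed once as a header before that page's links: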

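# Reuses contents, urlopen and BeautifulSoup from the script in the question
for url in contents:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    # Print the source URL once, as a header, with blank lines around it
    print('\n', '********', ', '.join(url), '********', '\n')
    for link in soup.find_all('a'):
        if len(link.text) > 0:
            print(link.text, '-', link.get('href'))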
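If you would rather get the layout from the question (the URL on its own line, then its links, then a blank line), the grouping can be separated from the scraping. The following is only a minimal sketch, not a drop-in replacement: `read_first_column` and `format_group` are hypothetical helper names, and `sample_csv` and `scraped` are made-up stand-ins for list.csv and for the (text, href) pairs that `soup.find_all('a')` would produce. It also prints the URL as a plain string rather than as a one-element list, which is why the square brackets disappear.

```python
import csv
import io


def read_first_column(csv_file):
    """Return the first cell of every non-empty row as a plain string."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def format_group(url, links):
    """Build one block: the source URL once, then one 'text - href' line per link."""
    lines = [url]
    # Skip anchors with empty text, like the len(link.text) > 0 check above
    lines.extend(f"{text} - {href}" for text, href in links if text)
    return "\n".join(lines) + "\n"  # trailing newline gives a blank line when printed


# Hypothetical sample data standing in for list.csv and the scraped anchors
sample_csv = io.StringIO(
    "https://www.example.com/csv-url-one/\n"
    "\n"
    "https://www.example.com/csv-url-two/\n"
)
scraped = {
    "https://www.example.com/csv-url-one/": [
        ("Creative", "https://www.example.com/creative/"),
        ("Web Design", "https://www.example.com/web-design/"),
    ],
    "https://www.example.com/csv-url-two/": [
        ("PPC", "https://www.example.com/ppc/"),
        ("SEO", "https://www.example.com/seo/"),
    ],
}

for url in read_first_column(sample_csv):
    print(format_group(url, scraped[url]))
```

In the real script, the `scraped[url]` lookup would presumably be replaced by a list such as `[(a.text, a.get('href')) for a in soup.find_all('a')]` built inside the loop.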
