Parsing links stored in a CSV file

Problem description

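I am trying to parse the links stored in my CSV file and then print the title for each link. I run into a problem at the bottom of my code, where I read the links and parse each one to get its title.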

import csv
from bs4 import BeautifulSoup
from urllib.request import urlopen

contents = []

filename = 'scrap.csv'

with open(filename,'rt') as f:
    data = csv.reader(f)

    for row  in data:
        links = row[0]
        contents.append(links) #add each url to list of contents

for links in contents: #parse through each url in the list contents
    url = urlopen(links[0].read())
    soup = BeautifulSoup(url,"html.parser")

for title in soup.find_all('title'):
    print(title)

I expect the output to print each page's title on its own line, but url = urlopen(links[0].read()) raises the following error at line 17: AttributeError: 'str' object has no attribute 'read'

Tags: python, csv, screen-scraping

Solution


import csv

import requests
from bs4 import BeautifulSoup

filename = 'scrap.csv'
contents = []

def soup_title(soup):
    # Return the first <title> tag of the parsed page, or None if there is none
    return soup.find('title')

with open(filename, 'rt', newline='') as f:
    for row in csv.reader(f):
        if row:  # skip blank lines
            contents.append(row[0])  # add each URL to the list of contents

for link in contents:  # fetch and parse each URL in the list
    response = requests.get(link)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup_title(soup))
