How to open multiple TXT files in a for loop and assign each file its own name

Problem Description

I am trying first to scrape the td elements that contain the names of different jobs (with links). I then want to follow each of those "td" links, scrape the data of the corresponding job from its page, and save the scraped data from each web page in its own separate txt file. Can I do that? If you know anything about this, please share your thoughts!

import requests
from bs4 import BeautifulSoup

main = "https://deltaimmigration.com.au/Australia-jobs/"

def First():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    links = []
    with open("links.txt", 'w', newline="", encoding="UTF-8") as f:
        for item in soup.findAll("td", {'width': '250'}):
            item = item.contents[1].get("href")[3:]
            item = f"https://deltaimmigration.com.au/{item}"
            f.write(item+"\n")
            links.append(item)
    print(f"We Have Collected {len(links)} urls")
    return links

def Second():
    links = First() 
    with requests.Session() as req:
        for link in links:
            print(f"Extracting {link}")
            r = req.get(link,timeout = 100)
            soup = BeautifulSoup(r.text, 'html5lib')
            for item in soup.findAll("table", {'width': '900'}):
                return item

def Third():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    result = Second()
    for item in soup.findAll("td", {'width': '250'}):
        with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f:
            f.write('result')           

Third()       

I tried the following:

with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f:

but I got this error:

File "e:/test/check.py", line 10, in Third with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f: FileNotFoundError: [Errno 2] No such file or directory: ' Vegetable Grower (Aus)/market Gardener (NZ).txt'"

Tags: python, beautifulsoup

Solution


import requests
from bs4 import BeautifulSoup

main = "https://deltaimmigration.com.au/Australia-jobs/"


def First():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    links = []
    names = []
    with open("links.txt", 'w', newline="", encoding="UTF-8") as f:
        for item in soup.findAll("td", {'width': '250'}):
            name = item.contents[1].text
            item = item.contents[1].get("href")[3:]
            item = f"https://deltaimmigration.com.au/{item}"
            f.write(item+"\n")
            links.append(item)
            names.append(name)
    print(f"We Have Collected {len(links)} urls")
    return links, names


def Second():
    links, names = First()
    with requests.Session() as req:
        for link, name in zip(links, names):
            print(f"Extracting {link}")
            r = req.get(link)
            soup = BeautifulSoup(r.text, 'html5lib')
            for item in soup.findAll("table", {'width': '900'}):
                with open(f"{name}.txt", 'w', newline="", encoding="UTF-8") as f:
                    f.write(item.text)


Second()
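Note that a title containing a '/' (like the one in the original error) would still break the open() call above, because the slash is read as a directory separator. Below is a minimal sketch of sanitizing the title before writing; the safe_name helper is not part of the original answer, only an illustration:

import re

def safe_name(name):
    # Replace characters that are not valid in file names
    # (most importantly '/') with an underscore.
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()

# Inside Second(), open the file with the sanitized title instead:
# with open(f"{safe_name(name)}.txt", 'w', newline="", encoding="UTF-8") as f:
#     f.write(item.text)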
