python - How do I open multiple TXT files in a for loop and give each one its own name?
Problem description
I am first trying to scrape the "td" elements that contain the names of different jobs (with links). I then want to follow each of those "td" links, scrape the job details from the corresponding page, and save the data for each page in its own separate txt file. Is that possible? If you know how to do this, please share your ideas!
import requests
from bs4 import BeautifulSoup

main = "https://deltaimmigration.com.au/Australia-jobs/"

def First():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    links = []
    with open("links.txt", 'w', newline="", encoding="UTF-8") as f:
        for item in soup.findAll("td", {'width': '250'}):
            item = item.contents[1].get("href")[3:]
            item = f"https://deltaimmigration.com.au/{item}"
            f.write(item + "\n")
            links.append(item)
        print(f"We Have Collected {len(links)} urls")
    return links

def Second():
    links = First()
    with requests.Session() as req:
        for link in links:
            print(f"Extracting {link}")
            r = req.get(link, timeout=100)
            soup = BeautifulSoup(r.text, 'html5lib')
            for item in soup.findAll("table", {'width': '900'}):
                return item

def Third():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    result = Second()
    for item in soup.findAll("td", {'width': '250'}):
        with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f:
            f.write('result')

Third()
I tried the following:

with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f:

but I get this error:

File "e:/test/check.py", line 10, in Third
    with open(item.text + '.txt', 'w', newline="", encoding="UTF-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: ' Vegetable Grower (Aus)/market Gardener (NZ).txt'
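The error comes from the `/` inside the job title: `open()` treats everything before the last `/` as a directory path, so it looks for a directory named ` Vegetable Grower (Aus)` that does not exist. The split can be seen with `os.path.split` (the name below is taken from the traceback):

import os

# open() interprets '/' in the name as a path separator, so everything
# before the last '/' is treated as a (non-existent) directory
name = " Vegetable Grower (Aus)/market Gardener (NZ).txt"
directory, filename = os.path.split(name)
print(directory)  # ' Vegetable Grower (Aus)'
print(filename)   # 'market Gardener (NZ).txt'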
Solution
import requests
from bs4 import BeautifulSoup

main = "https://deltaimmigration.com.au/Australia-jobs/"

def First():
    r = requests.get(main)
    soup = BeautifulSoup(r.text, 'html5lib')
    links = []
    names = []
    with open("links.txt", 'w', newline="", encoding="UTF-8") as f:
        for item in soup.findAll("td", {'width': '250'}):
            name = item.contents[1].text
            item = item.contents[1].get("href")[3:]
            item = f"https://deltaimmigration.com.au/{item}"
            f.write(item + "\n")
            links.append(item)
            names.append(name)
        print(f"We Have Collected {len(links)} urls")
    return links, names

def Second():
    links, names = First()
    with requests.Session() as req:
        for link, name in zip(links, names):
            print(f"Extracting {link}")
            r = req.get(link)
            soup = BeautifulSoup(r.text, 'html5lib')
            for item in soup.findAll("table", {'width': '900'}):
                # replace '/' so job titles like "Grower (Aus)/Gardener (NZ)"
                # are not treated as directory paths by open()
                safe_name = name.strip().replace("/", "-")
                with open(f"{safe_name}.txt", 'w', newline="", encoding="UTF-8") as f:
                    f.write(item.text)

Second()
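Scraped titles may contain other characters that are illegal in file names (Windows also rejects \ : * ? " < > |), so a small sanitizing helper is safer than handling `/` alone. A minimal sketch (the helper name `safe_filename` is my own, not from the answer above):

import re

def safe_filename(name: str) -> str:
    # hypothetical helper: strip surrounding whitespace and replace
    # characters that are invalid in file names with '-'
    return re.sub(r'[\\/:*?"<>|]', "-", name.strip())

print(safe_filename(" Vegetable Grower (Aus)/market Gardener (NZ)"))
# -> 'Vegetable Grower (Aus)-market Gardener (NZ)'

The file would then be opened as open(f"{safe_filename(name)}.txt", 'w', newline="", encoding="UTF-8").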