python - Unable to execute two functions in one script using two threads
Problem description
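I've created a scraper in Python that uses `Thread` to speed up execution. The scraper is supposed to parse all the available links from the pages whose URLs end in the letters a to z, and it does parse them.

However, I want to use those individual links again to parse all the names and phone numbers, also with `Thread`. I managed to run the first part with `Thread`, but I can't figure out how to create another `Thread` to execute the second half of the script.

I could have wrapped both parts in a single `Thread`, but my goal is to learn how to use two `Thread`s to execute two functions.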
For the first part, I tried the following, and it works:

```python
import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    # Download the letter page passed in as the argument
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    # Note: threading.Thread discards this return value
    return [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]

if __name__ == '__main__':
    linklist = []
    # One thread per letter page: .../letter/a through .../letter/z
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]

    for thread in linklist:
        thread.join()
```
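One detail worth noting in the code above: `threading.Thread` ignores whatever its `target` returns, so the list built by `alphabetical_links` is lost. A minimal sketch of one common fix, assuming each thread is handed a `queue.Queue` and `put`s its result there. `alphabetical_links_stub` and `collect_links` are hypothetical names, and the stub fabricates links instead of calling `requests` so the sketch runs offline:

```python
import queue
import threading

def alphabetical_links_stub(mainurl, results):
    # Hypothetical offline stand-in for alphabetical_links: instead of
    # downloading mainurl, it fabricates two sub-links for that page.
    links = [mainurl + "/1", mainurl + "/2"]
    # Hand the result back through the queue, because threading.Thread
    # ignores the target's return value.
    results.put(links)

def collect_links(urls):
    results = queue.Queue()  # thread-safe, so no explicit lock is needed
    threads = [threading.Thread(target=alphabetical_links_stub, args=(url, results))
               for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Every worker has finished, so the queue will not grow any further.
    all_links = []
    while not results.empty():
        all_links.extend(results.get_nowait())
    return all_links

if __name__ == "__main__":
    main_url = "https://www.houzz.com/proListings/letter/{}"
    urls = [main_url.format(chr(page)) for page in range(97, 100)]  # letters a-c
    print(sorted(collect_links(urls)))
```

The threads finish in no particular order, so the collected list is sorted before printing; the links gathered this way could then be handed to a second group of threads.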
My question is: how do I run the function `sub_links()` in another `Thread`?

```python
import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    # Download the letter page passed in as the argument
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    return [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]

def sub_links(process_links):
    # Parse the name and phone number of every listing on one sub-page
    response = requests.get(process_links).text
    root = html.fromstring(response)
    for container in root.cssselect(".proListing"):
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception: name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception: phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]

    for thread in linklist:
        thread.join()
```
Solution

Try updating `alphabetical_links` so that it starts its own threads:

```python
import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    links_on_page = [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]
    # Start one inner thread per listing link, each running sub_links
    threads = []
    for link in links_on_page:
        thread = threading.Thread(target=sub_links, args=(link,))
        thread.start()
        threads.append(thread)
    # Wait for this page's inner threads before the outer thread finishes
    for thread in threads:
        thread.join()

def sub_links(process_links):
    response = requests.get(process_links).text
    root = html.fromstring(response)
    for container in root.cssselect(".proListing"):
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception: name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception: phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    # Outer threads: one per letter page
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]

    for thread in linklist:
        thread.join()
```
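Note that this is just an example of how to manage "inner threads". Because it starts many threads at the same time, your system may not have the resources to start some of them, and you will get a `RuntimeError: can't start new thread` exception. In that case, you should try implementing a `ThreadPool`.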
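The `ThreadPool` suggested above can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor` (`multiprocessing.pool.ThreadPool` is another option). This is a minimal sketch under stated assumptions, not the answerer's code: `fetch_letter_page`, `fetch_listing`, and `scrape` are hypothetical stand-ins for `alphabetical_links` and `sub_links` that return fabricated data instead of making HTTP requests. Both stages share one bounded pool, so the two functions still run on worker threads, but the number of threads stays fixed instead of growing with the number of links:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_letter_page(url):
    # Hypothetical offline stand-in for alphabetical_links: returns
    # fabricated sub-links instead of parsing the downloaded page.
    return [f"{url}/{i}" for i in range(2)]

def fetch_listing(link):
    # Hypothetical stand-in for sub_links: returns a (name, phone) pair
    # instead of printing, so the caller can collect the results.
    return (f"name for {link}", "555-0100")

def scrape(letter_urls, max_workers=8):
    # The pool creates at most max_workers threads, however many tasks are
    # submitted, instead of one thread per URL.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: one task per letter page; map() yields results in input order.
        listing_links = [link
                         for links in pool.map(fetch_letter_page, letter_urls)
                         for link in links]
        # Stage 2: one task per listing link, reusing the same worker threads.
        return list(pool.map(fetch_listing, listing_links))

if __name__ == "__main__":
    main_url = "https://www.houzz.com/proListings/letter/{}"
    urls = [main_url.format(chr(page)) for page in range(97, 123)]
    for name, phone in scrape(urls, max_workers=4):
        print(name, phone)
```

Because `pool.map` returns results in input order, the output order is deterministic even though the tasks run concurrently. Plugging in the real `requests`/`lxml` code would mean changing `sub_links` to return its data rather than print it; that adaptation is an assumption left to the reader.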