python - Deploying a script to AWS Lambda
Problem description
The problem I'm running into is that I'm trying to run a script that uses Selenium, specifically webdriver.
driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path='/dev/null')
My problem is that the function needs geckodriver in order to run. Geckodriver is included in the zip file I upload to AWS, but I don't know how to get the function to access it there. Locally it isn't a problem, since it sits in my directory and everything runs.
When I run the function through Serverless, I get the following error message:
{
  "errorMessage": "Message: 'geckodriver' executable needs to be in PATH.\n",
  "errorType": "WebDriverException",
  "stackTrace": [
    ["/var/task/handler.py", 66, "main", "print(TatamiClearanceScrape())"],
    ["/var/task/handler.py", 28, "TatamiClearanceScrape", "driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path='/dev/null')"],
    ["/var/task/selenium/webdriver/firefox/webdriver.py", 164, "__init__", "self.service.start()"],
    ["/var/task/selenium/webdriver/common/service.py", 83, "start", "os.path.basename(self.path), self.start_error_message)"]
  ]
}
ERROR --------------------------------------------------
Invoke function failed
Any help would be appreciated.
Edit:
import datetime
import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def TatamiClearanceScrape():
    options = Options()
    options.add_argument('--headless')
    page_link = 'https://www.tatamifightwear.com/collections/clearance'
    # this is the url that we've already determined is safe and legal to scrape from.
    page_response = requests.get(page_link, timeout=5)
    # here, we fetch the content from the url, using the requests library
    page_content = BeautifulSoup(page_response.content, "html.parser")
    driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path='/dev/null')
    driver.get('https://www.tatamifightwear.com/collections/clearance')
    labtnx = driver.find_element_by_css_selector('a.btn.close')
    labtnx.click()
    time.sleep(10)
    labtn = driver.find_element_by_css_selector('div.padding')
    labtn.click()
    time.sleep(5)
    # wait(driver, 50).until(lambda x: len(driver.find_elements_by_css_selector("div.detailscontainer")) > 30)
    html = driver.page_source
    page_content = BeautifulSoup(html, "html.parser")
    # we use the html parser to parse the url content and store it in a variable.
    textContent = []
    tags = page_content.findAll("a", class_="product-title")
    product_title = page_content.findAll(attrs={'class': "product-title"})  # allocates all product titles from site
    old_price = page_content.findAll(attrs={'class': "old-price"})
    new_price = page_content.findAll(attrs={'class': "special-price"})
    products = []
    for i in range(len(product_title) - 2):
        # groups all products together in list of dictionaries, with name, old price and new price
        product = {"Product Name": product_title[i].get_text(strip=True),
                   "Old Price:": old_price[i].get_text(strip=True),
                   "New Price": new_price[i].get_text(),
                   'date': str(datetime.datetime.now())}
        products.append(product)
    return products
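One plausible explanation for the error (an assumption based on the message, not confirmed by the logs alone) is that the relative path 'numpy-test/geckodriver' does not resolve once the deployment package is unpacked under /var/task, and that zipping stripped the binary's execute bit. A minimal sketch of building an absolute path from the LAMBDA_TASK_ROOT variable the Lambda runtime sets, and restoring the permission:

```python
import os
import stat

# Lambda unpacks the deployment package at /var/task and exposes that
# directory as LAMBDA_TASK_ROOT; fall back to the current directory
# when running locally.
task_root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
gecko_path = os.path.join(task_root, "numpy-test", "geckodriver")

def ensure_executable(path):
    """Add the execute bit, since zip uploads can strip file permissions."""
    if os.path.exists(path):
        st = os.stat(path)
        os.chmod(path, st.st_mode | stat.S_IEXEC | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

The resulting gecko_path can then be passed as executable_path=ensure_executable(gecko_path) instead of the relative string.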
Solution
You may want to take a look at AWS Lambda Layers. With Layers, your function can use libraries without having to include them in its deployment package. That way you don't have to re-upload your dependencies on every code change; you just create one additional layer containing all the required packages.
Read here for more details on AWS Lambda Layers.
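If geckodriver is shipped in a layer rather than in the function package, note that layer contents are extracted under /opt at runtime. A small helper (the function name is hypothetical) can prepend that directory to PATH so Selenium's lookup finds the binary by name, which sidesteps the executable_path problem entirely:

```python
import os

def add_layer_dir_to_path(layer_dir="/opt"):
    """Prepend the layer directory to PATH so Selenium can locate
    geckodriver by name instead of via an explicit executable_path."""
    os.environ["PATH"] = layer_dir + os.pathsep + os.environ.get("PATH", "")
    return os.environ["PATH"]
```

Calling add_layer_dir_to_path() once at the top of the handler, before webdriver.Firefox(...), is enough; adjust layer_dir if your layer nests the binary in a subfolder.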