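python - How do I extract text information from an Angular website?
Problem description
I am trying to extract certain text fields from this website, but I am new to Angular. I am using selenium to build this web scraper. I noticed that the exact text values are not stored in the HTML code. Can someone help or offer some hints for solving this? I tried using: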
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
but made no progress. Thanks :)
Here is one approach I tried for extracting the text:
def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1
But I get this error in the terminal:
Traceback (most recent call last):
File "selTest.py", line 88, in <module>
csc()
File "selTest.py", line 44, in csc
worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.cssSelector("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
AttributeError: type object 'By' has no attribute 'cssSelector'
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py
Traceback (most recent call last):
File "selTest.py", line 88, in <module>
csc()
File "selTest.py", line 44, in csc
worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py
Traceback (most recent call last):
File "selTest.py", line 88, in <module>
csc()
File "selTest.py", line 44, in csc
worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable
PS Sorry, I cannot share the website because it requires a private login. This is the HTML of the input I am trying to read:
<input class="edited_field ng-pristine ng-untouched ng-valid ng-not-empty" type="text" ng-model="tab.content.site.name" ng-disabled="!tab.content.updateBtnPermission" disabled="disabled">
Error after Qharr's suggestion
This is the code I wrote based on Qharr's comment, followed by the terminal output:
def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1
Traceback (most recent call last):
File "selTest.py", line 88, in <module>
csc()
File "selTest.py", line 44, in csc
worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 469, in _write
f = float(token)
TypeError: float() argument must be a string or a number, not 'WebElement'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "selTest.py", line 88, in <module>
csc()
File "selTest.py", line 44, in csc
worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 67, in cell_wrapper
return method(self, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 408, in write
return self._write(row, col, *args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 474, in _write
raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'selenium.webdriver.remote.webelement.WebElement'> in write()
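Solution
The current error is complaining about compound class names. Try:
driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty')
You may also need a wait condition, and you can probably shorten the selector so that it uses fewer classes.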