Python + Selenium Web Automation

Environment: Python 3.7, PyCharm 2019.3.3

1. Installing Selenium

pip install selenium

Check that the installation succeeded: pip show selenium
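The same check can be done from inside a script, which helps when PyCharm uses a different interpreter than your terminal. This is a minimal sketch; `installed_version` is a hypothetical helper name. `importlib.metadata` only exists from Python 3.8, so on Python 3.7 it assumes the `importlib_metadata` backport has been installed (`pip install importlib-metadata`).

```python
from typing import Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("selenium:", installed_version("selenium"))
```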

Selenium can drive different browsers (Chrome, Firefox, IE, etc.), each through its own driver.
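As a sketch of how the browser choice maps onto code: `selenium.webdriver` exposes one class per browser (`Chrome`, `Firefox`, `Ie`, `Edge`, `Safari`), and each still needs its own driver executable installed (chromedriver for Chrome, geckodriver for Firefox, IEDriverServer for IE). The lookup table and the `driver_class_name` / `make_driver` helpers below are hypothetical; Selenium is imported lazily so the name lookup works even where Selenium is not installed.

```python
from typing import Any

# Hypothetical mapping from a lowercase browser name to the class name in selenium.webdriver
DRIVER_CLASSES = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "ie": "Ie",
    "edge": "Edge",
    "safari": "Safari",
}


def driver_class_name(browser: str) -> str:
    """Translate a user-facing browser name into the selenium.webdriver class name."""
    key = browser.strip().lower()
    try:
        return DRIVER_CLASSES[key]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None


def make_driver(browser: str) -> Any:
    """Start a browser session; the matching driver executable must be on PATH."""
    from selenium import webdriver  # imported here so driver_class_name works without selenium

    return getattr(webdriver, driver_class_name(browser))()
```

For example, `make_driver("firefox")` would start Firefox, provided geckodriver is installed.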

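This guide uses chromedriver: download chromedriver.exe and put it in Python's Scripts directory. That directory is usually on the system PATH, so webdriver.Chrome() can find the driver without an explicit path.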
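If webdriver.Chrome() reports that it cannot find chromedriver, a quick check like the one below can help. It is a sketch using only the standard library; the helper names are hypothetical. `sysconfig.get_path("scripts")` gives the Scripts directory of the interpreter that runs the script, and `shutil.which` searches PATH (on Windows it also tries the PATHEXT suffixes, so "chromedriver" matches chromedriver.exe).

```python
import os
import shutil
import sysconfig
from typing import Optional


def scripts_dir() -> str:
    """Directory where pip installs console scripts for this interpreter ("Scripts" on Windows)."""
    return sysconfig.get_path("scripts")


def scripts_dir_on_path() -> bool:
    """True if the Scripts directory appears in the PATH environment variable."""
    target = os.path.normcase(os.path.abspath(scripts_dir()))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(os.path.abspath(p)) == target for p in entries if p)


def find_chromedriver() -> Optional[str]:
    """Full path of the chromedriver executable found on PATH, or None."""
    return shutil.which("chromedriver")


if __name__ == "__main__":
    print("Scripts directory:", scripts_dir())
    print("On PATH:", scripts_dir_on_path())
    print("chromedriver:", find_chromedriver())
```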

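Downloading chromedriver:

In Chrome, open the three-dot menu in the top-right corner > Help > About Google Chrome to see the version number, then go to https://npm.taobao.org/mirrors/chromedriver and download the driver whose version is closest to your browser's.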
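"Closest version" can be made precise. The helper below sketches one possible rule, which is an assumption rather than anything the mirror defines: since ChromeDriver 73 the driver's major version matches the Chrome version it supports, so only drivers with the same major version are considered, preferring the newest one that is not newer than the browser. The version strings in the example are made up for illustration.

```python
from typing import List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn "80.0.3987.149" into (80, 0, 3987, 149) so versions compare numerically."""
    return tuple(int(part) for part in v.strip().split("."))


def closest_driver(browser_version: str, driver_versions: List[str]) -> Optional[str]:
    """Pick a driver with the browser's major version; None if no such driver exists."""
    target = parse_version(browser_version)
    same_major = [d for d in driver_versions if parse_version(d)[0] == target[0]]
    if not same_major:
        return None
    # Prefer the newest driver that is not newer than the browser...
    not_newer = [d for d in same_major if parse_version(d) <= target]
    if not_newer:
        return max(not_newer, key=parse_version)
    # ...otherwise fall back to the oldest driver of the same major version
    return min(same_major, key=parse_version)


if __name__ == "__main__":
    candidates = ["79.0.3945.36", "80.0.3987.106", "80.0.3987.16", "81.0.4044.69"]
    print(closest_driver("80.0.3987.149", candidates))  # 80.0.3987.106
```

Versions are compared as integer tuples rather than strings because, as strings, "80.0.3987.16" would sort after "80.0.3987.106".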

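Example: search Sogou's WeChat search for an official account, click its most recent article, and return the article's HTML.

```python
from time import sleep
from urllib.parse import quote

from selenium import webdriver
from selenium.webdriver.common.by import By  # required for By.XPATH
# from bs4 import BeautifulSoup  # optional: only needed to parse the HTML (requires bs4 and lxml)


def getUrl(name):
    """Search for a WeChat official account on Sogou and return the HTML of its latest article."""
    # Launch Chrome (chromedriver must be on PATH)
    browser = webdriver.Chrome()
    try:
        # Sogou WeChat search URL for official accounts; the account name is URL-encoded
        gzhUrl = ('https://gzh.sogou.com/weixin?type=1&query=' + quote(str(name))
                  + '&ie=utf8&s_from=input&_sug_=n&_sug_type_=')

        # Load the search results page
        browser.get(gzhUrl)
        sleep(1)

        # Click the article link under "最近文章" (recent articles), located by its XPath
        browser.find_element(by=By.XPATH, value="/html/body/div[2]/div/div[3]/ul/li/dl[3]/dd/a").click()

        # The article opens in a new tab: switch the browser to the newest window handle
        browser.switch_to.window(browser.window_handles[-1])
        sleep(5)

        # Get the article page's HTML
        html = browser.page_source
        # soup = BeautifulSoup(html, 'lxml')
        return html
    finally:
        # Close the browser even if one of the steps above fails
        browser.quit()


if __name__ == "__main__":
    htmltext = getUrl('战略前沿技术')
    print(htmltext)
```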
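The fixed sleep(1) and sleep(5) calls either waste time or are too short on a slow connection. Selenium's usual answer is an explicit wait, which polls a condition until it holds or a timeout expires. The sketch below is a plain-Python stand-in that shows the polling idea without a browser; `wait_until` is a hypothetical helper, not part of Selenium. Its 0.5 s default interval mirrors the default poll frequency of Selenium's `WebDriverWait`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

With Selenium itself, the click step could become `WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()`, using `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`. Assuming only one tab was open before the click, `EC.number_of_windows_to_be(2)` can likewise replace the sleep before switching tabs.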

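Note: getting the XPath

In Chrome, open the three-dot menu in the top-right corner > More tools > Developer tools.

In the Elements panel on the right, hover over the elements until you find the article link under "最近文章" (recent articles), then right-click it > Copy > Copy XPath.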
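A path copied with Copy XPath is absolute: it counts element positions from <html> downward, so any layout change on the page breaks it. The sketch below uses Python's built-in `xml.etree.ElementTree`, which supports only a subset of XPath, on hypothetical simplified markup to compare the copied positional path with one anchored on the "最近文章" label. The markup and the exact label text are assumptions, not the real Sogou page. ElementTree does not accept a leading "/html", so the copied path is written relative to the <html> root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup shaped like the path the copied XPath walks through
SAMPLE = """
<html><body>
  <div id="header"></div>
  <div>
    <div>
      <div></div>
      <div></div>
      <div>
        <ul><li>
          <dl><dt>微信号</dt><dd>example</dd></dl>
          <dl><dt>功能介绍</dt><dd>example</dd></dl>
          <dl><dt>最近文章</dt><dd><a href="https://example.com/article">Title</a></dd></dl>
        </li></ul>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <html> element

# Positional path equivalent to /html/body/div[2]/div/div[3]/ul/li/dl[3]/dd/a
copied = root.find("body/div[2]/div/div[3]/ul/li/dl[3]/dd/a")

# Label-anchored path: the <dl> whose <dt> text is exactly "最近文章", then its <dd>/<a>
labelled = root.find(".//dl[dt='最近文章']/dd/a")

print(copied.get("href"), labelled.get("href"))
```

Both lookups find the same link here, but only the label-anchored one survives extra wrapper divs or reordered rows. If the real page marks the label this way (an assumption), a Selenium-side equivalent in full XPath could be `//dt[contains(text(),'最近文章')]/following-sibling::dd/a`.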