Why can't some websites be automated with Selenium?

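Problem description

I am trying to automate the page https://www.westernunion.com/global-service/track-transfer, but I cannot figure out why the website does not navigate to the next page.

My script opens the page, enters 2587051083 as the MTCN, and clicks the Continue button, but nothing is displayed after the click. Performing the same steps manually works fine. Is there a browser setting I am missing for this kind of website? I am clueless.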

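import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class TrackTransfer {  // class name is illustrative; the original snippet showed only main()
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver","D:\\Study\\selenium-java-2.48.2\\selenium-2.48.2\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();  // the original snippet did not show the declaration of driver
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.manage().window().maximize();
        driver.get("https://www.westernunion.com/global-service/track-transfer");
        // Enter the MTCN, then click the Continue button
        driver.findElement(By.xpath("//input[@id='trackingNumber']")).sendKeys("2587051083");
        driver.findElement(By.xpath("//button[@id='button-track-transfer']")).click();
    }
}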

Tags: selenium, google-chrome, selenium-webdriver, webdriver, selenium-chromedriver

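Solution


To send the character sequence to the tracking field on the page https://www.westernunion.com/global-service/track-transfer, I made a few minor modifications to your own code: I induced WebDriverWait for the desired element to become clickable, and then invoked click() on the element with the text Continue, as follows:

  • Code block: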

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;
    
    public class WesternUnion {
    
        public static void main(String[] args) {
    
            System.setProperty("webdriver.chrome.driver","C:\\Utility\\BrowserDrivers\\chromedriver.exe");
            // Start Chrome maximized, without infobars or extensions
            ChromeOptions opt = new ChromeOptions();
            opt.addArguments("start-maximized");
            opt.addArguments("disable-infobars");
            opt.addArguments("--disable-extensions");
            WebDriver driver = new ChromeDriver(opt);
            driver.get("https://www.westernunion.com/global-service/track-transfer");
            // Wait up to 10 seconds for the MTCN field to become clickable, then type the number.
            // WebDriverWait(driver, 10) is the Selenium 3 constructor; Selenium 4 takes a java.time.Duration instead.
            new WebDriverWait(driver, 10).until(ExpectedConditions.elementToBeClickable(By.cssSelector("input.new-field.form-control.tt-mtcn.ng-pristine.ng-valid-mask"))).sendKeys("2587051083");
            // Click the Continue button
            driver.findElement(By.cssSelector("button.btn.btn-primary.btn-lg.btn-block.background-color-teal.remove-margin#button-track-transfer")).click();
        }
    }
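
The explicit wait used above boils down to polling a condition until it yields a result or a timeout expires. The following is a minimal sketch of that idea in plain Java, not Selenium's actual implementation; the class `PollingWait`, its method `until`, and the string standing in for the element are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper that imitates the idea behind an explicit wait; it is not Selenium code.
final class PollingWait {

    // Calls `condition` every `pollMillis` until it returns a non-null value.
    // Throws IllegalStateException once `timeoutMillis` has elapsed without a result.
    static <T> T until(Supplier<T> condition, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        long readyAt = System.currentTimeMillis() + 300;
        // Stand-in for "the element became clickable" after roughly 300 ms.
        String element = until(
                () -> System.currentTimeMillis() >= readyAt ? "input#trackingNumber" : null,
                2000, 50);
        System.out.println("Ready: " + element);
    }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds, which is why an explicit wait is usually preferred for elements that are rendered asynchronously.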
    

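It seems the click() does happen and the spinner is visible for a while, but the search is then interrupted. On inspecting the page, you will find that some of the <script> and <link> tags refer to JavaScript and CSS files whose paths contain the keyword dist. For example: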

  • <link rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/responsive_css.min.css">
  • <script src="/content/wucom/dist/20181210075630/js/js-bumblebee.js"></script>
  • <link ng-if="trackTransferVm.trackTransferData.newTrackTransfer || trackTransferVm.trackTransferData.isRetail" rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/main.min.css" class="ng-scope" style="">

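This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver is detected and subsequently blocked.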
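
The inspection described above can be roughly automated. The sketch below, in plain Java, scans a page's HTML for <script> and <link> URLs that contain a `dist` path segment. The class `DistAssetScanner` and its method `findDistAssets` are hypothetical names, and the regular expression is a rough approximation rather than a real HTML parser. The check only reproduces the heuristic used in this answer; it does not by itself establish which protection service a site uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Selenium or of the original answer.
final class DistAssetScanner {

    // Captures the src/href value of <script> and <link> tags (double-quoted attributes only).
    private static final Pattern ASSET_URL = Pattern.compile(
            "<(?:script|link)\\b[^>]*?\\b(?:src|href)\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns every <script>/<link> URL that contains "dist" as a path segment.
    static List<String> findDistAssets(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = ASSET_URL.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            if (url.contains("/dist/")) {
                hits.add(url);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<link rel=\"stylesheet\" type=\"text/css\" href=\"/content/wucom/dist/20181210075630/css/responsive_css.min.css\">",
                "<script src=\"/content/wucom/dist/20181210075630/js/js-bumblebee.js\"></script>",
                "<script src=\"/static/app.js\"></script>");
        for (String url : findDistAssets(html)) {
            System.out.println(url);
        }
    }
}
```

In a Selenium session the input string could come from `driver.getPageSource()`.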


Distil

According to the article "There Really Is Something About Distil.it...":

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium, the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".

