首页 > 解决方案 > 如何在 iOS 上使用 XPath 进行抓取?

问题描述

我正在尝试查找有关如何在 iOS 平台上使用 XPath 的信息。在 Apple 文档中,我找到了有关 XPath apple doucumentation的信息,即 XPath 仅在 macOS 上可用。

我的目标是通过 XPath 抓取网页,但我找不到任何关于它的信息,而不是我使用 HTMLKit 的 XPath。

是否存在使用 XPath 脚本在 iOS 平台上抓取网页的方法?

标签: iosswiftparsingweb-scrapingxpath

解决方案


我正在使用 XPath 来抓取我想要的任何网页。对于我的实验,我正在使用 google 网页,您可以使用任何其他网页。

对于抓取,我确实使用了Eric Aya推荐的库Fuzi 对于安装,我使用了 Swift Package Manager,因为使用 Cocoa Pods 和 Carthage 我无法安装这个库,我不知道为什么)

为了测试我使用这个链接:

https://www.google.com/search?q=what+is+swift+programming+language&oq=what+is+swift+programming+lan&aqs=chrome.1.69i57j0l3j0i22i30l5j0i390.9399j0j7&sourceid=chrome&ie=UTF-8

从上面的网页中,我抓取了、titles、urs 和 description 。

我的代码:

extension ViewController: WKNavigationDelegate {
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        parseHTML()
    }
    
    func parseHTML() {
        browser.evaluateJavaScript("document.documentElement.outerHTML") { (result, error) in
            guard let html = result as? String, error == nil else {
                print("Failed to get html string")
                return
            }
            
            do {
                // if encoding is omitted, it defaults to NSUTF8StringEncoding
                let doc = try HTMLDocument(string: html, encoding: String.Encoding.utf8)
 
                let classNameTitle = "LC20lb DKV0Md"
                let classNameDescription = "aCOpRe"
                let classNameURL = "yuRUbf"
                
                // XPath queries
                print("\n TITLE \n")
                for script in doc.xpath("//h3[@class='\(classNameTitle)']") {
                    print(script.stringValue)
                }
                
                print("\n Description \n")
                for script in doc.xpath("//span[@class='\(classNameDescription)']") {
                  print(script.stringValue)
                }
                
                print("\n URL \n")
                for script in doc.xpath("//div[@class='\(classNameURL)']") {
                    let a = script.firstChild(xpath: "a[contains(@href, 'http')]")
                    print(String((a?.attr("href") ?? "Something goes wrong, oops, sorry :(")))
                    
                }
              
            } catch let error {
              print(error)
            }
            
        }
    }
}

同样对于 WKWebView,我安装了一个桌面用户代理:

lazy var browser: WKWebView = {
        let view = WKWebView()
        view.navigationDelegate = self
        view.customUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
        return view
    }()

我不会描述 XPath 的所有语法,因为我是第一次使用 XPath :) 链接到XPath的语法,我建议阅读完整的 XPath 教程。

我的网页输出:

TITLE 

Swift - Apple Developer
Pros and Cons of Swift Programming Language | AltexSoft
Swift (мова програмування) — Вікіпедія
Swift (programming language) - Wikipedia
About Swift — The Swift Programming Language (Swift 5.4)
About Swift - Swift.org
What Is Swift Programming Language and Why Should You ...
Introduction and history of Swift up to 2020 - Exyte
What is the Swift programming language, and why should I ...
What is the Swift programming language? | by Lena Charles ...

 Description 

Great First Language — Swift is a powerful and intuitive programming language for macOS, iOS, watchOS, tvOS, and beyond. Writing Swift code is interactive and fun, the syntax is concise yet expressive, and Swift includes modern features developers love. Swift code is safe by design, yet also produces software that runs lightning-fast.
29 серп. 2019 р. — Swift is a compiled programming language for iOS, macOS, watchOS, tvOS, and Linux applications. Here's what you need to know about Swift.
↑ Apple announces Swift, a new programming language for iOS. ↑ The Swift Programming Language; ↑ Lattner, Chris (June 3, 2014) ...
Swift is a general-purpose, multi-paradigm, compiled programming language developed by Apple Inc. and the open-source community, first released in 2014.
It's a safe, fast, and interactive programming language that combines the best in modern language thinking with wisdom from the wider Apple engineering ...
About Swift. Swift is a general-purpose programming language built using a modern approach to safety, performance, and software design patterns. The goal of ...
29 вер. 2020 р. — Swift was created by Apple in 2014 for iOS, watchOS, tvOS, macOS app development as a logical substitute for Objective-C, which had flaws ...
20 лист. 2020 р. — What is Swift? First released in 2014, Swift is a powerful general-purpose programming language for the Apple ecosystem. Whether you need to ...
Swift, often referred to as “Objective-C, without the C", is an open source programming language developed and maintained by Apple, and it's what the company ...
Swift is the modern programming language that was designed to overcome the several challenges of the versatile Apple programming language Objective C. It ...

 URL 

https://developer.apple.com/swift/
https://www.altexsoft.com/blog/engineering/the-good-and-the-bad-of-swift-programming-language/
https://uk.wikipedia.org/wiki/Swift_(%D0%BC%D0%BE%D0%B2%D0%B0_%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D1%83%D0%B2%D0%B0%D0%BD%D0%BD%D1%8F)
https://en.wikipedia.org/wiki/Swift_(programming_language)
https://docs.swift.org/swift-book/
https://swift.org/about/
https://blackthorn-vision.com/blog/what-is-swift-programming-language-and-why-should-you-use-it
https://exyte.com/blog/introduciton-to-swift
https://www.itpro.co.uk/development/34417/what-is-the-swift-programming-language-and-why-should-i-learn-it
https://lenac1884.medium.com/what-is-the-swift-programming-language-b45e271175e2

如果有人可以在我的代码等方面给我建议。我将不胜感激:)


推荐阅读