Exercise: Web Crawler - print not working

Problem description

I'm a Go newbie, currently working on Exercise: Web Crawler.

I simply put the keyword 'go' before every call to func Crawl, hoping the calls would run in parallel, but now fmt.Printf prints nothing. Nothing else in the original code was changed. Could someone give me a hand?

func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher)
    }
    return
}

func main() {
    go Crawl("https://golang.org/", 4, fetcher)
}

Tags: go

Solution


According to the Go language specification:

Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.

Therefore, you have to wait explicitly in main() for the other goroutines to finish.

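One way is simply to add time.Sleep() at the end of main(), sleeping for as long as you think the other goroutines need to finish (e.g. maybe 1 second in this case). This is only a guess, though: if the goroutines take longer than the sleep, their output is still lost.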

A cleaner way is to use sync.WaitGroup, as follows:

func Crawl(wg *sync.WaitGroup, url string, depth int, fetcher Fetcher) {
    // Mark this call as finished when it returns, on every return path.
    defer wg.Done()
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    for _, u := range urls {
        // Register the child before starting it, so Wait cannot return early.
        wg.Add(1)
        go Crawl(wg, u, depth-1, fetcher)
    }
}

func main() {
    wg := &sync.WaitGroup{}
    wg.Add(1)
    // The first call does not need to be a goroutine, because the calls it makes are goroutines.
    Crawl(wg, "https://golang.org/", 4, fetcher)
    //time.Sleep(1000 * time.Millisecond)
    wg.Wait()
}

This code keeps a counter in the WaitGroup: wg.Add() increments it, wg.Done() decrements it, and wg.Wait() blocks until it reaches zero.
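For a version that runs outside the Tour, here is a self-contained sketch of the same WaitGroup approach. The fakeFetcher type, its three-page link graph, and the crawlAll and newFakeFetcher helpers are assumptions made for this example; they stand in for the fetcher the exercise provides. The fetcher counts its calls so that you can check that every goroutine finished before Wait returned.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Fetcher matches the interface used by Crawl in the exercise.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

// fakeFetcher is a hypothetical in-memory fetcher used only for this sketch.
type fakeFetcher struct {
	pages map[string][]string // url -> links on that page (read-only)
	calls int64               // number of Fetch calls, updated atomically
}

func (f *fakeFetcher) Fetch(url string) (string, []string, error) {
	atomic.AddInt64(&f.calls, 1)
	if links, ok := f.pages[url]; ok {
		return "body of " + url, links, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

func Crawl(wg *sync.WaitGroup, url string, depth int, fetcher Fetcher) {
	defer wg.Done() // decrement the counter on every return path
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("found: %s %q\n", url, body)
	for _, u := range urls {
		wg.Add(1) // increment before starting the goroutine
		go Crawl(wg, u, depth-1, fetcher)
	}
}

// crawlAll (hypothetical helper) crawls from root, blocks until every
// Crawl call has returned, and reports how many Fetch calls were made.
func crawlAll(root string, depth int, f *fakeFetcher) int64 {
	var wg sync.WaitGroup
	wg.Add(1)
	Crawl(&wg, root, depth, f)
	wg.Wait()
	return atomic.LoadInt64(&f.calls)
}

// newFakeFetcher builds a tiny made-up link graph: a -> b, c and b -> a.
func newFakeFetcher() *fakeFetcher {
	return &fakeFetcher{pages: map[string][]string{
		"a": {"b", "c"},
		"b": {"a"},
		"c": {},
	}}
}

func main() {
	n := crawlAll("a", 3, newFakeFetcher())
	fmt.Println("fetch calls:", n)
}
```

Because nothing here remembers visited URLs, page "a" is fetched twice at depth 3. Avoiding that is the exercise's second TODO, which this sketch leaves unaddressed.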

You can confirm this in the Go Playground: https://play.golang.org/p/WqQBqe6iFLp

