7 Golang Web Scraping Frameworks with Code Examples and Reviews

Relia Software

Colly, Goquery, Rod, Chromedp, Ferret, Playwright-Go, and net/http + Goquery are the 7 most popular web scraping frameworks for Golang developers in 2026.

Go is well suited to web scraping at scale: it is fast, has built-in concurrency, and compiles to a single binary, so you avoid dependency hell, GIL bottlenecks, and interpreter runtime overhead.

Currently, Colly, Goquery, Rod, Chromedp, Ferret, Playwright-Go, and net/http + Goquery are the most popular web scraping frameworks for Golang. But which one should you actually use?

This guide compares the 7 Go web scraping frameworks above, with working code examples, pros/cons, and a decision matrix so you can pick the right tool for your project.

Framework | GitHub Stars | JS Rendering | Best For | Learning Curve
Colly | ~25K | No | Large-scale crawling | Low
Goquery | ~14K | No | HTML parsing | Low
Rod | ~6K | Yes | Stealth scraping | Medium
Chromedp | ~11.5K | Yes | Browser automation | Medium
Ferret | ~3K | Yes | Declarative extraction | Low
Playwright-Go | Community | Yes | Cross-browser | Medium
net/http + Goquery | stdlib | No | Simple one-off scrapes | Low

Colly

GitHub: gocolly/colly | Stars: ~25K

Colly is the most popular Go scraping framework and works well for static HTML pages. It manages the entire crawling lifecycle including handling HTTP requests, rate limiting, cookies, caching, and parallel tasks.

Key Features:

  • Built-in concurrency with configurable parallelism;
  • Automatic cookie and session handling;
  • Request/response caching;
  • Robots.txt compliance;
  • Proxy rotation support;
  • Distributed scraping via Redis backend.

Code Example:

go
package main

import (
    "fmt"
    "time"
    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        colly.AllowedDomains("example.com"),
        colly.Async(true),
    )

    // Set concurrency limits
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 4,
        Delay:       2 * time.Second,
    })

    // Extract data on every matching element
    c.OnHTML("article.post", func(e *colly.HTMLElement) {
        title := e.ChildText("h2")
        link := e.ChildAttr("a", "href")
        fmt.Printf("Title: %s | Link: %s\n", title, link)
    })

    // Handle pagination
    c.OnHTML("a.next-page", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    c.OnError(func(r *colly.Response, err error) {
        fmt.Printf("Error: %s | URL: %s\n", err, r.Request.URL)
    })

    c.Visit("https://example.com/blog")
    c.Wait()
}

Pros/Cons:

Pros:

  • Strong crawling features
  • Good for production static crawlers
  • Built-in rate limits and async support
  • Works well with proxies and custom headers

Cons:

  • No JavaScript rendering
  • Callback style can become messy in large projects
  • Parsing isn't as flexible as Goquery
  • Browser-like behavior is limited

Best for: Production crawlers that scrape static HTML at scale. If you're building a data pipeline that hits thousands of pages, Colly is your first choice.

Rating: 9/10 for static content scraping.

Goquery

GitHub: PuerkitoBio/goquery | Stars: ~14K

Goquery brings jQuery-style DOM traversal to Go: you can select by class, ID, tag, child or sibling relationship, and table structure. It is not a full scraping framework, so it does not crawl pages, manage retries, or handle rate limits.

Goquery is a parsing library: it parses HTML and lets you find data with CSS selectors. For many Go scrapers, Goquery is the parsing layer behind the crawler.

Key Features:

  • jQuery-like CSS selector API
  • Chainable method calls
  • Parent/sibling/child traversal
  • Memory-efficient parsing
  • Works with any HTML source (files, strings, HTTP responses)

Code Example:

go
package main

import (
    "fmt"
    "log"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    // jQuery-style selectors
    doc.Find("div.product").Each(func(i int, s *goquery.Selection) {
        title := s.Find("h3.title").Text()
        price := s.Find("span.price").Text()
        link, _ := s.Find("a").Attr("href")
        fmt.Printf("#%d: %s — %s (%s)\n", i+1, title, price, link)
    })

    // Chain traversals
    doc.Find("table tbody tr").Each(func(i int, row *goquery.Selection) {
        cells := row.Find("td")
        name := cells.Eq(0).Text()
        value := cells.Eq(1).Text()
        fmt.Printf("%s: %s\n", name, value)
    })
}

Pros/Cons:

Pros:

  • Simple CSS selector syntax
  • Great for nested HTML
  • Easy to test with saved HTML
  • Works well with Colly

Cons:

  • No HTTP client or crawler features
  • No JavaScript rendering
  • No retry or concurrency management
  • Selector errors are easy to miss

Best for: When you need precise, complex DOM traversal. Often paired with Colly (which uses Goquery internally) or net/http for simple scrapes.

Rating: 8/10 as a parsing library (not standalone scraper).

Rod

GitHub: go-rod/rod | Stars: ~6K

Rod controls Chrome or Chromium through the Chrome DevTools Protocol and is useful when the page needs JavaScript before your data appears. It uses a decode-on-demand architecture that makes it faster and lighter than Chromedp for most workloads.

Rod is often chosen for scraping single-page apps, dashboards, infinite-scroll pages, product pages with client-side rendering, and sites where you need to click, type, wait, scroll, or intercept network traffic.

Key Features:

  • Auto-wait for elements (no manual sleep);
  • Built-in stealth mode via go-rod/stealth;
  • Chrome version management (auto-download);
  • Thread-safe for concurrent scraping;
  • Screenshot and PDF generation;
  • Network interception.

Code Example:

go
package main

import (
    "fmt"
    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
    "github.com/go-rod/stealth"
)

func main() {
    // Launch with stealth to bypass bot detection
    url := launcher.New().Headless(true).MustLaunch()
    browser := rod.New().ControlURL(url).MustConnect()
    defer browser.MustClose()

    page := stealth.MustPage(browser)

    page.MustNavigate("https://example.com/spa").MustWaitLoad()

    // Auto-waits for element to appear
    title := page.MustElement("h1.product-title").MustText()
    price := page.MustElement("span.price").MustText()

    fmt.Printf("Product: %s | Price: %s\n", title, price)

    // Handle infinite scroll
    for i := 0; i < 5; i++ {
        page.MustEval(`window.scrollTo(0, document.body.scrollHeight)`)
        page.MustWaitIdle() // Wait for AJAX to complete
    }

    // Extract all loaded items
    items := page.MustElements("div.item")
    for _, item := range items {
        name := item.MustElement(".name").MustText()
        fmt.Println(name)
    }
}

Pros/Cons:

Pros:

  • Strong fit for JS-rendered pages
  • Auto-wait features reduce sleep hacks
  • Good control over page actions
  • Supports screenshots, PDFs, and network events

Cons:

  • Requires browser runtime
  • Chrome/Chromium only
  • More memory than HTTP scraping
  • Production error handling needs care

Best for: JavaScript-heavy sites that need high performance. The stealth mode makes it ideal for scraping sites with anti-bot detection.

Rating: 9/10 for JS-rendered scraping.

Chromedp

GitHub: chromedp/chromedp | Stars: ~11.5K

Chromedp provides the simplest API for controlling Chrome/Chromium via CDP. Without WebDriver or external dependencies, Chromedp is a solid choice when you want browser automation.

Chromedp is useful for form submission, screenshots, PDF generation, UI checks, login flows, and scraping rendered HTML from JavaScript-heavy pages.

Key Features:

  • Direct Chrome DevTools Protocol communication;
  • No external dependencies (no Selenium/WebDriver);
  • Context-based execution model;
  • Screenshot and PDF generation;
  • Network event handling;
  • Form interaction and navigation.

Code Example:

go
package main

import (
    "context"
    "fmt"
    "log"
    "github.com/chromedp/chromedp"
)

func main() {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    var html string
    var title string

    err := chromedp.Run(ctx,
        // Navigate to page
        chromedp.Navigate("https://example.com/dynamic-page"),

        // Wait for content to load
        chromedp.WaitVisible("div.content", chromedp.ByQuery),

        // Extract text
        chromedp.Text("h1", &title, chromedp.ByQuery),

        // Get full HTML
        chromedp.OuterHTML("body", &html, chromedp.ByQuery),
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Title: %s\n", title)
    fmt.Printf("HTML length: %d\n", len(html))
}

Form Submission Example:

go
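// Reuses ctx from the previous example
var welcomeMsg string

err := chromedp.Run(ctx,
    chromedp.Navigate("https://example.com/login"),
    chromedp.WaitVisible("#email", chromedp.ByQuery),
    chromedp.SendKeys("#email", "user@example.com", chromedp.ByQuery),
    chromedp.SendKeys("#password", "secret", chromedp.ByQuery),
    chromedp.Click("#submit", chromedp.ByQuery),
    chromedp.WaitVisible(".dashboard", chromedp.ByQuery),
    chromedp.Text(".welcome-message", &welcomeMsg, chromedp.ByQuery),
)
if err != nil {
    log.Fatal(err)
}
fmt.Println(welcomeMsg)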

Pros/Cons:

Pros:

  • Largest community for Go browser automation
  • Simple, intuitive API
  • Clear documentation
  • 3-5x faster than Selenium

Cons:

  • Slower JSON decoding than Rod
  • Single event-loop can deadlock under high concurrency
  • No built-in stealth mode
  • Chrome-only
  • Can use too much memory if each URL opens a new browser context

Best for: Browser automation tasks such as form filling, screenshot capture, and quick JS rendering. Chromedp is the go-to choice when you need browser control with minimal setup.

Rating: 8/10 for general browser automation.

Ferret

GitHub: MontFerret/ferret | Stars: ~3K

With Ferret, you write declarative queries in FQL (Ferret Query Language), which is inspired by ArangoDB's AQL, instead of writing every request, wait, selector, and loop in Go code.

Ferret is useful when scraping rules need to be written, reviewed, or changed separately from the Go application. For example, an internal data team may want to update extraction logic without touching the main service code.

Key Features:

  • Declarative FQL query language;
  • Handles both static and dynamic pages;
  • Embeddable in Go applications;
  • Built-in browser support;
  • Extensible via custom functions.

Code Example:

go
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/MontFerret/ferret/pkg/compiler"
    "github.com/MontFerret/ferret/pkg/drivers"
    "github.com/MontFerret/ferret/pkg/drivers/http"
)

func main() {
    // Define WHAT data to get, rather than HOW to get it
    query := `
        LET doc = DOCUMENT("https://news.ycombinator.com/", { driver: "http" })
        FOR item IN ELEMENTS(doc, ".athing")
            LET title = ELEMENT(item, ".titleline > a")
            RETURN {
                title: INNER_TEXT(title),
                url: title.attributes.href
            }
    `

    program, err := compiler.New().Compile(query)
    if err != nil {
        log.Fatalf("compile FQL: %v", err)
    }

    // In Ferret v0.x, drivers are attached to the context, not the compiler
    ctx := drivers.WithContext(context.Background(), http.NewDriver(), drivers.AsDefault())

    result, err := program.Run(ctx)
    if err != nil {
        log.Fatalf("run FQL: %v", err)
    }

    fmt.Println(string(result)) // JSON-encoded query result
}

Pros/Cons:

Pros:

  • Declarative approach means less boilerplate
  • Handles static and dynamic pages
  • Embeddable in Go apps
  • Good for data extraction pipelines
  • Can help teams separate extraction logic from application logic

Cons:

  • Smaller community
  • FQL is another language to learn
  • Less control than procedural code
  • Performance overhead from query parsing
  • Debugging can feel less direct than normal Go code

Best for: Data extraction workflows where you want to define "what" to extract, not "how". Great for teams that want non-programmers to write scraping rules.

Rating: 7/10 — innovative but niche.

Playwright-Go

GitHub: playwright-community/playwright-go

Playwright-Go is a community-driven port of Microsoft’s popular Playwright tool. It is the only Go framework that lets you switch between Chromium, Firefox, and WebKit using an identical interface.

For scraping, cross-browser support is not always necessary. Playwright-Go becomes more useful when your project also involves browser testing, mobile emulation, trace debugging, or cross-browser behavior checks.

Key Features:

  • Multi-browser support (Chromium, Firefox, WebKit);
  • Auto-wait for element state;
  • Network interception and mocking;
  • Mobile device emulation;
  • Trace viewer for debugging.

Code Example:

go
package main

import (
    "fmt"
    "log"
    "github.com/playwright-community/playwright-go"
)

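func main() {
    // Assumes the Playwright driver and browsers are already installed
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("could not start Playwright: %v", err)
    }
    defer pw.Stop()

    // Launch any browser: Chromium, Firefox, or WebKit
    browser, err := pw.Chromium.Launch(playwright.BrowserTypeLaunchOptions{
        Headless: playwright.Bool(true),
    })
    if err != nil {
        log.Fatalf("could not launch browser: %v", err)
    }
    defer browser.Close()

    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("could not create page: %v", err)
    }
    if _, err := page.Goto("https://example.com/products"); err != nil {
        log.Fatalf("could not load page: %v", err)
    }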

    // Auto-wait + modern locator API
    products := page.Locator("div.product-card")
    count, _ := products.Count()

    for i := 0; i < count; i++ {
        card := products.Nth(i)
        name, _ := card.Locator("h3").TextContent()
        price, _ := card.Locator(".price").TextContent()
        fmt.Printf("%s — %s\n", name, price)
    }
}

Pros/Cons:

Pros:

  • Chromium, Firefox, and WebKit support
  • Strong locator and auto-wait model
  • Useful for testing and scraping
  • Network mocking built in

Cons:

  • Community-maintained (not official)
  • Heavier than Rod or Chromedp
  • Requires browser downloads
  • Extra setup compared with pure HTTP scraping

Best for: When you need cross-browser compatibility or already use Playwright in your testing stack.

Rating: 7.5/10 — powerful but heavier than Go-native options.

net/http + Goquery

Go's standard library net/http combined with Goquery is often enough for simple scraping tasks. Sometimes you don't need a framework like Colly to only fetch a few pages, parse static HTML, and save the result.

Code Example:

go
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "net/http"
    "os"
    "sync"
    "time"

    "github.com/PuerkitoBio/goquery"
)

func scrape(url string, wg *sync.WaitGroup, ch chan<- []string) {
    defer wg.Done()

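    client := &http.Client{Timeout: 10 * time.Second}
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Printf("Error building request for %s: %v", url, err)
        return
    }
    req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; GoScraper/1.0)")

    resp, err := client.Do(req)
    if err != nil {
        log.Printf("Error fetching %s: %v", url, err)
        return
    }
    defer resp.Body.Close()

    // A nil document would panic on Find, so stop on parse errors
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Printf("Error parsing %s: %v", url, err)
        return
    }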
    doc.Find("div.listing").Each(func(i int, s *goquery.Selection) {
        title := s.Find("h2").Text()
        price := s.Find(".price").Text()
        ch <- []string{url, title, price}
    })
}

func main() {
    urls := []string{
        "https://example.com/page/1",
        "https://example.com/page/2",
        "https://example.com/page/3",
    }

    ch := make(chan []string, 100)
    var wg sync.WaitGroup

    for _, url := range urls {
        wg.Add(1)
        go scrape(url, &wg, ch)
    }

    go func() {
        wg.Wait()
        close(ch)
    }()

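    // Write results to CSV
    file, err := os.Create("output.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    writer := csv.NewWriter(file)
    writer.Write([]string{"URL", "Title", "Price"})

    for row := range ch {
        writer.Write(row)
    }
    writer.Flush()
    // Flush buffers the writes; Error reports any failure along the way
    if err := writer.Error(); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Scraped %d pages → output.csv\n", len(urls))
}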

Pros/Cons:

Pros:

  • Flexibility
  • No framework overhead
  • Easy to understand and debug
  • Great for small static scraping tasks
  • Full control over headers, timeouts, and HTTP client behavior
  • Works well for proof of concept scrapers

Cons:

  • You build everything yourself
  • No JS rendering
  • No built-in retry, queue, or rate limit logic
  • Can become messy as the scraper grows
  • Needs more boilerplate for production use
  • Not ideal for complex crawling workflows

Best for: One-off scraping tasks, learning Go concurrency patterns, or when you want full control without framework overhead.

Rating: 7/10 — maximum flexibility, minimum magic.

Performance Benchmarks

Static Page Throughput (pages/second)

Framework | 100 Pages | 1,000 Pages | 10,000 Pages
Colly (async) | ~50 p/s | ~45 p/s | ~40 p/s
net/http + Goquery | ~40 p/s | ~35 p/s | ~30 p/s
Goquery (single-threaded) | ~15 p/s | ~12 p/s | ~10 p/s

JS-Rendered Page Throughput

Framework | 100 Pages | 1,000 Pages
Rod | ~8 p/s | ~6 p/s
Chromedp | ~5 p/s | ~4 p/s
Playwright-Go | ~4 p/s | ~3 p/s
Ferret (CDP driver) | ~3 p/s | ~2 p/s

Memory Usage (per 100 pages)

  • Goquery: ~15 MB;
  • Colly: ~30 MB;
  • Rod: ~150 MB;
  • Chromedp: ~200 MB;
  • Playwright-Go: ~250 MB.

Note: Benchmarks are approximate ranges based on community reports and typical workloads on a 4-core machine with stable network. Actual performance varies significantly by target site complexity, network latency, and hardware. Run your own benchmarks before making architecture decisions.

Common Go Scraping Patterns

Proxy Rotation

Proxy rotation splits your traffic across a list of different servers. To the target website, the requests look like they are coming from multiple different users instead of a single machine. This helps keep your IP address from being blocked when you send many requests in a short period of time.

go
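// Works with Colly; needs the "github.com/gocolly/colly/v2/proxy" import
c := colly.NewCollector()

proxies := []string{
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
}

// Each request is sent through the next proxy in the list
rp, err := proxy.RoundRobinProxySwitcher(proxies...)
if err != nil {
    log.Fatal(err)
}
c.SetProxyFunc(rp)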
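
The same idea carries over to a plain `net/http` scraper. The following is a minimal sketch under stated assumptions: `roundRobinProxy` is a hypothetical helper written for this article (not part of Colly or the standard library), and the proxy hostnames are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinProxy returns a function suitable for http.Transport.Proxy that
// hands out the given proxies in order, wrapping around at the end.
func roundRobinProxy(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
	if len(rawURLs) == 0 {
		return nil, fmt.Errorf("no proxies given")
	}
	proxies := make([]*url.URL, 0, len(rawURLs))
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse proxy %q: %w", raw, err)
		}
		proxies = append(proxies, u)
	}
	var next uint64 // atomic counter so concurrent requests rotate safely
	return func(*http.Request) (*url.URL, error) {
		i := atomic.AddUint64(&next, 1) - 1
		return proxies[i%uint64(len(proxies))], nil
	}, nil
}

func main() {
	pick, err := roundRobinProxy("http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080")
	if err != nil {
		panic(err)
	}
	// Plug the selector into a client; the transport consults it for each request.
	client := &http.Client{Transport: &http.Transport{Proxy: pick}}
	_ = client

	// Show the rotation order without touching the network.
	for i := 0; i < 4; i++ {
		u, _ := pick(nil)
		fmt.Println(u.Host)
	}
}
```

The loop prints proxy1, proxy2, proxy3, and then proxy1 again, which is the wrap-around behavior a round-robin switcher is expected to have.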

Respectful Rate Limiting

Respectful rate limiting mimics human behavior to avoid overloading a website's server. This pattern sets a maximum number of parallel downloads and adds a small, randomized delay (called jitter) between requests. This keeps your scraper polite and less likely to trigger security blocks.

go
c.Limit(&colly.LimitRule{
    DomainGlob:  "*.example.com",
    Parallelism: 2,
    Delay:       3 * time.Second,
    RandomDelay: 2 * time.Second, // Jitter to look human
})

Error Retry with Backoff

A retry pattern catches specific failures (like HTTP 429 Too Many Requests or 5xx server errors) and attempts to download the page again after a wait that grows with each attempt. Always cap the number of retries so a permanently failing URL cannot loop forever.

go
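c.OnError(func(r *colly.Response, err error) {
    // Track attempts in the request context, which is passed along by Retry()
    attempts, _ := r.Ctx.GetAny("attempts").(int)
    retryable := r.StatusCode == 429 || r.StatusCode >= 500
    if !retryable || attempts >= 3 {
        return // not retryable, or retry budget spent
    }
    r.Ctx.Put("attempts", attempts+1)
    time.Sleep(time.Duration(1<<attempts) * time.Second) // waits 1s, 2s, 4s
    r.Request.Retry()
})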

Concurrent Scraping with Channels

Goroutines let you fetch many pages in parallel, and channels pass the results between them safely. One or more goroutines download pages, while a single receiver collects the data and saves it to a database or file, so writes never interleave.

go
results := make(chan Product, 100)
var wg sync.WaitGroup

for _, url := range urls {
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        product := scrapeProduct(u)
        results <- product
    }(url)
}

go func() {
    wg.Wait()
    close(results)
}()

for product := range results {
    saveToDatabase(product)
}
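
The snippet above starts one goroutine per URL, which is fine for a short list. For long lists, a fixed pool of workers keeps the number of simultaneous requests bounded. The sketch below assumes a hypothetical `Product` type and `scrapeAll` helper, and uses a fake fetch function so it runs without network access.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Product is a hypothetical record type for this sketch.
type Product struct {
	URL  string
	Name string
}

// scrapeAll runs fetch over urls with a fixed number of workers and returns
// all results once every worker has finished.
func scrapeAll(urls []string, workers int, fetch func(string) Product) []Product {
	jobs := make(chan string)
	results := make(chan Product)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- fetch(u)
			}
		}()
	}

	// Feed jobs, then close the channel so workers leave their range loops.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results only after every worker is done sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []Product
	for p := range results { // single consumer, so no lock is needed around out
		out = append(out, p)
	}
	return out
}

func main() {
	urls := []string{"https://example.com/p/1", "https://example.com/p/2", "https://example.com/p/3"}
	// Stand-in for a real scraper so the example runs offline.
	fake := func(u string) Product { return Product{URL: u, Name: "item for " + u} }

	products := scrapeAll(urls, 2, fake)
	// Results arrive in completion order, so sort before printing.
	sort.Slice(products, func(i, j int) bool { return products[i].URL < products[j].URL })
	for _, p := range products {
		fmt.Println(p.URL, "->", p.Name)
	}
}
```

The worker count becomes the concurrency limit, so you can tune load on the target site with one number instead of relying on the size of the URL list.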

Go vs Python: Which One is Better for Web Scraping?

Factor | Go | Python
Speed | ~5x faster (compiled, goroutines) | Slower (GIL limits concurrency)
Memory | Lower footprint | Higher, especially with browser
Deployment | Single binary, zero dependencies | Requires runtime + pip packages
Ecosystem | Growing (Colly, Rod, Chromedp) | Mature (Scrapy, BeautifulSoup, Selenium)
Learning Curve | Steeper for beginners | Easier, more tutorials
Community | Smaller but focused | Massive
Best For | Production pipelines, high throughput | Prototyping, one-off scrapes

Bottom line: Use Go when performance and deployment simplicity matter. Use Python when you need rapid prototyping or access to the largest ecosystem of scraping tools.

FAQs

Which Go web scraping framework is the fastest?

For static HTML, Colly is the fastest with built-in async support handling 40-50 pages/second. For JavaScript-rendered pages, Rod outperforms Chromedp with its decode-on-demand architecture.

Can I scrape JavaScript-rendered pages with Go?

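Yes. Use Rod, Chromedp, or Playwright-Go, all of which control a real browser. Rod and Chromedp are Go-native and drive Chrome or Chromium through the Chrome DevTools Protocol; Playwright-Go wraps the Playwright ecosystem, which also covers Firefox and WebKit. Ferret can handle dynamic pages too through its built-in browser support.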

Is Colly still maintained in 2026?

Yes. Colly remains actively maintained and is the most starred Go scraping library (~25K stars). It's production-ready with a stable API. 

Which framework handles anti-bot detection?

Rod with the go-rod/stealth plugin is the best option to handle anti-bot detection. It patches common browser fingerprinting checks automatically. For HTTP-level evasion, combine Colly with proxy rotation and randomized headers.

Can I use multiple frameworks together?

Absolutely. A common pattern is Colly + Goquery. Colly handles the crawling lifecycle (HTTP, rate limits, parallelism) while Goquery provides advanced DOM traversal. For sites with both static and dynamic pages, pair Colly with Rod.
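
One way to pair a static crawler with a browser is to try the cheap path first and fall back only when the content is missing. The sketch below is a hypothetical design, not an API of Colly or Rod: `Fetcher` and `fetchWithFallback` are names invented here, and the two fake fetchers stand in for a real HTTP crawler and a real headless browser.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Fetcher returns the HTML for a URL. In a real project one implementation
// might wrap an HTTP crawler and another a headless browser.
type Fetcher func(url string) (string, error)

// fetchWithFallback tries the static fetcher first and only falls back to the
// rendered fetcher when the static HTML lacks the marker we expect.
func fetchWithFallback(url, marker string, static, rendered Fetcher) (html string, usedBrowser bool, err error) {
	html, err = static(url)
	if err == nil && strings.Contains(html, marker) {
		return html, false, nil
	}
	html, err = rendered(url)
	if err != nil {
		return "", true, fmt.Errorf("rendered fetch of %s: %w", url, err)
	}
	if !strings.Contains(html, marker) {
		return "", true, errors.New("marker not found even after rendering")
	}
	return html, true, nil
}

func main() {
	// Fake fetchers so the sketch runs without a network or browser.
	static := func(u string) (string, error) {
		if strings.HasSuffix(u, "/spa") {
			return `<div id="root"></div>`, nil // empty shell: content needs JavaScript
		}
		return `<div class="price">$10</div>`, nil
	}
	rendered := func(u string) (string, error) {
		return `<div id="root"><div class="price">$12</div></div>`, nil
	}

	for _, u := range []string{"https://example.com/item", "https://example.com/spa"} {
		_, usedBrowser, err := fetchWithFallback(u, `class="price"`, static, rendered)
		fmt.Println(u, "browser:", usedBrowser, "err:", err)
	}
}
```

Because browser rendering costs far more memory and time than a plain HTTP request, routing only the pages that need it through the browser keeps most of the crawl on the fast path.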

Conclusion

The Go web scraping ecosystem has matured significantly. Here are the common web scraping frameworks you can choose based on your specific needs:

  • Static HTML at scale → Colly
  • Precise HTML parsing → Goquery (pair with Colly)
  • JS-heavy sites + performance → Rod
  • Quick browser automation → Chromedp
  • Cross-browser needs → Playwright-Go
  • Declarative extraction → Ferret
  • Simple one-off tasks → net/http + Goquery

Personally, I start with Colly for most projects, and add Rod when I hit JavaScript walls.

The combination of Go's goroutines, compiled speed, and these frameworks gives you a scraping stack that can handle millions of pages with the footprint of a single Docker container.
