commoncrawl.org/

Common Crawl

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.

Active

Since 2007

Open-source

Over 250 billion pages spanning 17 years. Free and open corpus since 2007. Cited in over 10,000 research papers. 3–5 billion new pages added each month.

Org. type: Non-profit / charity / foundation

Project type: Resource

Categories: Open knowledge, Civic data, Open internet, Research tools
