Common Crawl

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Active
Since 2007
Open-source

Over 250 billion pages spanning 17 years. Free and open corpus since 2007. Cited in over 10,000 research papers. 3–5 billion new pages added each month.

Org. type: Non-profit / charity / foundation
Project type: Resource
Last modified: Nov 12, 2025
Added: Apr 29, 2024