ClueWeb12

The ClueWeb12 data set was created to support research on information retrieval and related human language technologies. The data set consists of 870,043,929 English web pages collected between February 10, 2012, and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web data set. Distribution of ClueWeb12 began in January 2013.