The dataset was created by the NLP lab. It is a collection of documents that have been tagged by users of a popular social bookmarking website. User-submitted tags were used to find pages tagged with various topic-related labels. You can read more about the dataset on this page. The file we are distributing to you contains just the data and the split indices. You can download it here.

New: Reduced Only Tar

You can now download a tar archive containing just the files you need to run against the reduced dataset by using this link.
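
Once the tar has been downloaded, it can be unpacked with any standard tool. The following is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_archive` is hypothetical, and nothing here assumes a particular file layout inside the archive, since the page does not describe one.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the sorted member names.

    Hypothetical helper for illustration; it is not part of the dataset release.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as tar:
        names = sorted(tar.getnames())
        tar.extractall(path=dest)
    return names
```

For example, `extract_archive("reduced_only.tar", "reduced_data")` would unpack the archive into a `reduced_data` directory and return the list of files it contained. Both names are placeholders; substitute the name of the file you actually downloaded.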

