We offer the data both in raw format (archives of random torrent files) and as pre-processed hashdb databases. Extract the archives with "7z x" so that the folder structure is kept intact; some file systems have trouble with a large number of files in a single directory. Note that hashdb removes duplicate entries for chunk hashes, so you only get one InfoHash per chunk hash from the hashdb.
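To illustrate what the deduplication means in practice, here is a minimal Python sketch. It is not hashdb's actual code or API: the function name, the record layout (chunk hash, InfoHash) and the "keep the first one seen" rule are all assumptions made for the example. It only shows why a chunk shared by two torrents ends up pointing at a single InfoHash.

```python
from typing import Dict, Iterable, Tuple


def dedupe_chunk_hashes(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep one InfoHash per chunk hash (hypothetical helper, not part of hashdb).

    Assumption: the first InfoHash seen for a chunk hash wins. Which one the
    real hashdb databases keep is not specified here.
    """
    first_seen: Dict[str, str] = {}
    for chunk_hash, info_hash in records:
        # setdefault stores the value only if the chunk hash is not present yet
        first_seen.setdefault(chunk_hash, info_hash)
    return first_seen


# Made-up placeholder values, not real hashes from the data set
records = [
    ("aa11", "torrent_A"),
    ("bb22", "torrent_A"),
    ("aa11", "torrent_B"),  # same chunk also occurs in a second torrent
]
unique = dedupe_chunk_hashes(records)
```

In this sketch the chunk "aa11" is only attributed to "torrent_A", even though it also occurs in "torrent_B". If you need every torrent that contains a given chunk, use the raw torrent files instead of the hashdb.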
All data is licensed under GPL-2. If you use it for a paper, please cite us (BibTeX at the bottom).
Processing steps used: