PeekaTorrent: Leveraging P2P Hash Values for Digital Forensics

Sebastian Neuner, Martin Schmiedecker, Edgar Weippl

Presented at DFRWS 2016 USA

Abstract

Sub-file hashing and hash-based carving are increasingly popular methods in digital forensics for detecting files on hard drives that are incomplete, partially overwritten, or modified. Although these techniques have proven usable in practice and can be implemented efficiently, they require specific “target files” to be available a priori. Creating case-specific sub-file hash collections is always feasible and, in fact, trivial; we instead propose the creation of case-independent sub-file hash databases. To build hash databases that can be publicly shared among investigators, we propose using data from peer-to-peer file sharing networks such as BitTorrent. Most file sharing networks in use today rely on large quantities of hash values for integrity checking and chunk identification, and these hash values can be leveraged for digital forensics.
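To make the idea of reusable P2P hash values concrete: in a BitTorrent v1 .torrent file, the bencoded `info` dictionary holds a `piece length` and a `pieces` string that is the concatenation of 20-byte SHA-1 digests, one per piece (BEP 3). The following is a minimal, illustrative parser for that layout, not the harvesting pipeline used in the paper. The function names `bdecode` and `piece_hashes` are hypothetical; the sketch assumes well-formed v1 metadata and ignores BitTorrent v2 (BEP 52), which uses SHA-256 Merkle trees instead.

```python
# Minimal sketch: parse a BitTorrent v1 .torrent file and list its piece hashes.
# Assumes well-formed bencoded input (BEP 3); function names are illustrative only.

def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<values>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        return data[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def piece_hashes(torrent: bytes) -> tuple[int, list[bytes]]:
    """Return (piece_length, [20-byte SHA-1 digest per piece]) for a v1 torrent."""
    meta, _ = bdecode(torrent)
    info = meta[b"info"]
    pieces = info[b"pieces"]           # concatenated 20-byte SHA-1 digests
    if len(pieces) % 20:
        raise ValueError("'pieces' length is not a multiple of 20")
    return info[b"piece length"], [pieces[j:j + 20] for j in range(0, len(pieces), 20)]
```

Calling `piece_hashes(open(path, "rb").read())` on a v1 torrent (where `path` is any local .torrent file) yields the piece size and one digest per piece; collected across many torrents, these digests are the kind of case-independent sub-file hash set the abstract proposes.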

In this paper we show how these hash values can be used to identify potentially vast amounts of data, offering a feasible way to cope with the ever-increasing case sizes in digital forensics. While the methodology is independent of the file sharing protocol used, we harvested our data from the BitTorrent network. In total, we collected and analyzed more than 3.2 billion hash values from 2.3 million torrent files, and we discuss to what extent they can be used to identify otherwise unknown file fragments and data remnants. With open-source tools such as bulk_extractor and hashdb, these hash values can be used directly to make sub-file hashing more effective at large scale.
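The matching step can be sketched in plain Python as a stand-in for the bulk_extractor/hashdb workflow; it does not reproduce those tools or their interfaces. The sketch assumes, hypothetically, that a file from a single-file torrent was stored contiguously starting at a 4 KiB cluster boundary, so every full piece begins at a cluster-aligned offset in the image. It therefore hashes a `piece_len`-sized window at every `step`-aligned offset and looks the SHA-1 digest up in a set of known piece hashes. `scan_image` and its parameters are illustrative names.

```python
import hashlib
from collections.abc import Iterator


def scan_image(image: bytes, piece_len: int, known: set[bytes],
               step: int = 4096) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, digest) for each step-aligned window of piece_len bytes
    whose SHA-1 digest is in `known` (a set of BitTorrent v1 piece hashes).

    Hypothetical assumption: the file was stored contiguously from a cluster
    boundary, so full pieces start at multiples of `step` in the image.
    """
    for off in range(0, len(image) - piece_len + 1, step):
        # SHA-1 of the candidate window, compared by set membership.
        digest = hashlib.sha1(image[off:off + piece_len]).digest()
        if digest in known:
            yield off, digest
```

Under these assumptions, the short final piece of a torrent and pieces that straddle file boundaries in multi-file torrents will not match. Because the function only slices and measures `image`, an `mmap.mmap` object over a real disk image could be passed in place of `bytes` to avoid loading the whole image into memory.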



@article{neuner2016peekatorrent,
  title={PeekaTorrent: Leveraging P2P Hash Values for Digital Forensics},
  author={Sebastian Neuner and Martin Schmiedecker and Edgar Weippl},
  journal={Digital Investigation},
  year={2016}
}