SPLAD: Obfuscation Resilient Software Plagiarism Detection


Software plagiarism is the act of reusing someone else's code, in whole or in part, in one's own program in a way that violates the terms of the original license. With the rapid development of the software industry and the surge of open source projects, software plagiarism has become a very serious threat to intellectual property protection and to the "healthiness" of the open-source-embracing software industry. High-profile, billion-dollar lawsuits over software plagiarism have already emerged, showing that even software giants steal code.

To address this threat, computer-aided, automated plagiarism detection tools should play a major role. However, existing plagiarism detection schemes, including both static and dynamic analysis based methods, are still immature. In fact, none of them is resilient to code obfuscation, and all of them can be defeated by (in most cases rather simple) obfuscation-based counter-detection measures. Recent developments in code obfuscation have made such measures extremely easy and affordable to apply, and mature obfuscation tools are freely available, making the situation even worse. Moreover, many existing schemes rely on analyzing the source code of a suspected software product, which often cannot be obtained until strong evidence has been collected.

In this project, we aim to develop binary-oriented, obfuscation-resilient plagiarism detection methods that do not require source code analysis. This research, if successful, will take a significant step forward in addressing the software plagiarism threat.



Software Release


National Science Foundation (NSF), Computing and Communication Foundations (CCF)

Award #1320605, SHF: Small: Towards Obfuscation-Resilient Software Plagiarism Detection, Sencun Zhu, Dinghao Wu (Co-PI), and Peng Liu, National Science Foundation (NSF) CCF-1320605, $500,000, 2013-2017.