



Review for NeurIPS paper: CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Neural Information Processing Systems

Overall, the reviewers have mixed opinions about this work. Although everyone agreed that this is an application paper, the reviewers felt that a series of issues still need to be addressed. Specifically, in the camera-ready version the authors are kindly asked to include "Circle Loss, Cross-Batch Memory, Adversarial Loss, and Cross-lingual Language Model" (as promised in the author feedback) to alleviate some of the reviewer concerns. Nevertheless, this paper addresses a problem in an interesting and important application domain, and exploring how neural architectures known from other domains can be adapted to it will be useful for the community; hence I believe this paper should be accepted.


Review for NeurIPS paper: CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Neural Information Processing Systems

Weaknesses: I thought that the motivation for function-level matching was a bit weak. I would have liked the paper to open with a scenario or two where this is useful, or absolutely necessary. Another issue is that the paper is not especially technically deep, though I hesitate to be too tough on the paper for that reason: being able to get really good results with a simple method may be considered a feature of the approach. Actually, it is surprising to me that the authors treat source code as text and binary code as a control-flow graph (CFG).


CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Neural Information Processing Systems

Binary-source code matching, especially at the function level, plays a critical role in computer security. Given only binary code, finding the corresponding source code improves the accuracy and efficiency of reverse engineering. Given only source code, retrieving the related binary code helps confirm known vulnerabilities. However, because source and binary code differ so greatly, few studies have investigated binary-source code matching. Previously published studies focus on extracting code literals such as strings and integers and then apply traditional matching algorithms, such as the Hungarian algorithm, to match the code.
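The abstract describes the literal-based baseline only at a high level. As a rough illustration, the sketch below assumes that each function is reduced to a set of string and integer literals, that a binary/source pair is scored by Jaccard similarity of those sets, and that the Hungarian algorithm then finds the one-to-one assignment minimizing total cost `1 - similarity`. The Jaccard scoring, the helper names (`jaccard`, `hungarian`, `match_functions`), and the function names and literal sets are all hypothetical and are not necessarily what the cited prior studies use.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two literal sets (assumed scoring, for illustration)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hungarian(cost: list[list[float]]) -> list[int]:
    """Minimum-cost assignment (Hungarian algorithm with potentials, O(n^2 m)).

    Requires len(cost) <= len(cost[0]). Returns assignment[i] = column for row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j, 0 = free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so the cheapest unused column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip matches along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_functions(binary_funcs: dict, source_funcs: dict) -> dict:
    """Map each binary function to one distinct source function by literal overlap."""
    bin_names, src_names = list(binary_funcs), list(source_funcs)
    cost = [[1.0 - jaccard(binary_funcs[b], source_funcs[s]) for s in src_names]
            for b in bin_names]
    return {bin_names[i]: src_names[j] for i, j in enumerate(hungarian(cost))}


# Hypothetical literals; 0x10 and 0xFF compare equal to 16 and 255 as integers.
BINARY_FUNCS = {
    "sub_401000": {"malloc failed", 4096, 16},
    "sub_401200": {"malloc failed", 4096},
    "sub_401400": {"0123456789abcdef", 0x10, 0xFF},
}
SOURCE_FUNCS = {
    "alloc_buffer": {"malloc failed", 4096},
    "alloc_table": {16, 32},
    "hex_encode": {"0123456789abcdef", 16, 255},
}

if __name__ == "__main__":
    print(match_functions(BINARY_FUNCS, SOURCE_FUNCS))
    # → {'sub_401000': 'alloc_table', 'sub_401200': 'alloc_buffer', 'sub_401400': 'hex_encode'}
```

Both `sub_401000` and `sub_401200` score highest against `alloc_buffer`, so a greedy per-function choice would collide; the Hungarian assignment resolves this by minimizing the total cost over all pairs. The sketch also shows the weakness such literal-only methods face: functions with few or shared literals are easily confused.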