BinQuery: A Novel Framework for Retrieving Binary Function with Natural Language Query

ISSTA 2025

Bolun Zhang, Zeyu Gao, Hao Wang, Yuxin Cui, Siliang Qin, Chao Zhang, Kai Chen, Beibei Zhao

Abstract

Binary Function Retrieval (BFR) is crucial in reverse engineering for identifying specific functions in binary code, especially those associated with malicious behavior or vulnerabilities. Traditional BFR methods rely on heuristics, often lacking the efficiency and adaptability needed for large-scale or diverse binary analysis tasks. To address these challenges, we present BinQuery, a Natural Language-based BFR (NL-based BFR) framework that uses natural language queries to retrieve relevant binary functions with improved flexibility and precision. BinQuery introduces innovative techniques to bridge information gaps between binary code and natural language, achieves fine-grained alignment for enhanced retrieval accuracy, and leverages Large Language Models (LLMs) to refine queries and generate diverse descriptions. Our extensive experiments indicate that BinQuery surpasses current state-of-the-art methods, achieving a 42.55% increase in recall@1 and a 4× improvement in performance on comparable benchmarks.
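For readers unfamiliar with the recall@1 metric cited above, the following is a minimal sketch of how recall@k is commonly computed for embedding-based retrieval. It is not BinQuery's evaluation code: the toy vectors, the cosine-similarity ranking, and the names `cosine`, `recall_at_k`, `function_vecs`, `query_vecs`, and `gold` are all illustrative assumptions, and the abstract does not specify the model or the evaluation protocol.

```python
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(
    query_vecs: List[Sequence[float]],
    function_vecs: List[Sequence[float]],
    gold: List[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose correct function appears in the top-k results.

    gold[i] is the index in function_vecs of the function that query i
    is supposed to retrieve.
    """
    if not gold:
        raise ValueError("need at least one query")
    hits = 0
    for query, target in zip(query_vecs, gold):
        # Rank every candidate function by similarity to the query, best first.
        ranked = sorted(
            range(len(function_vecs)),
            key=lambda i: cosine(query, function_vecs[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy data (assumed for illustration): three "binary function" embeddings
# and three "natural language query" embeddings in a shared 2-D space.
function_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vecs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
gold = [0, 1, 2]

r1 = recall_at_k(query_vecs, function_vecs, gold, k=1)
r2 = recall_at_k(query_vecs, function_vecs, gold, k=2)
```

In this toy setup the third query ranks its correct function second rather than first, so it counts as a miss for recall@1 but as a hit for recall@2. A higher recall@1 therefore means the correct function is the top-ranked result for more queries.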