Efficient Cross-Architecture Binary Function Embeddings through Knowledge Distillation
by Dominik Bayerl, Thomas Hutzelmann, and Hans-Joachim Hof (Technische Hochschule Ingolstadt)
Abstract
instruction listings. Unlike existing approaches, our solution explicitly considers the cross-architecture scenario: we propose a training method that adapts the model to different instruction set architectures (ISAs) without training a new model from scratch. This also allows the model to be used efficiently for embedded systems, which span a wide variety of processor architectures. We show that our solution achieves a similarity classification accuracy of 89.6% on a dataset of several real-world open-source software projects. Finally, we conduct extensive experiments demonstrating that knowledge distillation increases the computational efficiency of the embedding model: the number of parameters drops from 87M to 23M (roughly 74% fewer), while classification accuracy remains at 87.8%. Our code and artifacts are available as open source.
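The abstract does not state the exact distillation objective or how similarity is decided. The sketch below is a minimal illustration under two labeled assumptions. First, the student is trained to reproduce the teacher's function embeddings under a mean-squared-error loss, a common choice when distilling embedding models. Second, two functions are classified as similar when the cosine similarity of their embeddings meets a threshold. The function names and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two embeddings; returns 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_similar(a: Vector, b: Vector, threshold: float = 0.5) -> bool:
    """Hypothetical pairwise classifier: 'similar' if cosine similarity >= threshold."""
    return cosine_similarity(a, b) >= threshold


def distillation_loss(teacher: Sequence[Vector], student: Sequence[Vector]) -> float:
    """Assumed objective: mean squared error between teacher and student
    embeddings of the same batch of functions. The teacher is treated as
    fixed; only the (smaller) student would be updated from this loss.
    Assumes both models emit embeddings of the same dimension."""
    if len(teacher) != len(student) or not teacher:
        raise ValueError("teacher and student batches must be non-empty and equal in size")
    total = 0.0
    count = 0
    for t_vec, s_vec in zip(teacher, student):
        if len(t_vec) != len(s_vec):
            raise ValueError("teacher and student embedding dimensions differ")
        for t, s in zip(t_vec, s_vec):
            total += (t - s) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    teacher_batch = [[1.0, 0.0], [0.0, 1.0]]
    student_batch = [[0.9, 0.1], [0.1, 0.8]]
    print(distillation_loss(teacher_batch, student_batch))
    print(is_similar([1.0, 0.0], [1.0, 0.1]))
```

Matching embeddings directly lets the student slot in wherever the teacher's embeddings were used, including the similarity check, which is one plausible reason a smaller model can retain most of the classification accuracy.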

