The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Zhang, Tianjun, Liu, Fangchen, Wong, Justin, Abbeel, Pieter, Gonzalez, Joseph E.

arXiv.org Artificial Intelligence

Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. This approach, Reinforcement Learning with Human Feedback (RLHF), demonstrates impressive performance on the GPT series models. However, the underlying Reinforcement Learning (RL) algorithm is complex and requires an additional training pipeline for reward and value networks. In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. Such an algorithm requires no parameters beyond those of the original language model and maximally reuses the pretraining pipeline. To achieve this, we formulate the instruction alignment problem for language models as a goal-reaching problem in decision making. We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm sheds light on a family of reward-free approaches that utilize hindsight-relabeled instructions based on feedback. We evaluate the performance of HIR extensively on 12 challenging BigBench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised finetuning.
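
The relabeling idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the `Episode` type, the `hindsight_relabel` and `to_supervised_pairs` helpers, the correctness check, and the instruction wording are all hypothetical names chosen for illustration. It only shows the general shape of the idea: take outputs the model already produced, replace each instruction with one that the output actually satisfies according to the feedback, and emit ordinary (prompt, target) pairs for supervised training.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Episode:
    """One sampled interaction (hypothetical structure, for illustration only)."""
    instruction: str  # instruction the model was asked to follow
    query: str        # task input
    output: str       # what the model actually produced


def hindsight_relabel(
    episodes: List[Episode],
    is_correct: Callable[[Episode], bool],
    achieved_instruction: Callable[[bool], str],
) -> List[Episode]:
    """Replace each instruction with one that the output satisfies in hindsight."""
    relabeled = []
    for ep in episodes:
        feedback = is_correct(ep)  # feedback signal (assumed binary here)
        relabeled.append(
            Episode(achieved_instruction(feedback), ep.query, ep.output)
        )
    return relabeled


def to_supervised_pairs(episodes: List[Episode]) -> List[Tuple[str, str]]:
    """Turn relabeled episodes into (prompt, target) pairs for supervised finetuning."""
    return [(f"{ep.instruction}\n{ep.query}", ep.output) for ep in episodes]


if __name__ == "__main__":
    # Toy data: the second output is wrong, so its instruction gets relabeled.
    sampled = [
        Episode("Give a correct answer.", "2 + 2 = ?", "4"),
        Episode("Give a correct answer.", "3 + 5 = ?", "7"),
    ]
    answers = {"2 + 2 = ?": "4", "3 + 5 = ?": "8"}
    relabeled = hindsight_relabel(
        sampled,
        is_correct=lambda ep: answers[ep.query] == ep.output,
        achieved_instruction=lambda ok: (
            "Give a correct answer." if ok else "Give a wrong answer."
        ),
    )
    for prompt, target in to_supervised_pairs(relabeled):
        print(repr(prompt), "->", repr(target))
```

In this sketch no reward or value network appears: the feedback is consumed only to choose the relabeled instruction, and the resulting pairs could be fed to a standard supervised finetuning loop.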


Boosting Neural Networks to Decompile Optimized Binaries

Cao, Ying, Liang, Ruigang, Chen, Kai, Hu, Peiwei

arXiv.org Artificial Intelligence

Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.


Expressive Description Logic with Instantiation Metamodelling

Kubincová, Petra (Comenius University in Bratislava) | Kľuka, Ján (Comenius University in Bratislava) | Homola, Martin (Comenius University in Bratislava)

AAAI Conferences

We investigate a higher-order extension of the description logic (DL) SROIQ that provides a fixedly interpreted role semantically coupled with instantiation. This role is useful for expressing interesting meta-level constraints on the modelled ontology. We provide a model-theoretic characterization of the semantics and show decidability by means of a reduction.