Scaling Language Models: Methods, Analysis & Insights from Training Gopher

arXiv.org Artificial Intelligence

Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
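
As a concrete illustration of that last point, the sketch below (a toy, not anything from the paper) fits a character-level bigram model as a stand-in for an autoregressive language model and reports its cross-entropy in bits per character; by Shannon's source-coding argument this is the average code length, i.e. the compression rate, an arithmetic coder driven by the same model would achieve on that text.

```python
import math
from collections import Counter, defaultdict

text = "abracadabra abracadabra arbadacarba"
alphabet = sorted(set(text))

# Character-level bigram model p(next | prev) with add-one smoothing;
# it plays the role of the autoregressive model p(x_t | x_<t).
bigrams = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    bigrams[prev][nxt] += 1

def prob(prev, nxt):
    counts = bigrams[prev]
    return (counts[nxt] + 1) / (sum(counts.values()) + len(alphabet))

# Cross-entropy in bits per character: -1/N * sum_t log2 p(x_t | x_<t).
# This equals the number of bits per character an arithmetic coder using
# the same model would need, i.e. the model's compression rate.
nll_bits = -sum(math.log2(prob(p, n)) for p, n in zip(text, text[1:]))
print(f"cross-entropy: {nll_bits / (len(text) - 1):.3f} bits/char")
```

Lowering a model's cross-entropy on a corpus therefore directly lowers the number of bits needed to encode that corpus with the model.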


Relaxation Search: A Simple Way of Managing Optional Clauses

AAAI Conferences

A number of problems involve managing a set of optional clauses. For example, the soft clauses in a MAXSAT formula are optional: they can be falsified for a cost. Similarly, when computing a Minimum Correction Set for an unsatisfiable formula, all clauses are optional: some can be falsified in order to satisfy the remainder. In both of these cases the task is to find a subset of the optional clauses that meets some criterion, and whose removal leaves a satisfiable formula. Relaxation search is a simple method of using a standard SAT solver to solve this task. Relaxation search is easy to implement, sometimes requiring only a simple modification of the variable selection heuristic in the SAT solver; it offers considerable flexibility and control over the order in which subsets of optional clauses are examined; and it automatically exploits clause learning to exchange information between the two phases of finding a suitable subset of optional clauses and checking whether their removal yields satisfiability. We demonstrate how relaxation search can be used to solve MAXSAT and to compute Minimum Correction Sets. In both cases relaxation search achieves state-of-the-art performance and solves some instances that other solvers cannot.
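
The sketch below, assuming the python-sat (pysat) package, shows the relaxation-variable machinery the abstract refers to: each optional clause c_i is extended with a fresh relaxation variable r_i, so setting r_i to true "removes" the clause. Relaxation search itself modifies the SAT solver's variable selection heuristic so that the r_i are decided first; since that internal hook is not exposed here, the two phases (choosing which optional clauses to keep, then checking satisfiability of the rest) are emulated with solver assumptions, and the toy formula, variable names, and linear-search loop are illustrative rather than the authors' algorithm.

```python
from pysat.solvers import Minisat22

optional = [[1], [-1], [2], [-1, -2]]   # toy unsatisfiable set of optional clauses
top = max(abs(lit) for cl in optional for lit in cl)
relax = list(range(top + 1, top + 1 + len(optional)))   # r_i for clause i

with Minisat22() as solver:
    for cl, r in zip(optional, relax):
        solver.add_clause(cl + [r])      # c_i OR r_i: r_i = true relaxes c_i

    kept, correction = [], []
    for i, r in enumerate(relax):
        # Try to keep clause i (assume r_i false) together with the clauses
        # already kept; if the SAT check fails, c_i joins the correction set.
        if solver.solve(assumptions=[-relax[j] for j in kept] + [-r]):
            kept.append(i)
        else:
            correction.append(i)

print("maximal satisfiable subset:", [optional[i] for i in kept])
print("correction set:", [optional[i] for i in correction])
```

Removing the reported clauses leaves a satisfiable formula, and no proper subset of them has that property, so the output is a correction set that is minimal under set inclusion.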