

Large Language Model


SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain

Neural Information Processing Systems

The integration of synthetically generated data in the second and third steps enhances the models' capabilities in interpreting and processing legal texts, effectively reaching state-of-the-art performance and outperforming…

Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation

Neural Information Processing Systems

We collect 312 programs from various sources, including everyday programs from college homework, the International Competition on Software Verification (SV-COMP), benchmarks from previous papers (SLING), and programs from real-world software systems (Linux Kernel, GlibC, LiteOS, and Zephyr).

SAFEWORLD: Geo-Diverse Safety Alignment

Neural Information Processing Systems

Despite significant progress in this area, an essential factor often remains overlooked: geo-diversity. Recognizing and incorporating geographical variations [41, 40, 4, 10, 31, 6] in safety principles is crucial in the global landscape of LLM safety. Cultural norms and legal frameworks vary widely, resulting in diverse definitions of safe and acceptable behavior. As shown in Figure 1, while giving a green hat as a gift might be benign in many cultures, it is considered offensive in China.


InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models

Neural Information Processing Systems

With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs, with a particular focus on code generation tasks. However, they do not cover the full range of expected capabilities of code LLMs, which extend beyond code generation to answering diverse coding-related questions.