Jinx: Unlimited LLMs for Probing Alignment Failures
Unlimited, or so-called helpful-only, language models are trained without safety alignment constraints and never refuse user queries. They are widely used by leading AI companies as internal tools for red teaming and alignment evaluation. For example, if a safety-aligned model produces harmful outputs similar to those of an unlimited model, this indicates alignment failures that require further attention. Despite their essential role in assessing alignment, such models are not available to the research community. We introduce Jinx, a helpful-only variant of popular open-weight LLMs. Jinx responds to all queries without refusals or safety filtering, while preserving the base model's capabilities in reasoning and instruction following. It provides researchers with an accessible tool for probing alignment failures, evaluating safety boundaries, and systematically studying failure modes in language model safety.
Foundation models may exhibit staged progression in novel CBRN threat disclosure
The extent to which foundation models can disclose novel chemical, biological, radiological, and nuclear (CBRN) threats to expert users is unclear due to a lack of test cases. I leveraged the unique opportunity presented by an upcoming publication describing a novel catastrophic biothreat - "Technical Report on Mirror Bacteria: Feasibility and Risks" - to conduct a small controlled study before it became public. Graduate-trained biologists tasked with predicting the consequences of releasing mirror E. coli showed no significant differences in rubric-graded accuracy using Claude Sonnet 3.5 new (n=10) or web search only (n=2); both groups scored comparably to a web baseline (28 and 43 versus 36). However, Sonnet reasoned correctly when prompted by a report author, whereas a smaller model, Haiku 3.5, failed even with author guidance (80 versus 5). These results suggest distinct stages of model capability: Haiku is unable to reason about mirror life even with threat-aware expert guidance (Stage 1), while Sonnet correctly reasons only with threat-aware prompting (Stage 2). Continued advances may allow future models to disclose novel CBRN threats to naive experts (Stage 3) or unskilled users (Stage 4). While mirror life represents only one case study, monitoring new models' ability to reason about privately known threats may allow protective measures to be implemented before widespread disclosure.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Response to reviewers Dear reviewers, Thank you all very much for your time and your comments on how to improve the presentation of our results. A common theme in your comments is a concern about whether our results and our methods are of general interest. We agree that the assumptions we use are restrictive. However, the proof technique we introduce is flexible and will extend to a more general setup (at the cost of heavier notation and longer calculations). The easiest extension is to suppose only that hybrid distributions are strongly log-concave, which is much less stringent than making assumptions on likelihood sites, and covers many latent Gaussian models, like logistic or probit regression.
An Evaluation Benchmark for Autoformalization in Lean4
Gulati, Aryan, Ladsaria, Devanshu, Mishra, Shubhra, Sidhu, Jasdeep, Miranda, Brando
Large Language Models (LLMs) hold the potential to revolutionize autoformalization. The introduction of Lean4, a mathematical programming language, presents an unprecedented opportunity to rigorously assess the autoformalization capabilities of LLMs. This paper introduces a novel evaluation benchmark designed for Lean4, applying it to test the abilities of state-of-the-art LLMs, including GPT-3.5, GPT-4, and Gemini Pro. Our comprehensive analysis reveals that, despite recent advancements, these LLMs still exhibit limitations in autoformalization, particularly in more complex areas of mathematics. These findings underscore the need for further development in LLMs to fully harness their potential in scientific research and development. This study not only benchmarks current LLM capabilities but also sets the stage for future enhancements in autoformalization.
Towards Model Predictive Control for Acrobatic Quadrotor Flights
Jain, Saransh, Shethwala, Yash, Das, Jnaneshwar
This study explores modeling and control for quadrotor acrobatics, focusing on executing flip maneuvers. Flips are an elegant way to deliver sensor probes into no-fly or hazardous zones, like volcanic vents. Successful flips require feasible trajectories and precise control, influenced by rotor dynamics, thrust allocation, and control methodologies. The research introduces a novel approach using Model Predictive Control (MPC) for real-time trajectory planning. The MPC considers dynamic constraints and environmental variables, ensuring system stability during maneuvers. The proposed methodology's effectiveness is examined through simulation studies in ROS and Gazebo, providing insights into quadrotor behavior, response time, and trajectory accuracy. Real-time flight experiments on a custom agile quadrotor using PixHawk 4 and Hardkernel Odroid validate MPC-designed controllers. Experiments confirm successful execution and adaptability to real-world scenarios. Outcomes contribute to autonomous aerial robotics, especially aerial acrobatics, enhancing mission capabilities. MPC controllers find applications in probe throws and optimal image capture views through efficient flight paths, e.g., full roll maneuvers. This research paves the way for quadrotors in demanding scenarios, showcasing groundbreaking applications. Video Link: https://www.youtube.com/watch?v=UzR0PWjy9W4
What Are Transformer Models and How Do They Work?
Transformers are a new development in machine learning that has been making a lot of noise lately. They are incredibly good at keeping track of context, and this is why the text that they write makes sense. In this blog post, we will go over their architecture and how they work. Transformer models are one of the most exciting new developments in machine learning. They were introduced in the paper "Attention Is All You Need". Transformers can be used to write stories, essays, and poems, answer questions, translate between languages, and chat with humans, and they can even pass exams that are hard for humans!
Ocean temperatures rising faster than thought in 'delayed response' to global warming, scientists say
LONDON - The world's oceans are rising in temperature faster than previously believed as they absorb most of the world's growing climate-changing emissions, scientists said Thursday. Ocean heat -- recorded by thousands of floating robots -- has been setting records repeatedly over the last decade, with 2018 expected to be the hottest year yet, displacing the 2017 record, according to an analysis by the Chinese Academy of Sciences. That is driving sea level rise, as oceans warm and expand, and helping fuel more intense hurricanes and other extreme weather, scientists warn. The warming, measured since 1960, is faster than predicted by scientists in a 2013 Intergovernmental Panel on Climate Change report that looked at ocean warming, according to the study, published this week in the journal Science. "It's mainly driven by the accumulation of greenhouse gases such as carbon dioxide in the atmosphere due to human activities," said Lijing Cheng, a lead author of the study from the Chinese Academy of Sciences.
434
The classical approach to the acquisition of knowledge and reason in artificial intelligence is to program the facts and rules into the machine. Unfortunately, the amount of time required to program the equivalent of human intelligence is prohibitively large. An alternative approach allows an automaton to learn to solve problems through iterative trial-and-error interaction with its environment, much as humans do. To solve a problem posed by the environment, the automaton generates a sequence or collection of responses based on its experience. The environment evaluates the effectiveness of this collection, and reports its evaluation to the automaton.
Cancer: A Computational Disease That AI Can Cure
From an AI perspective, finding effective treatments for cancer is a high-dimensional search problem characterized by many molecularly distinct cancer subtypes, many potential targets and drug combinations, and a dearth of high-quality data to connect molecular subtypes and treatments to responses. The broadening availability of molecular diagnostics and electronic medical records presents both opportunities and challenges to apply AI techniques to personalize and improve cancer treatment. We discuss these in the context of Cancer Commons, a "rapid learning" community where patients, physicians, and researchers collect and analyze the molecular and clinical data from every cancer patient and use these results to individualize therapies. Research opportunities include adaptively planning and executing individual treatment experiments across the whole patient population, inferring the causal mechanisms of tumors, predicting drug response in individuals, and generalizing these findings to new cases. The goal is to treat each patient in accord with the best available knowledge and to continually update that knowledge to benefit subsequent patients.