aligned
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging due to inconsistencies between data and knowledge bases, such as noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric to evaluate the predictions' consistency against both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods to enable both the transparency and the evolution of mechanistic biological understanding.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Yi, Xin, Zheng, Shunfan, Wang, Linlin, de Melo, Gerard, Wang, Xiaoling, He, Liang
The emergence of finetuning-as-a-service has revealed a new vulnerability in large language models (LLMs). A mere handful of malicious data uploaded by users can subtly manipulate the finetuning process, resulting in an alignment-broken model. Existing methods to counteract fine-tuning attacks typically require substantial computational resources. Even with parameter-efficient techniques like LoRA, gradient updates remain essential. To address these challenges, we propose \textbf{N}euron-\textbf{L}evel \textbf{S}afety \textbf{R}ealignment (\textbf{NLSR}), a training-free framework that restores the safety of LLMs based on the similarity difference of safety-critical neurons before and after fine-tuning. The core of our framework is first to construct a safety reference model from an initially aligned model to amplify safety-related features in neurons. We then utilize this reference model to identify safety-critical neurons, which we prepare as patches. Finally, we selectively restore only those neurons that exhibit significant similarity differences by transplanting these prepared patches, thereby minimally altering the fine-tuned model. Extensive experiments demonstrate significant safety enhancements in fine-tuned models across multiple downstream tasks, while greatly maintaining task-level accuracy. Our findings suggest regions of some safety-critical neurons show noticeable differences after fine-tuning, which can be effectively corrected by transplanting neurons from the reference model without requiring additional training. The code will be available at \url{https://github.com/xinykou/NLSR}
Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation
Bhattarai, Manish, Vu, Minh, Santos, Javier E., Boureima, Ismael, Malley, Daniel O'
We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment into a Retrieval-Augmented Generation (RAG) framework. Unlike conventional retrieval approaches that utilize generic embeddings agnostic to the downstream task, our strategy aligns the retrieval model directly with the objective of maximizing translation quality, as quantified by the CodeBLEU metric. This alignment ensures that the embeddings are semantically and syntactically meaningful for the specific code translation task. Our methodology involves constructing a dataset of 25,000 Fortran code snippets sourced from Stack-V2 dataset and generating their corresponding C++ translations using the LLaMA 3.1-8B language model. We compute pairwise CodeBLEU scores between the generated translations and ground truth examples to capture fine-grained similarities. These scores serve as supervision signals in a contrastive learning framework, where we optimize the embedding model to retrieve Fortran-C++ pairs that are most beneficial for improving the language model's translation performance. By integrating these CodeBLEU-optimized embeddings into the RAG framework, our approach significantly enhances both retrieval accuracy and code generation quality over methods employing generic embeddings. On the HPC Fortran2C++ dataset, our method elevates the average CodeBLEU score from 0.64 to 0.73, achieving a 14% relative improvement. On the Numerical Recipes dataset, we observe an increase from 0.52 to 0.60, marking a 15% relative improvement. Importantly, these gains are realized without any fine-tuning of the language model, underscoring the efficiency and practicality of our approach.
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Hong, Haodong, Wang, Sen, Huang, Zi, Wu, Qi, Liu, Jiajun
Current Vision-and-Language Navigation (VLN) tasks mainly employ textual instructions to guide agents. However, being inherently abstract, the same textual instruction can be associated with different visual signals, causing severe ambiguity and limiting the transfer of prior knowledge in the vision domain from the user to the agent. To fill this gap, we propose Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP), a novel task augmenting traditional VLN by integrating both natural language and images in instructions. VLN-MP not only maintains backward compatibility by effectively handling text-only prompts but also consistently shows advantages with different quantities and relevance of visual prompts. Possible forms of visual prompts include both exact and similar object images, providing adaptability and versatility in diverse navigation scenarios. To evaluate VLN-MP under a unified framework, we implement a new benchmark that offers: (1) a training-free pipeline to transform textual instructions into multi-modal forms with landmark images; (2) diverse datasets with multi-modal instructions for different downstream tasks; (3) a novel module designed to process various image prompts for seamless integration with state-of-the-art VLN models. Extensive experiments on four VLN benchmarks (R2R, RxR, REVERIE, CVDN) show that incorporating visual prompts significantly boosts navigation performance. While maintaining efficiency with text-only prompts, VLN-MP enables agents to navigate in the pre-explore setting and outperform text-based models, showing its broader applicability.
Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale
Chennakesavalu, Shriram, Hu, Frank, Ibarraran, Sebastian, Rotskoff, Grant M.
Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for large language models, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer that converges to an ideal Gibbs-Boltzmann distribution with the reward playing the role of an energy function. Furthermore, this algorithm is highly scalable, does not require reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We deploy this approach to align molecular transformers to generate molecules with externally specified properties and find that it does so robustly, searching through diverse parts of chemical space. While our focus here is on chemical search, we also obtain excellent results on an AI supervised task for LLM alignment, showing that the method is scalable and general.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (2 more...)
Aligned: A Platform-based Process for Alignment
Shaotran, Ethan, Pesok, Ido, Jones, Sam, Liu, Emi
We are introducing Aligned, a platform for global governance and alignment of frontier models, and eventually superintelligence. While previous efforts at the major AI labs have attempted to gather inputs for alignment, these are often conducted behind closed doors. We aim to set the foundation for a more trustworthy, public-facing approach to safety: a constitutional committee framework. Initial tests with 680 participants result in a 30-guideline constitution with 93% overall support. We show the platform naturally scales, instilling confidence and enjoyment from the community. We invite other AI labs and teams to plug and play into the Aligned ecosystem.
- North America > United States (0.14)
- Asia > China (0.04)
- South America > Chile (0.04)
- (27 more...)
We Have to Develop Scalable Methods for AI Control so it Remains Aligned With Human Values - Prof.
Professor Bostrom spoke about AI and its safety on the sidelines of Russia's main event in the field of technological entrepreneurship - the annual Open Innovations forum - held by Skolkovo's Innovation Center. Sputnik: In what area can we expect to see "unicorns" in the future? Nick Bostrom: They can crop up in any part of the economy really, as long as the sector has a certain size that could bring new innovations and new ideas that would be successful enough to become worth a billion. Nick Bostrom: I think the highly salient areas which are sort of big ambitions tech things on the Internet. But a lot of other parts of the economy that also are quite large and have less of a public profile.
How To Build Human Friendly AI - Can AI Be Aligned With Human Values?
Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated & re-launched Real Estate Business Intelligence Tool, where he created one of the leading Business Intelligence Tool for property price analysis in 2012. He also writes, research and sharing knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, Python Language etc... ... Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated & re-launched Real Estate Business Intelligence Tool, where he created one of the leading Business Intelligence Tool for property price analysis in 2012. He also writes, research and sharing knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, Python Language etc... Want to Become a Data Scientist?
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence (1.00)