AITopics

2411.07941

Country:

Europe > Switzerland (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceDec-11-2024

Concept Bottleneck Language Models For protein design

Ismail, Aya Abdelsalam, Oikarinen, Tuomas, Wang, Amy, Adebayo, Julius, Stanton, Samuel, Joren, Taylor, Kleinhenz, Joseph, Goodman, Allen, Bravo, Héctor Corrada, Cho, Kyunghyun, Frey, Nathan C.

We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can intervene on concept values to precisely control the properties of generated proteins, achieving a 3 times larger change in desired concept values compared to baselines. ii) Interpretability: A linear mapping between concept values and predicted tokens allows transparent analysis of the model's decision-making process. iii) Debugging: This transparency facilitates easy debugging of trained models. Our models achieve pre-training perplexity and downstream task performance comparable to traditional masked protein language models, demonstrating that interpretability does not compromise performance. While adaptable to any language model, we focus on masked protein language models due to their importance in drug discovery and the ability to validate our model's capabilities through real-world experiments and expert knowledge. We scale our CB-pLM from 24 million to 3 billion parameters, making them the largest Concept Bottleneck Models trained and the first capable of generative language modeling.

intervention, language model, sequence, (14 more...)

2411.0609

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningDec-11-2024

Preference Discerning with LLM-Enhanced Generative Retrieval

Paischer, Fabian, Yang, Liu, Liu, Linfeng, Shao, Shuai, Hassani, Kaveh, Li, Jiacheng, Chen, Ricky, Li, Zhang Gabriel, Gao, Xialo, Shao, Wei, Feng, Xue, Noorshams, Nima, Park, Sem, Long, Bo, Eghbalzadeh, Hamid

Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items and auxiliary tasks, like predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address this issue, we propose a new paradigm, which we term preference discerning. In preference dscerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. To this end, we generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data. To evaluate preference discerning capabilities of sequential recommendation systems, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. We assess current state-of-the-art methods using our benchmark and show that they struggle to accurately discern user preferences. Therefore, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{d}$iscern$\textbf{er}$), which improves upon existing methods and achieves state-of-the-art performance on our benchmark. Our results show that Mender can be effectively guided by human preferences even though they have not been observed during training, paving the way toward more personalized sequential recommendation systems. We will open-source the code and benchmarks upon publication.

dataset, recommendation, user preference, (13 more...)

arXiv.org Machine Learning

2412.08604

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(26 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Maldonado, Hugo Monzón, Möllenhoff, Thomas, Daheim, Nico, Gurevych, Iryna, Khan, Mohammad Emtiyaz

How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging

arXiv.org Machine LearningDec-11-2024

When finetuning multiple tasks altogether, it is important to carefully weigh them to get a good performance, but searching for good weights can be difficult and costly. Here, we propose to aid the search with fast previews to quickly get a rough idea of different reweighting options. We use model merging to create previews by simply reusing and averaging parameters of models trained on each task separately (no retraining required). To improve the quality of previews, we propose a Bayesian approach to design new merging strategies by using more flexible posteriors. We validate our findings on vision and natural-language transformers. Our work shows the benefits of model merging via Bayes to improve multitask finetuning.

international conference, multitask, preview, (14 more...)

arXiv.org Machine Learning

2412.08147

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Machine LearningDec-11-2024

Group & Reweight: A Novel Cost-Sensitive Approach to Mitigating Class Imbalance in Network Traffic Classification

Du, Wumei, Liang, Dong, Lv, Yiqin, Liang, Xingxing, Wu, Guanlin, Wang, Qi, Xie, Zheng

Internet services have led to the eruption of network traffic, and machine learning on these Internet data has become an indispensable tool, especially when the application is risk-sensitive. This paper focuses on network traffic classification in the presence of severe class imbalance. Such a distributional trait mostly drifts the optimal decision boundary and results in an unsatisfactory solution. This raises safety concerns in the network traffic field when previous class imbalance methods hardly deal with numerous minority malicious classes. To alleviate these effects, we design a \textit{group \& reweight} strategy for alleviating class imbalance. Inspired by the group distributionally optimization framework, our approach heuristically clusters classes into groups, iteratively updates the non-parametric weights for separate classes, and optimizes the learning model by minimizing reweighted losses. We theoretically interpret the optimization process from a Stackelberg game and perform extensive experiments on typical benchmarks. Results show that our approach can not only suppress the negative effect of class imbalance but also improve the comprehensive performance in prediction.

auc, dataset, gdr-cil, (11 more...)

arXiv.org Machine Learning

2409.19214

Country:

Asia > China > Hunan Province > Changsha (0.04)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
Oceania > Australia > New South Wales (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

BBC NewsDec-10-2024, 02:13:08 GMT

Hershey shares jump on Cadbury owner buyout report

Hershey shares jump on Cadbury owner buyout report Getty ImagesMondelez has reportedly made a preliminary approach to the maker of the iconic Hershey's milk chocolate bar Shares in US chocolate maker Hershey have jumped by more than 10% after a report that Mondelez International, which owns UK-based Cadbury, has approached the firm about a potential buyout. A deal could create a snack food giant with combined sales of almost 50bn ( 39.2bn) a year. Both Mondelez and Hershey declined to comment on the report when contacted by BBC News. In 2016, Hershey rejected a 23bn takeover offer from Mondelez. The approach is still in the preliminary stages and it is not certain that talks will lead to a deal, according to Bloomberg.

artificial intelligence, cadbury owner buyout report, hershey share jump, (2 more...)

BBC News

Country:

South America (0.18)
North America > Central America (0.17)
Oceania > Australia (0.07)
(14 more...)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology: Information Technology > Artificial Intelligence (0.32)

Observing Micromotives and Macrobehavior of Large Language Models

Cheng, Yuyang, Qu, Xingwei, Goldsack, Tomas, Lin, Chenghua, Chen, Chung-Chi

Thomas C. Schelling, awarded the 2005 Nobel Memorial Prize in Economic Sciences, pointed out that ``individuals decisions (micromotives), while often personal and localized, can lead to societal outcomes (macrobehavior) that are far more complex and different from what the individuals intended.'' The current research related to large language models' (LLMs') micromotives, such as preferences or biases, assumes that users will make more appropriate decisions once LLMs are devoid of preferences or biases. Consequently, a series of studies has focused on removing bias from LLMs. In the NLP community, while there are many discussions on LLMs' micromotives, previous studies have seldom conducted a systematic examination of how LLMs may influence society's macrobehavior. In this paper, we follow the design of Schelling's model of segregation to observe the relationship between the micromotives and macrobehavior of LLMs. Our results indicate that, regardless of the level of bias in LLMs, a highly segregated society will emerge as more people follow LLMs' suggestions. We hope our discussion will spark further consideration of the fundamental assumption regarding the mitigation of LLMs' micromotives and encourage a reevaluation of how LLMs may influence users and society.

large language model, machine learning, natural language, (21 more...)

2412.10428

Country:

Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Zivic, Pablo, Vazquez, Hernan, Sanchez, Jorge

Scaling Sequential Recommendation Models with Transformers

Modeling user preferences has been mainly addressed by looking at users' interaction history with the different elements available in the system. Tailoring content to individual preferences based on historical data is the main goal of sequential recommendation. The nature of the problem, as well as the good performance observed across various domains, has motivated the use of the transformer architecture, which has proven effective in leveraging increasingly larger amounts of training data when accompanied by an increase in the number of model parameters. This scaling behavior has brought a great deal of attention, as it provides valuable guidance in the design and training of even larger models. Taking inspiration from the scaling laws observed in training large language models, we explore similar principles for sequential recommendation. We use the full Amazon Product Data dataset, which has only been partially explored in other studies, and reveal scaling behaviors similar to those found in language models. Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application. We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains. Our approach and findings provide a strategic roadmap for model training and deployment in real high-dimensional preference spaces, facilitating better training and inference efficiency. We hope this paper bridges the gap between the potential of transformers and the intrinsic complexities of high-dimensional sequential recommendation in real-world recommender systems. Code and models can be found at https://github.com/mercadolibre/srt

large language model, machine learning, natural language, (19 more...)

doi: 10.1145/3626772.3657816

2412.07585

Country:

North America > United States > District of Columbia > Washington (0.05)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Oceania > Australia > Queensland (0.04)
(23 more...)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education

Zhao, Chengshuai, Agrawal, Garima, Kumarage, Tharindu, Tan, Zhen, Deng, Yuli, Chen, Ying-Chih, Liu, Huan

Integrating AI into education has the potential to transform the teaching of science and technology courses, particularly in the field of cybersecurity. AI-driven question-answering (QA) systems can actively manage uncertainty in cybersecurity problem-solving, offering interactive, inquiry-based learning experiences. Large language models (LLMs) have gained prominence in AI-driven QA systems, offering advanced language understanding and user engagement. However, they face challenges like hallucinations and limited domain-specific knowledge, which reduce their reliability in educational settings. To address these challenges, we propose CyberRAG, an ontology-aware retrieval-augmented generation (RAG) approach for developing a reliable and safe QA system in cybersecurity education. CyberRAG employs a two-step approach: first, it augments the domain-specific knowledge by retrieving validated cybersecurity documents from a knowledge base to enhance the relevance and accuracy of the response. Second, it mitigates hallucinations and misuse by integrating a knowledge graph ontology to validate the final answer. Experiments on publicly available cybersecurity datasets show that CyberRAG delivers accurate, reliable responses aligned with domain knowledge, demonstrating the potential of AI tools to enhance education.

large language model, machine learning, question answering, (20 more...)

2412.14191

Country:

South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > Arizona (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
(2 more...)

Protocol Learning, Decentralized Frontier Risk and the No-Off Problem

Long, Alexander

Frontier models are currently developed and distributed primarily through two channels: centralized proprietary APIs or open-sourcing of pre-trained weights. We identify a third paradigm - Protocol Learning - where models are trained across decentralized networks of incentivized participants. This approach has the potential to aggregate orders of magnitude more computational resources than any single centralized entity, enabling unprecedented model scales and capabilities. However, it also introduces novel challenges: heterogeneous and unreliable nodes, malicious participants, the need for unextractable models to preserve incentives, and complex governance dynamics. To date, no systematic analysis has been conducted to assess the feasibility of Protocol Learning or the associated risks, particularly the 'No-Off Problem' arising from the inability to unilaterally halt a collectively trained model. We survey recent technical advances that suggest decentralized training may be feasible - covering emerging communication-efficient strategies and fault-tolerant methods - while highlighting critical open problems that remain. Contrary to the notion that decentralization inherently amplifies frontier risks, we argue that Protocol Learning's transparency, distributed governance, and democratized access ultimately reduce these risks compared to today's centralized regimes.

arxiv preprint arxiv, machine learning, natural language, (13 more...)

2412.0789

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Communications (0.93)