AITopics | Atlantic Ocean

Atlantic Ocean

Monet: Mixture of Monosemantic Experts for Transformers

Park, Jungwoo, Ahn, Young Jin, Kim, Kee-Eung, Kang, Jaewoo

arXiv.org Artificial Intelligence, Dec-9-2024

Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.
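As a rough illustration of the scaling claim in the abstract above, the arithmetic can be sketched as follows. This is only a back-of-the-envelope check, not the Monet implementation: the function name `sqrt_scaling_factor` is hypothetical, and the only figures taken from the abstract are the expert count (262,144) and the square-root relationship.

```python
import math


def sqrt_scaling_factor(num_experts: int) -> float:
    """Return sqrt(num_experts), the quantity total parameters are said to track.

    Hypothetical helper for illustration; it is not part of the Monet code base.
    """
    return math.sqrt(num_experts)


NUM_EXPERTS = 262_144  # experts per layer, as quoted in the abstract

factor = sqrt_scaling_factor(NUM_EXPERTS)

# Under square-root scaling, quadrupling the expert count
# only doubles the factor that parameters are proportional to.
growth = sqrt_scaling_factor(4 * NUM_EXPERTS) / factor

print(factor, growth)
```

Under this reading, 262,144 experts correspond to a square-root factor of 512, which is why the expert count can grow far faster than the parameter budget.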

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.04139

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
North America > United States > Florida (0.04)
Oceania > New Zealand (0.04)
(32 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Banking & Finance (1.00)
Government > Regional Government (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers

Vielhaben, Johanna, Bareeva, Dilyara, Berend, Jim, Samek, Wojciech, Strodthoff, Nils

arXiv.org Artificial Intelligence, Dec-9-2024

Vision transformers (ViTs) can be trained using various learning paradigms, from fully supervised to self-supervised. Diverse training protocols often result in significantly different feature spaces, which are usually compared through alignment analysis. However, current alignment measures quantify this relationship in terms of a single scalar value, obscuring the distinctions between common and unique features in pairs of representations that share the same scalar alignment. We address this limitation by combining alignment analysis with concept discovery, which enables a breakdown of alignment into single concepts encoded in feature space. This fine-grained comparison reveals both universal and unique concepts across different representations, as well as the internal structure of concepts within each of them. Our methodological contributions address two key prerequisites for concept-based alignment: 1) For a description of the representation in terms of concepts that faithfully capture the geometry of the feature space, we define concepts as the most general structure they can possibly form - arbitrary manifolds, allowing hidden features to be described by their proximity to these manifolds. 2) To measure distances between concept proximity scores of two representations, we use a generalized Rand index and partition it for alignment between pairs of concepts. We confirm the superiority of our novel concept definition for alignment analysis over existing linear baselines in a sanity check. The concept-based alignment analysis of representations from four different ViTs reveals that increased supervision correlates with a reduction in the semantic structure of learned representations.
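For readers unfamiliar with the Rand index mentioned in the abstract above, a minimal sketch of the classic (hard-clustering) version follows. This is a simplification and an assumption on our part: the paper uses a generalized Rand index over concept proximity scores, which is not reproduced here, and the function name `rand_index` is chosen only for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Classic Rand index between two hard clusterings of the same items.

    Counts the fraction of item pairs on which both clusterings agree:
    either both place the pair in the same cluster, or both separate it.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same partition with renamed cluster ids: perfect agreement.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Crossed partitions: the two clusterings agree on only some pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

Because the score is a single number per pair of clusterings, it shows the limitation the abstract describes: two very different sets of shared and unique concepts can yield the same scalar.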

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2412.06639

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment > Sports (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Mystery of bizarre drones over New Jersey deepens after new footage of UFOs emerge

Daily Mail - Science & tech, Dec-6-2024, 19:28:45 GMT

New footage of multiple eerie 'triangle' craft flying above New Jersey has only compounded the mystery for locals. At least five or possibly six of the unidentified drones were captured in the new, 50-second cell phone video, which one commenter declared was 'the clearest video yet.' One drone, heard roaring in the skies as it moved through the darkness, appeared to have a cluster of white lights on its underbelly and red lights blinking at the tips of its wings and tail. Another drone came into frame that resembled a classic 'black triangle' UFO or the triangular TR-3B, which beamed bright white lights from its nose, wingtips and tail. Since mid-November, a wave of unexplained drone sightings above central Jersey has left both law enforcement and the general public watching the skies, hunting for clues on what these mysterious night flights might be.

drone, jersey, new jersey, (12 more...)

Daily Mail - Science & tech

Country:

Europe > Jersey (0.88)
North America > United States > New Jersey (0.67)
Europe > Russia (0.06)
(4 more...)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Transportation (0.93)
Aerospace & Defense (0.75)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.99)

Talking Like One of Us: Effects of Using Regional Language in a Humanoid Social Robot

Sievers, Thomas, Russwinkel, Nele

arXiv.org Artificial Intelligence, Dec-6-2024

Social robots are becoming more and more perceptible in public service settings. For engaging people in a natural environment a smooth social interaction as well as acceptance by the users are important issues for future successful Human-Robot Interaction (HRI). The type of verbal communication has a special significance here. In this paper we investigate the effects of spoken language varieties of a non-standard / regional language compared to standard language. More precisely we compare a human dialog with a humanoid social robot Pepper where the robot on the one hand is answering in High German and on the other hand in Low German, a regional language that is understood and partly still spoken in the northern parts of Germany. The content of what the robot says remains the same in both variants. We are interested in the effects that these two different ways of robot talk have on human interlocutors who are more or less familiar with Low German in terms of perceived warmth, competence and possible discomfort in conversation against a background of cultural identity. To measure these factors we use the Robotic Social Attributes Scale (RoSAS) on 17 participants with an age ranging from 19 to 61. Our results show that significantly higher warmth is perceived in the Low German version of the conversation.

artificial intelligence, regional language, robot, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-99-8718-4_7

2412.05024

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.85)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.52)

CompCap: Improving Multimodal Large Language Models with Composite Captions

Chen, Xiaohui, Shukla, Satya Narayan, Azab, Mahmoud, Singh, Aashu, Wang, Qifan, Yang, David, Peng, ShengYun, Yu, Hanchao, Yan, Shen, Zhang, Xuewen, He, Baosheng

arXiv.org Artificial Intelligence, Dec-6-2024

How well can Multimodal Large Language Models (MLLMs) understand composite images? Composite images (CIs) are synthetic visuals created by merging multiple visual elements, such as charts, posters, or screenshots, rather than being captured directly by a camera. While CIs are prevalent in real-world applications, recent MLLM developments have primarily focused on interpreting natural images (NIs). Our research reveals that current MLLMs face significant challenges in accurately understanding CIs, often struggling to extract information or perform complex reasoning based on these images. We find that existing training data for CIs are mostly formatted for question-answer tasks (e.g., in datasets like ChartQA and ScienceQA), while high-quality image-caption datasets, critical for robust vision-language alignment, are only available for NIs. To bridge this gap, we introduce Composite Captions (CompCap), a flexible framework that leverages Large Language Models (LLMs) and automation tools to synthesize CIs with accurate and detailed captions. Using CompCap, we curate CompCap-118K, a dataset containing 118K image-caption pairs across six CI types. We validate the effectiveness of CompCap-118K by supervised fine-tuning MLLMs of three sizes: xGen-MM-inst.-4B and LLaVA-NeXT-Vicuna-7B/13B. Empirical results show that CompCap-118K significantly enhances MLLMs' understanding of CIs, yielding average gains of 1.7%, 2.0%, and 2.9% across eleven benchmarks, respectively.

caption, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.05243

Country:

North America > Mexico (0.14)
North America > The Bahamas (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(122 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Government (0.93)
Transportation > Passenger (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Retired quarryman uncovers fossilized tyrannosaur teeth

Popular Science, Dec-5-2024, 16:15:00 GMT

For the first time, razor sharp teeth of tyrannosaurs were found in the Bexhill-on-Sea region of coastal East Sussex, England. These teeth indicate that a bevy of carnivorous dinosaurs including tyrannosaurs, spinosaurs, and members of the Velociraptor family stalked this region about 135 million years ago. The findings are detailed in a study published December 5 in the journal Papers in Palaeontology.

discovery, quarryman uncover fossilized tyrannosaur teeth, teeth, (9 more...)

Popular Science

Country:

Europe > United Kingdom > England > East Sussex (0.27)
Europe > United Kingdom > England > Isle of Wight (0.05)
Atlantic Ocean > North Atlantic Ocean > English Channel (0.05)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence (0.36)

A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios

Feng, Xiachong, Dou, Longxu, Li, Ella, Wang, Qinghao, Wang, Haochuan, Guo, Yu, Ma, Chang, Kong, Lingpeng

arXiv.org Artificial Intelligence, Dec-5-2024

Game-theoretic scenarios have become pivotal in evaluating the social intelligence of Large Language Model (LLM)-based social agents. While numerous studies have explored these agents in such settings, there is a lack of a comprehensive survey summarizing the current progress. To address this gap, we systematically review existing research on LLM-based social agents within game-theoretic scenarios. Our survey organizes the findings into three core components: Game Framework, Social Agent, and Evaluation Protocol. The game framework encompasses diverse game scenarios, ranging from choice-focusing to communication-focusing games. The social agent part explores agents' preferences, beliefs, and reasoning abilities. The evaluation protocol covers both game-agnostic and game-specific metrics for assessing agent performance. By reflecting on the current research and identifying future research directions, this survey provides insights to advance the development and evaluation of social agents in game-theoretic scenarios.

agent, arxiv preprint, scenario, (15 more...)

arXiv.org Artificial Intelligence

2412.0392

Country:

North America > United States > Texas (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Atlantic Ocean > Mediterranean Sea > Ionian Sea (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks

Shinde, Rajat, Phillips, Christopher E., Ankur, Kumar, Gupta, Aman, Pfreundschuh, Simon, Roy, Sujit, Kirkland, Sheyenne, Gaur, Vishal, Lin, Amy, Sheshadri, Aditi, Nair, Udaysankar, Maskey, Manil, Ramachandran, Rahul

arXiv.org Artificial Intelligence, Dec-3-2024

High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models or fine-tuning existing models for scientific applications such as weather and climate analysis. Unfortunately, despite the growing development of new deep learning models for weather and climate, there is a scarcity of curated, pre-processed machine learning (ML)-ready datasets. Curating such high-quality datasets for developing new models is challenging particularly because the modality of the input data varies significantly for different downstream tasks addressing different atmospheric scales (spatial and temporal). Here we introduce WxC-Bench (Weather and Climate Bench), a multi-modal dataset designed to support the development of generalizable AI models for downstream use-cases in weather and climate research. WxC-Bench is designed as a dataset of datasets for developing ML-models for a complex weather and climate system, addressing selected downstream tasks as machine learning phenomenon. WxC-Bench encompasses several atmospheric processes from meso-$\beta$ (20 - 200 km) scale to synoptic scales (2500 km), such as aviation turbulence, hurricane intensity and track monitoring, weather analog search, gravity wave parameterization, and natural language report generation. We provide a comprehensive description of the dataset and also present a technical validation for baseline analysis. The dataset and code to prepare the ML-ready data have been made publicly available on Hugging Face -- https://huggingface.co/datasets/nasa-impact/WxC-Bench

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.0278

Country:

Atlantic Ocean (0.04)
Pacific Ocean (0.04)
North America > United States > Alabama > Madison County > Huntsville (0.04)
(13 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Recovering implicit physics model under real-world constraints

Banerjee, Ayan, Gupta, Sandeep K. S.

arXiv.org Artificial Intelligence, Dec-3-2024

Recovering a physics-driven model, i.e. a governing set of equations of the underlying dynamical system, from real-world data has been of recent interest. Most existing methods either operate on simulation data with unrealistically high sampling rates or require explicit measurements of all system variables, which is not practical in real-world deployments. Moreover, they assume the timestamps of external perturbations to the physical system are known a priori, without uncertainty, implicitly discounting any sensor time-synchronization or human reporting errors. In this paper, we propose a novel liquid time constant neural network (LTC-NN) based architecture to recover the underlying model of physical dynamics from real-world data. The automatic differentiation property of LTC-NN nodes overcomes problems associated with low sampling rates; the input-dependent time constant in the forward pass of the hidden layer of LTC-NN nodes creates a massive search space of implicit physical dynamics; the data reconstruction loss based on a physics model solver guides the search for the correct set of implicit dynamics; and the use of dropout regularization in the dense layer ensures extraction of the sparsest model. Further, to account for the perturbation timing error, we utilize dense layer nodes to search through input shifts that result in the lowest reconstruction loss. Experiments on four benchmark dynamical systems, three with simulation data and one with real-world data, show that the LTC-NN architecture is more accurate in recovering implicit physics model coefficients than the state-of-the-art sparse model recovery approaches. We also introduce four additional case studies (eight in total) on real-life medical examples, in simulation and with real-world clinical data, to show the effectiveness of our approach in recovering the underlying model in practice.

architecture, implicit dynamic, neural architecture, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA240556

2412.02215

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Arizona (0.04)
Atlantic Ocean > North Atlantic Ocean > Hudson Bay (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Health Care Technology (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CultureLLM: Incorporating Cultural Differences into Large Language Models

Li, Cheng, Chen, Mengzhou, Wang, Jindong, Sitaram, Sunayana, Xie, Xing

arXiv.org Artificial Intelligence, Dec-3-2024

Large language models (LLMs) are reported to be partial to certain cultures owing to the dominance of English corpora in the training data. Since multilingual cultural data are often expensive to collect, existing efforts handle this by prompt engineering or culture-specific pre-training. However, they might overlook the knowledge deficiency of low-resource cultures and require extensive computing resources. In this paper, we propose CultureLLM, a cost-effective solution to incorporate cultural differences into LLMs. CultureLLM adopts World Value Survey (WVS) as seed data and generates semantically equivalent training data via the proposed semantic data augmentation. Using only 50 seed samples from WVS with augmented data, we fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Extensive experiments on 60 culture-related datasets demonstrate that CultureLLM significantly outperforms various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%), with performance comparable to or even better than GPT-4. Our human study shows that the generated samples are semantically equivalent to the original samples, providing an effective solution for LLM augmentation. Code is released at https://github.com/Scarelette/CultureLLM.

culturellm, dataset, detection, (13 more...)

arXiv.org Artificial Intelligence

2402.10946

Country:

Asia > Middle East > Republic of Türkiye (0.14)
Europe > Portugal (0.04)
Asia > China (0.04)
(36 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Media > News (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
