AITopics | onet

Collaborating Authors

onet

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Monet: Mixture of Monosemantic Experts for Transformers

Park, Jungwoo, Ahn, Young Jin, Kim, Kee-Eung, Kang, Jaewoo

arXiv.org Artificial IntelligenceDec-9-2024

Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.04139

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
North America > United States > Florida (0.04)
Oceania > New Zealand (0.04)
(32 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Banking & Finance (1.00)
Government > Regional Government (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

Dish-TS: A General Paradigm for Alleviating Distribution Shift in Time Series Forecasting

Fan, Wei, Wang, Pengyang, Wang, Dongkun, Wang, Dongjie, Zhou, Yuanchun, Fu, Yanjie

arXiv.org Artificial IntelligenceMar-14-2023

The distribution shift in Time Series Forecasting (TSF), indicating series distribution changes over time, largely hinders the performance of TSF models. Existing works towards distribution shift in time series are mostly limited in the quantification of distribution and, more importantly, overlook the potential shift between lookback and horizon windows. To address above challenges, we systematically summarize the distribution shift in TSF into two categories. Regarding lookback windows as input-space and horizon windows as output-space, there exist (i) intra-space shift, that the distribution within the input-space keeps shifted over time, and (ii) inter-space shift, that the distribution is shifted between input-space and output-space. Then we introduce, Dish-TS, a general neural paradigm for alleviating distribution shift in TSF. Specifically, for better distribution estimation, we propose the coefficient net (CONET), which can be any neural architectures, to map input sequences into learnable distribution coefficients. To relieve intra-space and inter-space shift, we organize Dish-TS as a Dual-CONET framework to separately learn the distribution of input- and output-space, which naturally captures the distribution difference of two spaces. In addition, we introduce a more effective training strategy for intractable CONET learning. Finally, we conduct extensive experiments on several datasets coupled with different state-of-the-art forecasting models. Experimental results show Dish-TS consistently boosts them with a more than 20% average improvement. Code is available.

data mining, forecasting, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.14829

Country:

North America > United States (0.14)
Asia > Macao (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Operator Learning Framework for Digital Twin and Complex Engineering Systems

Kobayashi, Kazuma, Daniell, James, Alam, Syed B.

arXiv.org Artificial IntelligenceJan-18-2023

With modern computational advancements and statistical analysis methods, machine learning algorithms have become a vital part of engineering modeling. Neural Operator Networks (ONets) is an emerging machine learning algorithm as a "faster surrogate" for approximating solutions to partial differential equations (PDEs) due to their ability to approximate mathematical operators versus the direct approximation of Neural Networks (NN). ONets use the Universal Approximation Theorem to map finite-dimensional inputs to infinite-dimensional space using the branch-trunk architecture, which encodes domain and feature information separately before using a dot product to combine the information. ONets are expected to occupy a vital niche for surrogate modeling in physical systems and Digital Twin (DT) development. Three test cases are evaluated using ONets for operator approximation, including a 1-dimensional ordinary differential equations (ODE), general diffusion system, and convection-diffusion (Burger) system. Solutions for ODE and diffusion systems yield accurate and reliable results (R2>0.95), while solutions for Burger systems need further refinement in the ONet algorithm.

artificial intelligence, machine learning, onet, (15 more...)

arXiv.org Artificial Intelligence

2301.06701

Country:

Asia > India > Tripura (0.05)
North America > United States > Missouri > Phelps County > Rolla (0.04)

Genre: Research Report (0.65)

Industry:

Energy > Power Industry > Utilities > Nuclear (0.93)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

OTyper: A Neural Architecture for Open Named Entity Typing

Yuan, Zheng (Northwestern University) | Downey, Doug (Northwestern University)

AAAI ConferencesFeb-8-2018

Named Entity Typing (NET) is valuable for many natural language processing tasks, such as relation extraction, question answering, knowledge base population, and co-reference resolution. Classical NET targeted a few coarse-grained types, but the task has expanded to sets of hundreds of types in recent years. Existing work in NET assumes that the target types are specified in advance, and that hand-labeled examples of each type are available. In this work, we introduce the task of Open Named Entity Typing (ONET), which is NET when the set of target types is not known in advance. We propose a neural network architecture for ONET, called OTyper, and evaluate its ability to tag entities with types not seen in training. On the benchmark FIGER(GOLD) dataset, OTyper achieves a weighted AUC-ROC score of 0.870 on unseen types, substantially outperforming pattern- and embedding-based baselines.

machine learning, natural language, otyper, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Illinois > Cook County (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback