AITopics | representation loss

Collaborating Authors

representation loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding Junda Wu1 Tong Y u 2 Rui Wang 3 Zhao Song

Neural Information Processing SystemsFeb-16-2026, 22:06:54 GMT

However, recent works reveal that it is non-trivial to find a proper initialization of the prompt tokens.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding Junda Wu1 Tong Y u 2 Rui Wang 3 Zhao Song

Neural Information Processing SystemsOct-9-2025, 06:29:07 GMT

However, recent works reveal that it is non-trivial to find a proper initialization of the prompt tokens.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

KANITE: Kolmogorov-Arnold Networks for ITE estimation

Mehendale, Eshan, Thorat, Abhinav, Kolla, Ravi, Pedanekar, Niranjan

arXiv.org Artificial IntelligenceMar-18-2025

We introduce KANITE, a framework leveraging Kolmogorov-Arnold Networks (KANs) for Individual Treatment Effect (ITE) estimation under multiple treatments setting in causal inference. By utilizing KAN's unique abilities to learn univariate activation functions as opposed to learning linear weights by Multi-Layer Perceptrons (MLPs), we improve the estimates of ITEs. The KANITE framework comprises two key architectures: 1.Integral Probability Metric (IPM) architecture: This employs an IPM loss in a specialized manner to effectively align towards ITE estimation across multiple treatments. 2. Entropy Balancing (EB) architecture: This uses weights for samples that are learned by optimizing entropy subject to balancing the covariates across treatment groups. Extensive evaluations on benchmark datasets demonstrate that KANITE outperforms state-of-the-art algorithms in both $\epsilon_{\text{PEHE}}$ and $\epsilon_{\text{ATE}}$ metrics. Our experiments highlight the advantages of KANITE in achieving improved causal estimates, emphasizing the potential of KANs to advance causal inference methodologies across diverse application areas.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2503.13912

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning

Dankar, Apoorv, Jassani, Adeem, Kumar, Kartikaeya

arXiv.org Artificial IntelligenceAug-26-2023

The use of large transformer-based models such as BERT, GPT, and T5 has led to significant advancements in natural language processing. However, these models are computationally expensive, necessitating model compression techniques that reduce their size and complexity while maintaining accuracy. This project investigates and applies knowledge distillation for BERT model compression, specifically focusing on the TinyBERT student model. We explore various techniques to improve knowledge distillation, including experimentation with loss functions, transformer layer mapping methods, and tuning the weights of attention and representation loss and evaluate our proposed techniques on a selection of downstream tasks from the GLUE benchmark. The goal of this work is to improve the efficiency and effectiveness of knowledge distillation, enabling the development of more efficient and accurate models for a range of natural language processing tasks.

experiment, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2308.13958

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

Wu, Junda, Yu, Tong, Wang, Rui, Song, Zhao, Zhang, Ruiyi, Zhao, Handong, Lu, Chaochao, Li, Shuai, Henao, Ricardo

arXiv.org Artificial IntelligenceJun-8-2023

Soft prompt tuning achieves superior performances across a wide range of few-shot tasks. However, the performances of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters (or encoded representations). This novel view helps us to develop a more efficient, accurate and robust soft prompt tuning method InfoPrompt. With this framework, we develop two novel mutual information based loss functions, to (i) discover proper prompt initialization for the downstream tasks and learn sufficient task-relevant information from prompt tokens and (ii) encourage the output representation from the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods. Finally, we provide a formal theoretical result for showing to show that gradient descent type algorithm can be used to train our mutual information loss.

information, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.04933

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

AVIDA: Alternating method for Visualizing and Integrating Data

Dover, Kathryn, Cang, Zixuan, Ma, Anna, Nie, Qing, Vershynin, Roman

arXiv.org Artificial IntelligenceApr-7-2023

High-dimensional multimodal data arises in many scientific fields. The integration of multimodal data becomes challenging when there is no known correspondence between the samples and the features of different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules respectively. We show that AVIDA correctly aligns high-dimensional datasets without common features with four synthesized datasets and two real multimodal single-cell datasets. Compared to several existing methods, we demonstrate that AVIDA better preserves structures of individual datasets, especially distinct local structures in the joint low-dimensional visualization, while achieving comparable alignment performance. Such a property is important in multimodal single-cell data analysis as some biological processes are uniquely captured by one of the datasets. In general applications, other methods can be used for the alignment and dimension reduction modules.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2206.00135

Country:

North America > United States > North Carolina (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

Kopparapu, Kavya, Duéñez-Guzmán, Edgar A., Matyas, Jayd, Vezhnevets, Alexander Sasha, Agapiou, John P., McKee, Kevin R., Everett, Richard, Marecki, Janusz, Leibo, Joel Z., Graepel, Thore

arXiv.org Artificial IntelligenceJan-5-2022

A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate. This is particularly critical in situations when other agents have hidden, possibly misaligned motivations and goals. Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others, and elucidate their true motivations. In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment. The environment admits a rich set of strategies for both teams. Reinforcement learning agents trained in Hidden Agenda show that agents can learn a variety of behaviors, including partnering and voting without need for communication in natural language.

agent, crewmate, impostor, (15 more...)

arXiv.org Artificial Intelligence

2201.01816

Country:

South America > Brazil > São Paulo (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Online Selection of Diverse Committees

Do, Virginie, Atif, Jamal, Lang, Jérôme, Usunier, Nicolas

arXiv.org Artificial IntelligenceMay-19-2021

Citizens' assemblies need to represent subpopulations according to their proportions in the general population. These large committees are often constructed in an online fashion by contacting people, asking for the demographic features of the volunteers, and deciding to include them or not. This raises a trade-off between the number of people contacted (and the incurring cost) and the representativeness of the committee. We study three methods, theoretically and experimentally: a greedy algorithm that includes volunteers as long as proportionality is not violated; a non-adaptive method that includes a volunteer with a probability depending only on their features, assuming that the joint feature distribution in the volunteer pool is known; and a reinforcement learning based approach when this distribution is not known a priori but learnt online.

constraint, representation loss, sample complexity, (16 more...)

arXiv.org Artificial Intelligence

2105.09295

Country:

Europe > United Kingdom (0.47)
Europe > France (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Add feedback

Weakly-paired Cross-Modal Hashing

Liu, Xuanwu, Wang, Jun, Yu, Guoxian, Domeniconi, Carlotta, Zhang, Xiangliang

arXiv.org Machine LearningMay-28-2019

Hashing has been widely adopted for large-scale data retrieval in many domains, due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities are readily available. This assumption is unrealistic in practical applications. In addition, these methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (Flex-CMH) to learn effective hashing codes from weakly-paired data, whose correspondence across modalities are partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the local structure of each cluster, and thus to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes in a unified objective function the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss. An alternative optimization technique is also proposed to coordinate the correspondence and hash functions, and to reinforce the reciprocal effects of the two objectives. Experiments on publicly multi-modal datasets show that FlexCMH achieves significantly better results than state-of-the-art methods, and it indeed offers a high degree of flexibility for practical cross-modal hashing tasks.

artificial intelligence, machine learning, modality, (16 more...)

arXiv.org Machine Learning

1905.12203

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.05)
Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback