AITopics | dset

Session-based Recommender Systems: User Interest as a Stochastic Process in the Latent Space

Balcer, Klaudia, Lipinski, Piotr

arXiv.org Machine Learning, Apr-14-2025

This paper jointly addresses the problems of data uncertainty, popularity bias, and exposure bias in session-based recommender systems. We study the symptoms of these biases both in item embeddings and in recommendations. We propose treating user interest as a stochastic process in the latent space and provide a model-agnostic implementation of this mathematical concept. The proposed stochastic component consists of three elements: debiasing item embeddings with regularization for embedding uniformity, modeling dense user interest from session prefixes, and introducing fake targets in the data to simulate extended exposure. We conducted computational experiments on two popular benchmark datasets, Diginetica and YooChoose 1/64, as well as several modifications of the YooChoose dataset with different ratios of popular items. The results show that the proposed approach mitigates the challenges mentioned above.

artificial intelligence, machine learning, recommendation, (17 more...)

arXiv.org Machine Learning

2504.10005

Country: Europe > Poland > Lower Silesia Province > Wroclaw (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Dynamics of "Spontaneous" Topic Changes in Next Token Prediction with Self-Attention

Jia, Mumin, Diaz-Rodriguez, Jairo

arXiv.org Machine Learning, Jan-10-2025

Human cognition can spontaneously shift conversation topics, often triggered by emotional or contextual signals. In contrast, self-attention-based language models depend on structured statistical cues from input tokens for next-token prediction, lacking this spontaneity. Motivated by this distinction, we investigate the factors that influence the next-token prediction to change the topic of the input sequence. We define concepts of topic continuity, ambiguous sequences, and change of topic, based on defining a topic as a set of token priority graphs (TPGs). Using a simplified single-layer self-attention architecture, we derive analytical characterizations of topic changes. Specifically, we demonstrate that (1) the model maintains the priority order of tokens related to the input topic, (2) a topic change occurs only if lower-priority tokens outnumber all higher-priority tokens of the input topic, and (3) unlike human cognition, longer context lengths and overlapping topics reduce the likelihood of spontaneous redirection. These insights highlight differences between human cognition and self-attention-based models in navigating topic changes and underscore the challenges in designing conversational AI capable of handling "spontaneous" conversations more naturally. To our knowledge, this is the first work to address these questions in such close relation to human conversation and thought.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2501.06382

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Mechanics of Next Token Prediction with Self-Attention

Li, Yingcong, Huang, Yixiao, Ildiz, M. Emrullah, Rawat, Ankit Singh, Oymak, Samet

arXiv.org Artificial Intelligence, Mar-12-2024

Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success is the self-attention mechanism. In this work, we ask: What does a single self-attention layer learn from next-token prediction? We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: (1) Hard retrieval: Given an input sequence, self-attention precisely selects the high-priority input tokens associated with the last input token. (2) Soft composition: It then creates a convex combination of the high-priority tokens from which the next token can be sampled. Under suitable conditions, we rigorously characterize these mechanics through a directed graph over tokens extracted from the training data. We prove that gradient descent implicitly discovers the strongly-connected components (SCC) of this graph and self-attention learns to retrieve the tokens that belong to the highest-priority SCC available in the context window. Our theory relies on decomposing the model weights into a directional component and a finite component that correspond to the hard retrieval and soft composition steps, respectively. This also formalizes a related implicit bias formula conjectured in [Tarzanagh et al. 2023]. We hope that these findings shed light on how self-attention processes sequential data and pave the path toward demystifying more complex architectures.
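As a rough illustration of the two steps named in the abstract, the toy sketch below splits attention scores into a "directional" part that is scaled up and a "finite" part that stays fixed. It is not the paper's construction: the score values, the `weights` helper, and the scaling parameter are all hypothetical, chosen only to show how a growing directional component pushes softmax toward selecting the top-scoring tokens while the finite component sets the mixing proportions among them.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical directional scores for four context tokens.
# Tokens 0 and 2 tie for the highest priority; tokens 1 and 3 are lower.
DIRECTIONAL = [1.0, 0.2, 1.0, -0.5]

# Hypothetical finite offsets; they decide the mix among selected tokens.
FINITE = [0.0, 0.0, math.log(3.0), 0.0]

def weights(scale):
    """Attention weights when the directional part is multiplied by `scale`."""
    return softmax([scale * d + f for d, f in zip(DIRECTIONAL, FINITE)])

if __name__ == "__main__":
    for scale in (1.0, 5.0, 50.0):
        print(scale, [round(w, 4) for w in weights(scale)])
```

As the scale grows, the weights on tokens 1 and 3 vanish (the hard-retrieval effect), and the remaining mass settles into a fixed convex combination of tokens 0 and 2 determined by the finite offsets (the soft-composition effect).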

dset, fin, graph-svm, (13 more...)

arXiv.org Artificial Intelligence

2403.08081

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Minimize Control Inputs for Strong Structural Controllability Using Reinforcement Learning with Graph Neural Network

Zou, Mengbang, Guo, Weisi, Jin, Bailu

arXiv.org Artificial Intelligence, Feb-26-2024

Strong structural controllability (SSC) guarantees that a networked system with linear-invariant dynamics is controllable for all numerical realizations of its parameters. Current research has established algebraic and graph-theoretic conditions of SSC for zero/nonzero or zero/nonzero/arbitrary structure. One relevant practical problem is how to fully control the system with the minimal number of input signals and how to identify the nodes on which signals must be imposed. Previous work shows that this optimization problem is NP-hard, so a solution is difficult to find. To solve this problem, we formulate the graph coloring process as a Markov decision process (MDP) according to the graph-theoretical condition of SSC for both zero/nonzero and zero/nonzero/arbitrary structure. We use an actor-critic method with a directed graph neural network, which represents the color information of the graph, to optimize the MDP. Our method is validated in a social influence network with real data and in different complex network models. We find that the number of input nodes is determined by the average degree of the network and that the selected input nodes tend to have low in-degree and to avoid high-degree nodes.

dset, graph, node, (15 more...)

arXiv.org Artificial Intelligence

2402.16925

Country:

Europe > United Kingdom (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Interpretation and Simplification of Deep Forest

Kim, Sangwon, Jeong, Mira, Ko, Byoung Chul

arXiv.org Artificial Intelligence, Jan-14-2020

This paper proposes a new method for interpreting and simplifying a black box model of a deep random forest (RF) using a proposed rule-elimination technique. In deep RF, a large number of decision trees are connected across multiple layers, which makes analysis difficult. Deep RF performs similarly to a deep neural network (DNN) but achieves better generalizability. Therefore, in this study, we consider quantifying the feature contributions and frequency of the fully trained deep RF in the form of a decision rule set. The feature contributions provide a basis for determining how features affect the decision process in a rule set. Model simplification is achieved by measuring the feature contributions and eliminating unnecessary rules. Consequently, the simplified model has fewer parameters and rules than before. Experimental results show that a feature contribution analysis allows a black box model to be decomposed for quantitatively interpreting a rule set. The proposed method was successfully applied to various deep RF models and benchmark datasets while maintaining robust performance despite the elimination of a large number of rules.

contribution, feature contribution, ilmrf, (16 more...)

arXiv.org Artificial Intelligence

2001.04721

Country:

North America > United States > Wisconsin (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.55)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

tensorflow/models

#artificialintelligence, Dec-26-2019, 01:10:17 GMT

This is an implementation of the keypoint network proposed in "Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning". Given a single 2D image of a known class, this network can predict a set of 3D keypoints that are consistent across viewing angles of the same object and across object instances. These keypoints and their detectors are discovered and learned automatically without keypoint location supervision. We trained the network using a total batch size of 256 (8 x 32 replicas). You may have to tune the learning rate if your batch size is different.

batch size, folder, tensorflow model, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)

Value Elimination: Bayesian Inference via Backtracking Search

Bacchus, Fahiem, Dalmao, Shannon, Pitassi, Toniann

arXiv.org Artificial Intelligence, Oct-19-2012

Backtracking search is a powerful algorithmic paradigm that can be used to solve many problems. It is in a certain sense the dual of variable elimination; but on many problems, e.g., SAT, it is vastly superior to variable elimination in practice. Motivated by this we investigate the application of backtracking search to the problem of Bayesian inference (Bayes). We show that natural generalizations of known techniques allow backtracking search to achieve performance guarantees similar to standard algorithms for Bayes, and that there exist problems on which backtracking can in fact do much better. We also demonstrate that these ideas can be applied to implement a Bayesian inference engine whose performance is competitive with standard algorithms. Since backtracking search can very naturally take advantage of context specific structure, the potential exists for performance superior to standard algorithms on many problems.
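To make the idea of inference by backtracking concrete, here is a minimal sketch that computes a posterior in a tiny Bayesian network by depth-first search over variable assignments, pruning any branch whose local probability is zero. It is plain backtracking enumeration, not the Value Elimination algorithm from the paper, and the three-variable network, its probability tables, and the helper names (`local_prob`, `backtrack`, `query`) are all hypothetical.

```python
# Hypothetical network: rain -> wet <- sprinkler, listed in topological order.
ORDER = ["rain", "sprinkler", "wet"]

def local_prob(var, value, assignment):
    """P(var = value | parents), read from hand-written toy tables."""
    if var == "rain":
        p_true = 0.2
    elif var == "sprinkler":
        p_true = 0.5
    else:  # "wet" depends on both parents, which are already assigned
        table = {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0}
        p_true = table[(assignment["rain"], assignment["sprinkler"])]
    return p_true if value else 1.0 - p_true

def backtrack(index, assignment, evidence):
    """Total probability of all completions consistent with the evidence."""
    if index == len(ORDER):
        return 1.0
    var = ORDER[index]
    # Evidence variables are forced to their observed value.
    values = [evidence[var]] if var in evidence else [True, False]
    total = 0.0
    for value in values:
        p = local_prob(var, value, assignment)
        if p == 0.0:
            continue  # prune: nothing below this branch carries any mass
        assignment[var] = value
        total += p * backtrack(index + 1, assignment, evidence)
        del assignment[var]  # undo the choice before trying the next value
    return total

def query(var, value, evidence):
    """P(var = value | evidence) as a ratio of two backtracking sums."""
    joint = backtrack(0, {}, {**evidence, var: value})
    return joint / backtrack(0, {}, dict(evidence))

if __name__ == "__main__":
    print(query("rain", True, {"wet": True}))
```

The zero-probability check is the simplest instance of exploiting context-specific structure: when neither parent is active, the branch with `wet = True` is skipped without exploring anything beneath it.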

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1212.2452

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Multi-dynamic Bayesian Networks

Filali, Karim, Bilmes, Jeff A.

Neural Information Processing Systems, Dec-31-2007

We present a generalization of dynamic Bayesian networks to concisely describe complex probability distributions such as in problems with multiple interacting variable-length streams of random variables. Our framework incorporates recent graphical model constructs to account for existence uncertainty, value-specific independence, aggregation relationships, and local and global constraints, while still retaining a Bayesian network interpretation and efficient inference and learning techniques. We introduce one such general technique, which is an extension of Value Elimination, a backtracking search inference algorithm. Multi-dynamic Bayesian networks are motivated by our work on Statistical Machine Translation (MT). We present results on MT word alignment in support of our claim that MDBNs are a promising framework for the rapid prototyping of new MT systems.

constraint, dset, mdbn, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.06)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Multi-dynamic Bayesian Networks

Filali, Karim, Bilmes, Jeff A.

Neural Information Processing Systems, Dec-31-2007

We present a generalization of dynamic Bayesian networks to concisely describe complex probability distributions such as in problems with multiple interacting variable-length streams of random variables. Our framework incorporates recent graphical model constructs to account for existence uncertainty, value-specific independence, aggregation relationships, and local and global constraints, while still retaining a Bayesian network interpretation and efficient inference and learning techniques. We introduce one such general technique, which is an extension of Value Elimination, a backtracking search inference algorithm. Multi-dynamic Bayesian networks are motivated by our work on Statistical Machine Translation (MT). We present results on MT word alignment in support of our claim that MDBNs are a promising framework for the rapid prototyping of new MT systems.

artificial intelligence, constraint, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
