nfa
On the Neural Feature Ansatz for Deep Neural Networks
Tansley, Edward, Massart, Estelle, Cartis, Coralia
Understanding feature learning is an important open question in establishing a mathematical foundation for deep neural networks. The Neural Feature Ansatz (NFA) states that after training, the Gram matrix of the first-layer weights of a deep neural network is proportional to some power $α>0$ of the average gradient outer product (AGOP) of this network with respect to its inputs. Assuming gradient flow dynamics with balanced weight initialization, the NFA was proven to hold throughout training for two-layer linear networks with exponent $α= 1/2$ (Radhakrishnan et al., 2024). We extend this result to networks with $L \geq 2$ layers, showing that the NFA holds with exponent $α= 1/L$, thus demonstrating a depth dependency of the NFA. Furthermore, we prove that for unbalanced initialization, the NFA holds asymptotically through training if weight decay is applied. We also provide counterexamples showing that the NFA does not hold for some network architectures with nonlinear activations, even when these networks fit arbitrarily well the training data. We thoroughly validate our theoretical results through numerical experiments across a variety of optimization algorithms, weight decay rates and initialization schemes.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Inconsistency Handling in DatalogMTL
Bienvenu, Meghyn, Bourgaux, Camille, Khodadaditaghanaki, Atefe
In this paper, we explore the issue of inconsistency handling in DatalogMTL, an extension of Datalog with metric temporal operators. Since facts are associated with time intervals, there are different manners to restore consistency when they contradict the rules, such as removing facts or modifying their time intervals. Our first contribution is the definition of relevant notions of conflicts (minimal explanations for inconsistency) and repairs (possible ways of restoring consistency) for this setting and the study of the properties of these notions and the associated inconsistency-tolerant semantics. Our second contribution is a data complexity analysis of the tasks of generating a single conflict / repair and query entailment under repair-based semantics.
Deconstructing Subset Construction -- Reducing While Determinizing
We present a novel perspective on the NFA canonization problem, which introduces intermediate minimization steps to reduce the exploration space on-the-fly. Essential to our approach are so-called equivalence registries which manage information about equivalent states and allow for incorporating further optimization techniques such as convexity closures or simulation to boost performance. Due to the generality of our approach, these concepts can be embedded in classic subset construction or Brzozowski's approach. We evaluate our approach on a set of real-world examples from automatic sequences and observe that we are able to improve especially worst-case scenarios. We implement our approach in an open-source library for users to experiment with.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Middle East > Cyprus > Pafos > Paphos (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (10 more...)
Data Complexity in Expressive Description Logics With Path Expressions
We investigate the data complexity of the satisfiability problem for the very expressive description logic ZOIQ (a.k.a. ALCHb Self reg OIQ) over quasi-forests and establish its NP-completeness. This completes the data complexity landscape for decidable fragments of ZOIQ, and reproves known results on decidable fragments of OWL2 (SR family). Using the same technique, we establish coNEXPTIME-completeness (w.r.t. the combined complexity) of the entailment problem of rooted queries in ZIQ.
- Europe > Austria > Vienna (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (11 more...)
Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks
Beaglehole, Daniel, Mitliagkas, Ioannis, Agarwala, Atish
Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of simple statistics of the inputs and labels. Finally, we introduce a simple intervention to increase NFA correlation at any given layer, which dramatically improves the quality of features learned.
- Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Beurer-Kellner, Luca, Fischer, Marc, Vechev, Martin
To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding -- thereby outperforming existing approaches by a wide margin.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
A General Framework for Learning from Weak Supervision
Chen, Hao, Wang, Jindong, Feng, Lei, Li, Xiang, Wang, Yidong, Xie, Xing, Sugiyama, Masashi, Singh, Rita, Raj, Bhiksha
Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computational demands using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, which effectively reduces time complexity from quadratic or factorial often required in existing solutions to linear scale. The problem of learning from arbitrary weak supervision is therefore converted to the NFA modeling of them. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
- (2 more...)
Directed Regular and Context-Free Languages
Ganardi, Moses, Saglam, Irmak, Zetzsche, Georg
We study the problem of deciding whether a given language is directed. A language $L$ is \emph{directed} if every pair of words in $L$ have a common (scattered) superword in $L$. Deciding directedness is a fundamental problem in connection with ideal decompositions of downward closed sets. Another motivation is that deciding whether two \emph{directed} context-free languages have the same downward closures can be decided in polynomial time, whereas for general context-free languages, this problem is known to be coNEXP-complete. We show that the directedness problem for regular languages, given as NFAs, belongs to $AC^1$, and thus polynomial time. Moreover, it is NL-complete for fixed alphabet sizes. Furthermore, we show that for context-free languages, the directedness problem is PSPACE-complete.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- (16 more...)
Taking advantage of a very simple property to efficiently infer NFAs
Jastrzab, Tomasz, Lardeux, Frédéric, Monfroy, Eric
Grammatical inference consists in learning a formal grammar as a finite state machine or as a set of rewrite rules. In this paper, we are concerned with inferring Nondeterministic Finite Automata (NFA) that must accept some words, and reject some other words from a given sample. This problem can naturally be modeled in SAT. The standard model being enormous, some models based on prefixes, suffixes, and hybrids were designed to generate smaller SAT instances. There is a very simple and obvious property that says: if there is an NFA of size k for a given sample, there is also an NFA of size k+1. We first strengthen this property by adding some characteristics to the NFA of size k+1. Hence, we can use this property to tighten the bounds of the size of the minimal NFA for a given sample. We then propose simplified and refined models for NFA of size k+1 that are smaller than the initial models for NFA of size k. We also propose a reduction algorithm to build an NFA of size k from a specific NFA of size k+1. Finally, we validate our proposition with some experimentation that shows the efficiency of our approach.
- Europe > France (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.34)
Automaton-Guided Control Synthesis for Signal Temporal Logic Specifications
Ho, Qi Heng, Ilyes, Roland B., Sunberg, Zachary N., Lahijanian, Morteza
This paper presents an algorithmic framework for control synthesis of continuous dynamical systems subject to signal temporal logic (STL) specifications. We propose a novel algorithm to obtain a time-partitioned finite automaton from an STL specification, and introduce a multi-layered framework that utilizes this automaton to guide a sampling-based search tree both spatially and temporally. Our approach is able to synthesize a controller for nonlinear dynamics and polynomial predicate functions. We prove the correctness and probabilistic completeness of our algorithm, and illustrate the efficiency and efficacy of our framework on several case studies. Our results show an order of magnitude speedup over the state of the art.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)