AITopics | angluin

Collaborating Authors

angluin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Differentially Private Language Generation and Identification in the Limit

Mehrotra, Anay, Velegkas, Grigoris, Yu, Xifan, Zhou, Felix

arXiv.org Machine LearningApr-10-2026

We initiate the study of language generation in the limit, a model recently introduced by Kleinberg and Mullainathan [KM24], under the constraint of differential privacy. We consider the continual release model, where a generator must eventually output a stream of valid strings while protecting the privacy of the entire input sequence. Our first main result is that for countable collections of languages, privacy comes at no qualitative cost: we provide an $\varepsilon$-differentially-private algorithm that generates in the limit from any countable collection. This stands in contrast to many learning settings where privacy renders learnability impossible. However, privacy does impose a quantitative cost: there are finite collections of size $k$ for which uniform private generation requires $Ω(k/\varepsilon)$ samples, whereas just one sample suffices non-privately. We then turn to the harder problem of language identification in the limit. Here, we show that privacy creates fundamental barriers. We prove that no $\varepsilon$-DP algorithm can identify a collection containing two languages with an infinite intersection and a finite set difference, a condition far stronger than the classical non-private characterization of identification. Next, we turn to the stochastic setting where the sample strings are sampled i.i.d. from a distribution (instead of being generated by an adversary). Here, we show that private identification is possible if and only if the collection is identifiable in the adversarial model. Together, our results establish new dimensions along which generation and identification differ and, for identification, a separation between adversarial and stochastic settings induced by privacy constraints.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2604.08504

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(7 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.66)

Add feedback

Characterizations of Language Generation With Breadth

Kalavasis, Alkis, Mehrotra, Anay, Velegkas, Grigoris

arXiv.org Machine LearningDec-24-2024

We study language generation in the limit, introduced by Kleinberg and Mullainathan [KM24], building on classical works of Gold [Gol67] and Angluin [Ang79]. [KM24] proposed an algorithm that generates strings from any countable language collection in the limit. While their algorithm eventually outputs strings from the target language $K$, it sacrifices breadth, i.e., the ability to generate all strings in $K$. A key open question in [KM24] is whether this trade-off between consistency and breadth is inherrent. Recent works proposed different notions of consistent generation with breadth. Kalavasis, Mehrotra, and Velegkas [KVM24] introduced three definitions: generation with exact breadth, approximate breadth, and unambiguous generation. Concurrently and independently, Charikar and Pabbaraju [CP24a] proposed exhaustive generation. Both works examined when generation with these notions of breadth is possible. Building on [CP24a, KVM24], we fully characterize language generation for these notions and their natural combinations. For exact breadth, we provide an unconditional lower bound, removing a technical condition from [KVM24] and extending the result of [CP24a] that holds for specific collections of languages. We show that generation with exact breadth is characterized by Angluin's condition for identification. We further introduce a weaker version of Angluin's condition that tightly characterizes both approximate breadth and exhaustive generation, proving their equivalence. Additionally, we show that unambiguous generation is also characterized by Angluin's condition as a special case of a broader result. Finally, we strengthen [KVM24] by giving unconditional lower bounds for stable generators, showing that Angluin's condition characterizes the previous breadth notions for stable generators. This shows a separation between stable and unstable generation with approximate breadth.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2412.1853

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.24)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)

Add feedback

Exploring Facets of Language Generation in the Limit

Charikar, Moses, Pabbaraju, Chirag

arXiv.org Artificial IntelligenceDec-24-2024

The recent work of Kleinberg & Mullainathan [KM24] provides a concrete model for language generation in the limit: given a sequence of examples from an unknown target language, the goal is to generate new examples from the target language such that no incorrect examples are generated beyond some point. In sharp contrast to strong negative results for the closely related problem of language identification, they establish positive results for language generation in the limit for all countable collections of languages. Follow-up work by Raman & Tewari [RT24] studies bounds on the number of distinct inputs required by an algorithm before correct language generation is achieved -- namely, whether this is a constant for all languages in the collection (uniform generation) or a language-dependent constant (non-uniform generation). We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. However, while the generation algorithm of [KM24] can be implemented using membership queries, we show that any algorithm cannot non-uniformly generate even for collections of just two languages, using only membership queries. We also formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation, and show a strong negative result for exhaustive generation. Our result shows that a tradeoff between validity and breadth is inherent for generation in the limit. We also provide a precise characterization of the language collections for which exhaustive generation is possible. Finally, inspired by algorithms that can choose to obtain feedback, we consider a model of uniform generation with feedback, completely characterizing language collections for which such uniform generation with feedback is possible in terms of a complexity measure of the collection.

algorithm, artificial intelligence, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.15364

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback

An $\mathbf{L^*}$ Algorithm for Deterministic Weighted Regular Languages

Pasti, Clemente, Karagöz, Talu, Svete, Anej, Nowak, Franz, Boumasmoud, Reda, Cotterell, Ryan

arXiv.org Artificial IntelligenceDec-19-2024

Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm for learning FSAs. We stay faithful to the original algorithm, devising a way to exactly learn deterministic weighted FSAs whose weights support division. Furthermore, we formulate the learning process in a manner that highlights the connection with FSA minimization, showing how $\mathbf{L^*}$ directly learns a minimal automaton for the target language.

artificial intelligence, automaton, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2024.emnlp-main.468

2411.06228

Country:

North America > United States > California (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Can adversarial attacks by large language models be attributed?

Cebrian, Manuel, Telle, Jan Arne

arXiv.org Artificial IntelligenceNov-12-2024

Attributing outputs from Large Language Models (LLMs) in adversarial settings-such as cyberattacks and disinformation-presents significant challenges that are likely to grow in importance. We investigate this attribution problem using formal language theory, specifically language identification in the limit as introduced by Gold and extended by Angluin. By modeling LLM outputs as formal languages, we analyze whether finite text samples can uniquely pinpoint the originating model. Our results show that due to the non-identifiability of certain language classes, under some mild assumptions about overlapping outputs from fine-tuned models it is theoretically impossible to attribute outputs to specific LLMs with certainty. This holds also when accounting for expressivity limitations of Transformer architectures. Even with direct model access or comprehensive monitoring, significant computational hurdles impede attribution efforts. These findings highlight an urgent need for proactive measures to mitigate risks posed by adversarial LLM use as their influence continues to expand.

attribution, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.08003

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Analyzing Robustness of Angluin's L* Algorithm in Presence of Noise

Khmelnitsky, Igor, Haddad, Serge, Ye, Lina, Barbot, Benoît, Bollig, Benedikt, Leucker, Martin, Neider, Daniel, Roy, Rajarshi

arXiv.org Artificial IntelligenceSep-21-2022

Angluin's L* algorithm learns the minimal (complete) deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probabilistic approximatively correct (PAC) version substitutes an equivalence query by a large enough set of random membership queries to get a high level confidence to the answer. Thus it can be applied to any kind of (also non-regular) device and may be viewed as an algorithm for synthesizing an automaton abstracting the behavior of the device based on observations. Here we are interested on how Angluin's PAC learning algorithm behaves for devices which are obtained from a DFA by introducing some noise. More precisely we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of words w.r.t. the DFA with a small probability, (2) the noisy device modifies with a small probability the letters of the word before asking its classification w.r.t. the DFA, and (3) the noisy device combines the classification of a word w.r.t. the DFA and its classification w.r.t. a counter automaton. Our experiments were performed on several hundred DFAs. Our main contributions, bluntly stated, consist in showing that: (1) Angluin's algorithm behaves well whenever the noisy device is produced by a random process, (2) but poorly with a structured noise, and, that (3) almost surely randomness yields systems with non-recursively enumerable languages.

angluin, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.370.6

2209.10315

Country:

Europe > France (0.05)
North America > United States > New York (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.48)

Add feedback

Property-Directed Verification of Recurrent Neural Networks

Khmelnitsky, Igor, Neider, Daniel, Roy, Rajarshi, Barbot, Benoît, Bollig, Benedikt, Finkel, Alain, Haddad, Serge, Leucker, Martin, Ye, Lina

arXiv.org Artificial IntelligenceSep-22-2020

This paper presents a property-directed approach to verifying recurrent neural networks (RNNs). To this end, we learn a deterministic finite automaton as a surrogate model from a given RNN using active automata learning. This model may then be analyzed using model checking as verification technique. The term property-directed reflects the idea that our procedure is guided and controlled by the given property rather than performing the two steps separately. We show that this not only allows us to discover small counterexamples fast, but also to generalize them by pumping towards faulty flows hinting at the underlying error in the RNN.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2009.1061

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > France (0.05)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Residual Alternating Automata

Berndt, Sebastian (University of Lübeck) | Liśkiewicz, Maciej (University of Lübeck) | Lutter, Matthias (University of Lübeck) | Reischuk, Rüdiger (University of Lübeck)

AAAI ConferencesFeb-14-2017

Residuality plays an essential role for learning finite automata. While residual deterministic and non-deterministic automata have been understood quite well, fundamental questions concerning alternating automata (AFA) remain open. Recently, Angluin, Eisenstat, and Fisman (2015) have initiated a systematic study of residual AFAs and proposed an algorithm called AL* – an extension of the popular L* algorithm – to learn AFAs. Based on computer experiments they have conjectured that AL* produces residual AFAs, but have not been able to give a proof. In this paper we disprove this conjecture by constructing a counterexample. As our main positive result we design an efficient learning algorithm, named AL** and give a proof that it outputs residual AFAs only. In addition, we investigate the succinctness of these different FA types in more detail.

artificial intelligence, automata, machine learning, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning Conditional Preference Networks with Queries

Koriche, Frederic (LIRMM, CNRS UMR 5506, Université Montpellier II) | Zanuttini, Bruno (GREYC, CNRS UMR 6072, Université de Caen Basse-Normandie)

AAAI ConferencesJun-23-2009

We investigate the problem of eliciting CP-nets in the well-known model of exact learning with equivalence and membership queries. The goal is to identify a preference ordering with a binary-valued CP-net by guiding the user through a sequence of queries. Each example is a dominance test on some pair of outcomes. In this setting, we show that acyclic CP-nets are not learnable with equivalence queries alone, while they are learnable with the help of membership queries if the supplied examples are restricted to swaps. A similar property holds for tree CP-nets with arbitrary examples. In fact, membership queries allow us to provide attribute-efficient algorithms for which the query complexity is only logarithmic in the number of attributes. Such results highlight the utility of this model for eliciting CP-nets in large multi-attribute domains.

cp-net, membership query, query, (16 more...)

AAAI Conferences

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > North Carolina (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Add feedback

Angluin-Style Learning of NFA

Bollig, Benedikt (LSV, CNRS, ENS Cachan) | Habermehl, Peter (LIAFA, Paris 7) | Kern, Carsten (MOVES, RWTH Aachen) | Leucker, Martin (TU Munich)

AAAI ConferencesJun-23-2009

We introduce NL*, a learning algorithm for inferring non-deterministic finite-state automata using membership and equivalence queries. More specifically, residual finite-state automata (RFSA) are learned similarly as in Angluin's popular L* algorithm, which, however, learns deterministic finite-state automata (DFA). Like in a DFA, the states of an RFSA represent residual languages. Unlike a DFA, an RFSA restricts to prime residual languages, which cannot be described as the union of other residual languages. In doing so, RFSA can be exponentially more succinct than DFA. They are, therefore, the preferable choice for many learning applications. The implementation of our algorithm is applied to a collection of examples and confirms the expected advantage of NL* over L*.

algorithm, query, rfsa, (16 more...)

AAAI Conferences

Twenty-First International Joint Conference on Artificial Intelligence

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback