ColdGANs: Taming Language GANs with Cautious Sampling Strategies
Thomas Scialom, Paul-Alexis Dray
Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e., the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be just as good. Generative Adversarial Networks (GANs) can mitigate those limitations, but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous works, we analyze the exploration step in GANs applied to text generation and show how classical sampling results in unstable training. We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to obtain smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state of the art on three generative tasks: unconditional text generation, question generation, and abstractive summarization.
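For intuition, here is a minimal sketch of the kind of cautious sampling the abstract describes, assuming a plain softmax-with-temperature decoder: dividing the logits by a temperature below 1 concentrates the sampling distribution around its modes. The function name and values are illustrative, not the authors' implementation.

import numpy as np

def cold_sample(logits, temperature=0.3, rng=None):
    # Dividing logits by a temperature < 1 sharpens the softmax, keeping
    # samples close to the distribution modes; temperature = 1 recovers
    # classical sampling, which the paper argues destabilizes GAN training.
    rng = rng or np.random.default_rng()
    z = logits / temperature
    z = z - z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(logits), p=probs)

# Toy vocabulary distribution: a low temperature almost always picks the mode.
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(cold_sample(logits, temperature=0.2))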
Who's to Blame When AI Agents Screw Up?
Over the past year, veteran software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could, in the near future, order meals and engineer mobile apps almost entirely on their own. His agents, while surprisingly capable, have also exposed new legal questions that await companies trying to capitalize on Silicon Valley's hottest new technology. Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills upon request, Microsoft and other tech giants expect that agents will tackle more complex functions, and, most importantly, do so with little human oversight. The tech industry's most ambitious plans involve multi-agent systems, with dozens of agents someday teaming up to replace entire workforces.
Interview with Gillian Hadfield: Normative infrastructure for AI alignment
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, Gillian Hadfield. We spoke about her interdisciplinary research, her career trajectory, her path into AI alignment, law, and her general thoughts on AI systems. Transcript: Note: the transcript has been lightly edited for clarity. This is an interview with Professor Gillian Hadfield, who was a keynote speaker at IJCAI 2024. She gave a very insightful talk about normative infrastructures and how they can guide our search for AI alignment. Kumar Kshitij Patel (KKP): Could you talk a bit about your background and career trajectory? I want our readers to understand how much interdisciplinary work you've done over the years. Gillian Hadfield (GH): I did a PhD in economics and a law degree, a JD, at Stanford, originally motivated by wanting to think about the big questions about the world. I read John Rawls's A Theory of Justice when I was an undergraduate, and those are the big questions: how do we organize the world and build just institutions? But I was very interested in using more formal methods and social-scientific approaches, and that's why I decided to do that joint degree. This was in the 1980s, in the early days of economists starting to use a lot of game theory. I studied information economics as a student of Ken Arrow and Paul Milgrom in the economics department at Stanford. I did work on contract theory and bargaining theory, but I was still very interested in going to law school, not to practice law, but to learn about legal institutions and how they work. Early in my career I was part of the emerging field of law and economics, which, of course, was interdisciplinary, using economics to think about law and legal institutions.
AI Melania: First lady embarks on 'new frontier' in publishing with audiobook of memoir
EXCLUSIVE: First lady Melania Trump is launching an audiobook of her memoir using artificial intelligence (AI) audio technology in multiple languages, Fox News Digital has learned. The first lady released her first memoir, "Melania," last year. This week, she is breaking new ground by releasing "Melania, the Audiobook," which has been "created entirely" with AI. "I am proud to be at the forefront of publishing's new frontier: the intersection of artificial intelligence technology and audio," Trump told Fox News Digital. The first lady said ElevenLabs AI developed "an AI-generated replica of my voice under strict supervision, which will establish an unforgettable connection with my personal story, in multiple languages for listeners worldwide." ElevenLabs AI CEO Mati Staniszewski told Fox News Digital that they are "excited that Melania Trump trusted our technology to power this first-of-its-kind audiobook project."
ICNet: Intra-saliency Correlation Network for Co-Saliency Detection
Model-based methods produce coarse Co-SOD results due to hand-crafted intra- and inter-saliency features. Current data-driven models exploit inter-saliency cues, but undervalue the potential power of intra-saliency cues. In this paper, we propose an Intra-saliency Correlation Network (ICNet) to extract intra-saliency cues from the single image saliency maps (SISMs) predicted by any off-the-shelf SOD method, and obtain inter-saliency cues by correlation techniques. Specifically, we adopt normalized masked average pooling (NMAP) to extract latent intra-saliency categories from the SISMs and semantic features as intra cues. Then we employ a correlation fusion module (CFM) to obtain inter cues by exploiting correlations between the intra cues and single-image features. To improve Co-SOD performance, we propose a category-independent rearranged self-correlation feature (RSCF) strategy. Experiments on three benchmarks show that our ICNet outperforms previous state-of-the-art methods on Co-SOD.
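As a rough illustration of normalized masked average pooling as the abstract describes it, here is a hedged PyTorch sketch: pool deep features under a (resized) single-image saliency map, then L2-normalize the result. The tensor shapes, epsilon, and interpolation choice are assumptions, not the paper's exact design.

import torch
import torch.nn.functional as F

def nmap(features, sism, eps=1e-8):
    # features: (B, C, H, W) semantic features from a backbone
    # sism:     (B, 1, h, w) single-image saliency maps in [0, 1]
    # returns:  (B, C) L2-normalized intra-saliency cue vectors
    sism = F.interpolate(sism, size=features.shape[-2:],
                         mode="bilinear", align_corners=False)
    pooled = (features * sism).sum(dim=(2, 3)) / (sism.sum(dim=(2, 3)) + eps)
    return F.normalize(pooled, dim=1)  # one latent category vector per image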
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement
Large-scale training requires massive parallelism to finish the training within a reasonable amount of time. Large-batch training is the key enabler of such parallelism, but often comes at the cost of generalization performance. Existing works explore adaptive batching or hand-tuned static large batching in order to strike a balance between computational efficiency and performance. However, these methods can provide only coarse-grained adaptation (e.g., at an epoch level) due to intrinsically expensive calculations or hand-tuning requirements. In this paper, we propose a fully automated and lightweight adaptive batching methodology to enable fine-grained batch size adaptation (e.g., at a mini-batch level) that can achieve state-of-the-art performance with record-breaking batch sizes. The core component of our method is a lightweight yet efficient representation of the critical gradient noise information. We open-source the proposed methodology by providing a plugin tool that supports mainstream machine learning frameworks. Extensive evaluations on popular benchmarks (e.g., CIFAR10, ImageNet, and BERT-Large) demonstrate that the proposed methodology outperforms state-of-the-art methodologies using adaptive batching approaches or hand-tuned static strategies in both performance and batch size. In particular, we achieve a new state-of-the-art batch size of 78k in BERT-Large pretraining with a SQuAD score of 90.69, compared to 90.58 reported by the previous state of the art with a 59k batch size.
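The core idea, estimating gradient noise from the similarity of gradients computed on parts of a batch, can be sketched as below. This is a generic half-batch cosine-similarity probe in PyTorch, written under the assumption that high similarity licenses a larger batch; it is not SimiGrad's actual estimator or update rule.

import torch

def half_batch_grad_similarity(model, loss_fn, x, y):
    # Cosine similarity between the gradients of two half-batches is a
    # cheap proxy for gradient noise: values near 1 mean the halves agree
    # (low noise, so the batch size can grow); values near 0 or negative
    # indicate high noise (the batch size should shrink).
    def flat_grad(xb, yb):
        model.zero_grad()
        loss_fn(model(xb), yb).backward()
        return torch.cat([p.grad.reshape(-1)
                          for p in model.parameters() if p.grad is not None])
    n = x.shape[0] // 2
    g1 = flat_grad(x[:n], y[:n])
    g2 = flat_grad(x[n:], y[n:])
    return torch.nn.functional.cosine_similarity(g1, g2, dim=0).item()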
A Tight Lower Bound and Efficient Reduction for Swap Regret
Swap regret, a generic performance measure of online decision-making algorithms, plays an important role in the theory of repeated games and has a close connection to correlated equilibria in strategic games. This paper shows an Ω(√(TN log N)) lower bound for swap regret, where T and N denote the numbers of time steps and available actions, respectively. Our lower bound is tight up to a constant and resolves an open problem mentioned, e.g., in the book by Nisan et al. [28]. In addition, we present a computationally efficient reduction method that converts no-external-regret algorithms into no-swap-regret algorithms. This method applies not only to the full-information setting but also to the bandit setting, and provides a better regret bound than previous results.
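For background on what such a reduction does, here is a sketch of the classical Blum-Mansour construction that this line of work improves on; it is not the paper's own, more efficient method, and the Hedge learner, step size, and random losses below are illustrative assumptions.

import numpy as np

class Hedge:
    # Multiplicative-weights learner with a no-external-regret guarantee.
    def __init__(self, n, eta):
        self.w = np.ones(n)
        self.eta = eta
    def probs(self):
        return self.w / self.w.sum()
    def update(self, loss_vec):
        self.w *= np.exp(-self.eta * loss_vec)

def stationary(P, iters=500):
    # Stationary distribution of a row-stochastic matrix via power iteration.
    p = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        p = p @ P
    return p / p.sum()

# Blum-Mansour reduction: run one external-regret learner per action; each
# round, stack their distributions into a row-stochastic matrix Q, play its
# stationary distribution p, and feed learner i the loss scaled by p[i].
# The swap regret of this play is bounded by the sum of the learners'
# external regrets.
N, T = 5, 1000
rng = np.random.default_rng(0)
learners = [Hedge(N, eta=np.sqrt(np.log(N) / T)) for _ in range(N)]
for t in range(T):
    Q = np.stack([learner.probs() for learner in learners])
    p = stationary(Q)
    loss = rng.random(N)  # adversary's loss vector for this round
    for i, learner in enumerate(learners):
        learner.update(p[i] * loss)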
Nested Variational Inference
Hao Wu, Jan-Willem van de Meent
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
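As a toy illustration of the single-level building block (fitting a proposal by minimizing a reverse KL with the reparameterization trick), here is a hedged PyTorch sketch; the Gaussian target, optimizer, and all constants are assumptions, and the full method nests objectives like this one across levels.

import torch

# Fit a Gaussian proposal q(x) = N(mu, sigma^2) to an unnormalized target by
# minimizing the reverse KL: KL(q || pi) = E_q[log q(x) - log pi(x)] + const.
log_tilde_pi = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2  # hypothetical target: N(3, 0.5^2)
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for step in range(2000):
    eps = torch.randn(256)
    x = mu + log_sigma.exp() * eps  # reparameterized samples from q
    log_q = -0.5 * eps ** 2 - log_sigma - 0.5 * torch.log(torch.tensor(2 * torch.pi))
    loss = (log_q - log_tilde_pi(x)).mean()  # Monte Carlo estimate of the reverse KL
    opt.zero_grad()
    loss.backward()
    opt.step()
print(mu.item(), log_sigma.exp().item())  # approaches 3.0 and 0.5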
Generalizing Bayesian Optimization with Decision-theoretic Entropies
Willie Neiswanger
Bayesian optimization (BO) is a popular method for efficiently inferring optima of an expensive black-box function via a sequence of queries. Existing information-theoretic BO procedures aim to make queries that most reduce the uncertainty about optima, where the uncertainty is captured by Shannon entropy. However, an optimal measure of uncertainty would, ideally, factor in how we intend to use the inferred quantity in some downstream procedure. In this paper, we instead consider a generalization of Shannon entropy from work in statistical decision theory [13, 39], which contains a broad class of uncertainty measures parameterized by a problem-specific loss function corresponding to a downstream task. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures such as knowledge gradient, expected improvement, and entropy search. We then show how alternative choices for the loss yield a flexible family of acquisition functions that can be customized for use in novel optimization settings.
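One of the special cases mentioned, expected improvement, is easy to write down; here is a minimal sketch (maximization convention, with hypothetical posterior values, and not tied to this paper's formulation).

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.0):
    # EI at candidate points with GP posterior mean `mu` and std `sigma`,
    # given the incumbent best observed value `best` (maximization form):
    # EI = (mu - best - xi) * Phi(z) + sigma * phi(z), z = (mu - best - xi) / sigma.
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Three candidates: a high-mean point and a high-uncertainty point both score well.
print(expected_improvement(np.array([1.0, 1.3, 1.5]),
                           np.array([0.2, 0.1, 0.4]), best=1.2))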
Invariant and Transportable Representations for Anti-Causal Domain Shifts
Yibo Jiang and Victor Veitch, Department of Computer Science and Department of Statistics, University of Chicago
Real-world classification problems must contend with domain shift: the (potential) mismatch between the domain where a model is deployed and the domain(s) where the training data were gathered. Methods to handle such problems must specify which structure is shared between the domains and which varies. A natural assumption is that causal (structural) relationships are invariant across all domains. It is then tempting to learn a predictor for the label Y that depends only on Y's causal parents. However, many real-world problems are "anti-causal" in the sense that Y is a cause of the covariates X; in this case, Y has no causal parents and the naive causal invariance is useless.
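A toy anti-causal data-generating process makes the problem concrete (all numbers and variable names here are hypothetical, not the paper's setup): the label Y is drawn first and causes two covariates, one through an invariant mechanism and one through a mechanism that varies by domain.

import numpy as np

rng = np.random.default_rng(0)

def sample_domain(n, shift):
    # Anti-causal generation: the label Y causes the covariates X.
    y = rng.integers(0, 2, n)                          # label (the cause)
    x_stable = y + rng.normal(0.0, 1.0, n)             # invariant mechanism X1 | Y
    x_unstable = shift * y + rng.normal(0.0, 1.0, n)   # domain-varying mechanism X2 | Y
    return np.stack([x_stable, x_unstable], axis=1), y

X_train, y_train = sample_domain(1000, shift=2.0)   # training domain
X_test,  y_test  = sample_domain(1000, shift=-2.0)  # deployment domain
# A predictor leaning on X2 inverts its relationship to Y at deployment,
# while one built on X1 transfers across domains.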