AITopics | Ratzlaff, Neale

Collaborating Authors

Ratzlaff, Neale

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization

Olson, Matthew Lyle, Ratzlaff, Neale, Hinck, Musashi, Luo, Man, Yu, Sungduk, Xue, Chendi, Lal, Vasudev

arXiv.org Artificial Intelligence, Feb-15-2025

DeepSeek-R1, the largest open-source Mixture-of-Experts (MoE) model, has demonstrated reasoning capabilities comparable to proprietary frontier models. Prior research has explored expert routing in MoE models, but its findings suggest that expert selection is often token-dependent rather than semantically driven. Given DeepSeek-R1's enhanced reasoning abilities, we investigate whether its routing mechanism exhibits greater semantic specialization than previous MoE models. To explore this, we conduct two key experiments: (1) a word sense disambiguation task, in which we examine expert activation patterns for words with differing senses, and (2) a cognitive reasoning analysis, in which we assess DeepSeek-R1's structured thought process in the interactive task setting of DiscoveryWorld. We conclude that DeepSeek-R1's routing mechanism is more semantically aware and that the model engages in structured cognitive processes.
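The word sense disambiguation experiment compares which experts a router selects for the same word in different senses. A minimal sketch of that kind of comparison follows, assuming a generic softmax top-k router and Jaccard overlap as the similarity measure; DeepSeek-R1's actual gating function and the paper's exact metric may differ, and the router logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(router_logits, k):
    """Indices of the k experts with the highest gate probability."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two expert sets: 1.0 means identical routing, 0.0 disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical router logits (8 experts, one layer) for the word "bank"
# in a river context and in a finance context.
river_bank = [2.1, 0.3, 1.8, -0.5, 0.0, 1.2, -1.0, 0.4]
money_bank = [0.2, 2.4, 1.7, -0.3, 1.9, 0.1, -0.8, 0.5]

experts_river = top_k_experts(river_bank, k=3)
experts_money = top_k_experts(money_bank, k=3)
print(sorted(experts_river), sorted(experts_money), jaccard(experts_river, experts_money))
```

Low overlap between the expert sets chosen for two senses of the same token is the kind of evidence that would point toward semantically driven rather than purely token-driven routing.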

large language model, machine learning, natural language, (19 more...)

2502.10928

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Steering Large Language Models to Evaluate and Amplify Creativity

Olson, Matthew Lyle, Ratzlaff, Neale, Hinck, Musashi, Tseng, Shao-yen, Lal, Vasudev

arXiv.org Artificial Intelligence, Dec-8-2024

Although capable of generating creative text, Large Language Models (LLMs) are poor judges of what constitutes "creativity". In this work, we show that we can leverage an LLM's knowledge of how to write creatively to better judge what is creative. We take a mechanistic approach that extracts differences in the internal states of an LLM when prompted to respond "boringly" or "creatively", providing a robust measure of creativity that corresponds strongly with human judgment. We also show that these internal state differences can be applied at inference time to enhance the creativity of generated text.
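The idea of scoring and amplifying creativity from internal-state differences can be illustrated with generic difference-of-means activation steering. This is a sketch under stated assumptions: toy 3-d vectors stand in for an LLM's hidden states at one layer, the direction is a plain difference of means, and the score is a projection; the paper's choice of layers, prompts, and scoring may differ.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def creativity_direction(creative_states, boring_states):
    """Unit vector pointing from the mean 'boring' state to the mean 'creative' state."""
    mc, mb = mean_vector(creative_states), mean_vector(boring_states)
    return unit([c - b for c, b in zip(mc, mb)])


def creativity_score(hidden, direction):
    """Projection onto the direction; larger values read as 'more creative'."""
    return dot(hidden, direction)


def steer(hidden, direction, alpha):
    """Inference-time steering: shift a hidden state by alpha along the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Hypothetical 3-d hidden states from one layer under the two prompt styles.
creative = [[1.0, 2.0, 0.0], [1.2, 1.8, 0.2]]
boring = [[1.0, 0.0, 0.0], [1.2, 0.2, 0.2]]
direction = creativity_direction(creative, boring)

h = [0.5, 0.5, 0.5]
print(creativity_score(h, direction), creativity_score(steer(h, direction, 2.0), direction))
```

Because the direction has unit length, steering by alpha raises the projection score by exactly alpha, which makes the strength of the intervention easy to reason about.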

creativity, large language model, natural language, (15 more...)

2412.06060

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning

Ratzlaff, Neale, Luo, Man, Su, Xin, Lal, Vasudev, Howard, Phillip

arXiv.org Artificial Intelligence, Dec-4-2024

Multimodal models typically combine a powerful large language model (LLM) with a vision encoder and are then trained on multimodal data via instruction tuning. While this process adapts LLMs to multimodal settings, it remains unclear whether this adaptation compromises their original language reasoning capabilities. In this work, we explore the effects of multimodal instruction tuning on language reasoning performance. We focus on LLaVA, a leading multimodal framework that integrates LLMs such as Vicuna or Mistral with the CLIP vision encoder. We compare the performance of the original LLMs with their multimodal-adapted counterparts across eight language reasoning tasks. Our experiments yield several key insights. First, the impact of multimodal learning varies between Vicuna and Mistral: we observe a degradation in language reasoning for Mistral but improvements for Vicuna across most tasks. Second, while multimodal instruction tuning consistently degrades performance on mathematical reasoning tasks (e.g., GSM8K), it enhances performance on commonsense reasoning tasks (e.g., CommonsenseQA). Finally, we demonstrate that a training-free model merging technique can effectively mitigate the language reasoning degradation observed in multimodal-adapted Mistral and even improve performance on visual tasks.
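The abstract does not name the merging technique. As an assumption, the sketch below uses the simplest training-free option, linear interpolation between the original LLM weights and the multimodal-tuned LLM weights, with flat lists standing in for tensors and hypothetical parameter names.

```python
def merge_weights(base, adapted, alpha):
    """Linearly interpolate two state dicts with identical keys and shapes.

    alpha = 0 returns the original LLM weights, alpha = 1 the multimodal-tuned ones.
    """
    if base.keys() != adapted.keys():
        raise ValueError("models must share parameter names")
    merged = {}
    for name in base:
        b, a = base[name], adapted[name]
        if len(b) != len(a):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(b, a)]
    return merged


# Hypothetical flattened parameters for a two-tensor "language model".
original_llm = {"layer0.weight": [0.0, 1.0, 2.0], "layer0.bias": [1.0]}
multimodal_llm = {"layer0.weight": [1.0, 1.0, 0.0], "layer0.bias": [3.0]}
print(merge_weights(original_llm, multimodal_llm, 0.5))
```

Under this assumption, sweeping alpha between 0 and 1 and re-evaluating both language and visual benchmarks would be the natural way to pick a trade-off point.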

artificial intelligence, large language model, natural language, (16 more...)

2412.03467

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Debias your Large Multi-Modal Model at Test-Time with Non-Contrastive Visual Attribute Steering

Ratzlaff, Neale, Olson, Matthew Lyle, Hinck, Musashi, Aflalo, Estelle, Tseng, Shao-Yen, Lal, Vasudev, Howard, Phillip

arXiv.org Artificial Intelligence, Nov-15-2024

Large Multi-Modal Models (LMMs) have demonstrated impressive capabilities as general-purpose chatbots that can engage in conversations about a provided input, such as an image. However, their responses are influenced by societal biases present in their training datasets, leading to undesirable differences in how the model responds when presented with images depicting people of different demographics. In this work, we propose a novel debiasing framework for LMMs that directly removes biased representations during text generation, reducing both the generation of text related to protected attributes and the internal representation of those attributes. Our proposed method is training-free; given a single image and a list of target attributes, we can ablate the corresponding representations with just one step of gradient descent on the image itself. Our experiments show that not only can we minimize the propensity of LMMs to generate text related to protected attributes, but we can also improve sentiment and can even use synthetic data to inform the ablation while retaining language modeling capabilities on real data such as COCO or FACET. Furthermore, we find that the resulting generations from a debiased LMM exhibit accuracy similar to that of a baseline biased model, showing that debiasing effects can be achieved without sacrificing model performance.
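The step "one step of gradient descent on the image itself" can be illustrated with a toy model. This sketch assumes a linear image encoder, a single attribute direction in feature space, a squared-projection loss, and a hand-picked step size; all matrices and vectors are hypothetical, and the paper's actual loss, model, and attribute representations are not reproduced here.

```python
def matvec(W, x):
    """W @ x for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose_matvec(W, y):
    """W^T @ y."""
    return [sum(W[r][c] * y[r] for r in range(len(W))) for c in range(len(W[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attribute_score(image, W, attr_dir):
    """How strongly the encoded image aligns with the attribute direction."""
    return dot(attr_dir, matvec(W, image))


def ablate_step(image, W, attr_dir, lr):
    """One gradient-descent step on the image for loss = attribute_score**2."""
    s = attribute_score(image, W, attr_dir)
    grad = [2.0 * s * g for g in transpose_matvec(W, attr_dir)]  # d(s^2)/d(image)
    return [xi - lr * gi for xi, gi in zip(image, grad)]


W = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # hypothetical linear image encoder (3 -> 2)
v = [1.0, 0.0]                          # hypothetical attribute direction in feature space
x = [0.2, 0.7, 0.4]                     # hypothetical flattened image

u = transpose_matvec(W, v)
lr = 1.0 / (2.0 * dot(u, u))  # for a linear encoder, this step size zeroes the score
x_debiased = ablate_step(x, W, v, lr)
print(attribute_score(x, W, v), attribute_score(x_debiased, W, v))
```

In this linear toy the feature orthogonal to the attribute direction is left untouched. With a real nonlinear encoder a single step is no longer guaranteed to zero the score, and the step size becomes a tuning choice.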

large language model, machine learning, preprint arxiv, (21 more...)

2411.12590

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations

Ratzlaff, Neale, Olson, Matthew Lyle, Hinck, Musashi, Tseng, Shao-Yen, Lal, Vasudev, Howard, Phillip

arXiv.org Artificial Intelligence, Oct-17-2024

Large Vision Language Models (LVLMs) such as LLaVA have demonstrated impressive capabilities as general-purpose chatbots that can engage in conversations about a provided input image. However, their responses are influenced by societal biases present in their training datasets, leading to undesirable differences in how the model responds when presented with images depicting people of different demographics. In this work, we propose a novel debiasing framework for LVLMs that directly ablates biased attributes during text generation to avoid generating text related to protected attributes, or even representing them internally. Our method requires no training and only a relatively small amount of representative biased outputs (~1000 samples). Our experiments show that not only can we minimize the propensity of LVLMs to generate text related to protected attributes, but we can even use synthetic data to inform the ablation while retaining captioning performance on real data such as COCO. Furthermore, we find that the resulting generations from a debiased LVLM exhibit accuracy similar to that of a baseline biased model, showing that debiasing effects can be achieved without sacrificing model performance.
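"Ablating" an attribute representation is commonly done by projecting hidden states onto the orthogonal complement of an attribute direction. The sketch below assumes the attribute is captured by a single direction, estimated as a difference of means over hypothetical attribute-laden and neutral hidden states, and applies the projection to every token; the paper's estimation procedure and choice of layers may differ.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def attribute_direction(attr_states, neutral_states):
    """Unit difference-of-means direction separating attribute-laden states."""
    diff = [a - b for a, b in zip(mean_vector(attr_states), mean_vector(neutral_states))]
    norm = math.sqrt(dot(diff, diff))
    return [d / norm for d in diff]


def ablate(hidden, unit_dir):
    """Project a hidden state onto the orthogonal complement of unit_dir."""
    coeff = dot(hidden, unit_dir)
    return [h - coeff * u for h, u in zip(hidden, unit_dir)]


# Hypothetical 2-d hidden states from outputs that do / do not mention the attribute.
attr_states = [[3.0, 4.0], [3.2, 4.4]]
neutral_states = [[0.0, 0.0], [0.2, 0.4]]
direction = attribute_direction(attr_states, neutral_states)

sequence = [[1.0, 2.0], [-0.5, 0.3], [2.0, 2.0]]  # hypothetical per-token hidden states
cleaned = [ablate(h, direction) for h in sequence]
print(direction, cleaned)
```

The projection removes only the component along the attribute direction and leaves every orthogonal component unchanged, which is one way such an edit can suppress an attribute while preserving the rest of the representation.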

large language model, machine learning, natural language, (18 more...)

2410.13976

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Baker, Megan M., New, Alexander, Aguilar-Simon, Mario, Al-Halah, Ziad, Arnold, Sébastien M. R., Ben-Iwhiwhu, Ese, Brna, Andrew P., Brooks, Ethan, Brown, Ryan C., Daniels, Zachary, Daram, Anurag, Delattre, Fabien, Dellana, Ryan, Eaton, Eric, Fu, Haotian, Grauman, Kristen, Hostetler, Jesse, Iqbal, Shariq, Kent, Cassandra, Ketz, Nicholas, Kolouri, Soheil, Konidaris, George, Kudithipudi, Dhireesha, Learned-Miller, Erik, Lee, Seungwon, Littman, Michael L., Madireddy, Sandeep, Mendez, Jorge A., Nguyen, Eric Q., Piatko, Christine D., Pilly, Praveen K., Raghavan, Aswin, Rahman, Abrar, Ramakrishnan, Santhosh Kumar, Ratzlaff, Neale, Soltoggio, Andrea, Stone, Peter, Sur, Indranil, Tang, Zhipeng, Tiwari, Saket, Vedder, Kyle, Wang, Felix, Xu, Zifan, Yanguas-Gil, Angel, Yedidsion, Harel, Yu, Shangqun, Vallabha, Gautam K.

arXiv.org Artificial Intelligence, Jan-18-2023

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development: both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and to assess their progress in the future.
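The abstract does not define its metrics, so the following is an illustration only. It computes two quantities commonly used in the continual-learning literature from a task-by-task score matrix: backward transfer (negative values indicate forgetting, i.e. lost stability) and the mean score on each task right after learning it (a plasticity proxy). These definitions, the function names, and the score matrix are assumptions, not the paper's metric suite.

```python
def backward_transfer(R):
    """Mean change on earlier tasks after learning all tasks (negative = forgetting).

    R[i][j] is the score on task j measured right after training on task i.
    """
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def mean_just_learned(R):
    """Average score on each task right after learning it (a plasticity proxy)."""
    return sum(R[i][i] for i in range(len(R))) / len(R)


def final_average(R):
    """Average score over all tasks at the end of the sequence."""
    return sum(R[-1]) / len(R[-1])


# Hypothetical score matrix for a three-task sequence.
R = [
    [0.90, 0.10, 0.05],
    [0.80, 0.85, 0.20],
    [0.70, 0.75, 0.88],
]
print(backward_transfer(R), mean_just_learned(R), final_average(R))
```

Reporting the stability measure and the plasticity measure side by side, rather than final accuracy alone, is what makes the Stability-Plasticity trade-off visible.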

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1016/j.neunet.2023.01.007

2301.07799

Country: North America > United States > California (0.92)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Continuing Education (1.00)
Government > Regional Government > North America Government > United States Government (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)

Contrastive Identification of Covariate Shift in Image Data

Olson, Matthew L., Nguyen, Thuy-Vy, Dixit, Gaurav, Ratzlaff, Neale, Wong, Weng-Keen, Kahng, Minsuk

arXiv.org Artificial Intelligence, Aug-19-2021

Identifying covariate shift is crucial for making machine learning systems robust in the real world and for detecting training data biases that are not reflected in test data. However, detecting covariate shift is challenging, especially when the data consists of high-dimensional images, and when multiple types of localized covariate shift affect different subspaces of the data. Although automated techniques can be used to detect the existence of covariate shift, our goal is to help human users characterize the extent of covariate shift in large image datasets with interfaces that seamlessly integrate information obtained from the detection algorithms. In this paper, we design and evaluate a new visual interface that facilitates the comparison of the local distributions of training and test data. We conduct a quantitative user study on multi-attribute facial data to compare two different learned low-dimensional latent representations (pretrained ImageNet CNN vs. density ratio) and two user analytic workflows (nearest-neighbor vs. cluster-to-cluster). Our results indicate that the latent representation of our density ratio model, combined with a nearest-neighbor comparison, is the most effective at helping humans identify covariate shift.
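The abstract does not say how its density ratio model is trained. One standard way to estimate the ratio p_test(x) / p_train(x) is probabilistic classification: train a classifier to tell test samples from training samples, then convert its odds into a ratio. The sketch below does this with 1-d logistic regression on hypothetical features; it illustrates the quantity, not the paper's model or latent representation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(test | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def density_ratio(x, w, b, n_train, n_test):
    """Estimate p_test(x) / p_train(x) from the classifier's odds, corrected for class sizes."""
    return math.exp(w * x + b) * n_train / n_test


train = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical 1-d training features
test = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]       # hypothetical shifted test features
xs = train + test
ys = [0] * len(train) + [1] * len(test)
w, b = fit_logistic(xs, ys)
for x in (-1.5, 0.5, 2.5):
    print(x, round(density_ratio(x, w, b, len(train), len(test)), 3))
```

Regions where the ratio is well above 1 are over-represented in the test data and regions well below 1 are under-represented, which is the local information a visual comparison interface would surface to users.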

artificial intelligence, machine learning, neural network, (16 more...)

2108.08000

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Generative Particle Variational Inference via Estimation of Functional Gradients

Ratzlaff, Neale, Bai, Qinxun, Fuxin, Li, Xu, Wei

arXiv.org Machine Learning, Mar-1-2021

Recently, particle-based variational inference (ParVI) methods have gained interest because they directly minimize the Kullback-Leibler divergence and do not suffer from approximation errors introduced by the evidence lower bound (ELBO). However, many ParVI approaches do not allow arbitrary sampling from the posterior, and the few that do allow such sampling suffer from suboptimality. This work proposes a new method for learning to approximately sample from the posterior distribution. We construct a neural sampler that is trained with the functional gradient of the KL-divergence between the empirical sampling distribution and the target distribution, assuming the gradient resides within a reproducing kernel Hilbert space. Our generative ParVI (GPVI) approach maintains the asymptotic performance of ParVI methods while offering the flexibility of a generative sampler. Through carefully constructed experiments, we show that GPVI outperforms previous generative ParVI methods such as amortized SVGD, and is competitive with ParVI as well as gold-standard approaches like Hamiltonian Monte Carlo for fitting both exactly known and intractable target distributions.
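The "functional gradient of the KL-divergence ... within a reproducing kernel Hilbert space" is, for a set of particles, the Stein variational gradient descent (SVGD) update. The minimal 1-d SVGD sketch below shows the ParVI baseline the abstract mentions, not GPVI itself (which trains a neural sampler with this gradient). It assumes an RBF kernel with a fixed bandwidth h (no median heuristic) and a Gaussian target N(2, 1).

```python
import math


def rbf(a, b, h):
    """RBF kernel k(a, b) = exp(-(a - b)^2 / h) with fixed bandwidth h."""
    return math.exp(-((a - b) ** 2) / h)


def svgd_step(particles, grad_logp, h=1.0, step=0.1):
    """One Stein variational gradient descent update for 1-d particles."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            drive = k * grad_logp(xj)           # pulls toward high target density
            repulse = -2.0 * (xj - xi) / h * k  # d k(xj, xi) / d xj, keeps particles apart
            phi += drive + repulse
        updated.append(xi + step * phi / n)
    return updated


def grad_logp(x):
    """Score function of the target N(2, 1)."""
    return -(x - 2.0)


particles = [-3.0 + 0.2 * i for i in range(10)]  # deliberately poor initialization
for _ in range(1000):
    particles = svgd_step(particles, grad_logp)

mean = sum(particles) / len(particles)
std = math.sqrt(sum((p - mean) ** 2 for p in particles) / len(particles))
print(round(mean, 3), round(std, 3))
```

The repulsive term is what keeps the particles from collapsing onto the mode. Note that this particle method only yields a fixed set of samples; drawing fresh samples on demand is the limitation a generative sampler such as GPVI is meant to remove.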

artificial intelligence, generative particle variational inference, neural network, (13 more...)

2103.01291

Country:

North America > United States > Oregon (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)

Avoiding Side Effects in Complex Environments

Turner, Alexander Matt, Ratzlaff, Neale, Tadepalli, Prasad

arXiv.org Artificial Intelligence, Jun-11-2020

Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects.
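In the AUP formulation, the penalty compares the auxiliary Q-value of the chosen action with that of doing nothing. A minimal sketch of that shaped reward for a single auxiliary reward function follows. The Q-values are given rather than learned, and the penalty weight lam, the action names, and the omission of any penalty normalization the paper may apply are all assumptions.

```python
def aup_reward(task_reward, q_aux, action, noop="noop", lam=0.5):
    """Task reward minus a penalty on the change in attainable auxiliary value.

    q_aux maps each action available in the current state to Q_aux(s, a) for a
    single auxiliary reward function; the penalty compares `action` with the no-op.
    """
    penalty = abs(q_aux[action] - q_aux[noop])
    return task_reward - lam * penalty


# Hypothetical values in one Game-of-Life-like state.
q_aux = {"noop": 5.0, "build_widget": 4.8, "destroy_cells": 1.0}
task_reward = {"noop": 0.0, "build_widget": 1.0, "destroy_cells": 1.0}
for a in q_aux:
    print(a, round(aup_reward(task_reward[a], q_aux, a), 3))
```

Both non-trivial actions earn the same task reward here, but the one that sharply reduces the attainable auxiliary value is penalized, so the shaped reward prefers completing the task without the side effect.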

artificial intelligence, reinforcement learning, side effect, (17 more...)

2006.06547

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

HyperGAN: A Generative Model for Diverse, Performant Neural Networks

Ratzlaff, Neale, Fuxin, Li

arXiv.org Machine Learning, Jan-30-2019

We introduce HyperGAN, a generative network that learns to generate all the weights within a deep neural network. HyperGAN employs a novel mixer to transform independent Gaussian noise into a latent space where dimensions are correlated, which is then transformed to generate weights in each layer of a deep neural network. We utilize an architecture that bears resemblance to generative adversarial networks, but we evaluate the likelihood of samples with a classification loss. This is equivalent to minimizing the KL-divergence between the generated network parameter distribution and an unknown true parameter distribution. We apply HyperGAN to classification, showing that HyperGAN can learn to generate parameters that solve MNIST and CIFAR-10 with performance competitive with fully supervised learning, while learning a rich distribution of effective parameters. We also show that HyperGAN can provide better uncertainty estimates than standard ensembles. This is evaluated by the ability of HyperGAN-generated ensembles to detect out-of-distribution data as well as adversarial examples. We see that in addition to being highly accurate on inlier data, HyperGAN can provide reasonable uncertainty estimates.
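The data flow described above (independent noise, then a mixer producing a correlated latent, then per-layer generators emitting weights) can be shown with a structure-only sketch. It assumes a linear mixer and linear per-layer generators with random, untrained parameters, and the class name ToyHyperNetwork is hypothetical. No training is shown, so the printed ensemble spread is illustrative only; it merely mirrors how disagreement among sampled networks can serve as an uncertainty signal.

```python
import random


def linear(W, x):
    """W @ x for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyHyperNetwork:
    """Untrained, structure-only sketch: noise -> mixer -> per-layer weight generators."""

    def __init__(self, layer_shapes, code_dim=2, seed=0):
        self.rng = random.Random(seed)
        self.layer_shapes = layer_shapes  # (rows, cols) of each target weight matrix
        self.code_dim = code_dim
        self.latent_dim = code_dim * len(layer_shapes)
        # Mixer: maps independent Gaussian noise to a latent whose dims are correlated.
        self.mixer = random_matrix(self.latent_dim, self.latent_dim, self.rng)
        # One generator per layer: maps that layer's latent code to flattened weights.
        self.generators = [random_matrix(r * c, code_dim, self.rng) for r, c in layer_shapes]

    def sample_network(self):
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        q = linear(self.mixer, z)
        weights = []
        for i, (rows, cols) in enumerate(self.layer_shapes):
            code = q[i * self.code_dim:(i + 1) * self.code_dim]
            flat = linear(self.generators[i], code)
            weights.append([flat[r * cols:(r + 1) * cols] for r in range(rows)])
        return weights


def forward(weights, x):
    """Run a sampled MLP (ReLU between layers, no biases) on input x."""
    h = x
    for depth, W in enumerate(weights):
        h = linear(W, h)
        if depth < len(weights) - 1:
            h = [max(0.0, v) for v in h]
    return h


hyper = ToyHyperNetwork([(4, 3), (1, 4)])  # target MLP: 3 -> 4 -> 1
x = [0.5, -1.0, 2.0]
outputs = [forward(hyper.sample_network(), x)[0] for _ in range(20)]
mean = sum(outputs) / len(outputs)
spread = sum((o - mean) ** 2 for o in outputs) / len(outputs)
print(round(mean, 3), round(spread, 3))
```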

deep learning, hypergan, neural network, (20 more...)

1901.11058

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
