Yang, Weiwei
NeurIPS 2023 LLM Efficiency Fine-tuning Competition
Saroufim, Mark, Perlitz, Yotam, Choshen, Leshem, Antiga, Luca, Bowyer, Greg, Puhrsch, Christian, Guessous, Driss, Rao, Supriya, Chauhan, Geeta, Kumar, Ashvini, Kumar, Jindal Pawan, Parikh, Rajpoot Ankur, Isaacson, Joe, Yang, Weiwei
Our analysis of the NeurIPS 2023 large language model (LLM) fine-tuning competition revealed two trends: top-performing models exhibit significant overfitting on benchmark datasets, mirroring the broader issue of benchmark overfitting on popular leaderboards, and data curation is essential for producing a high-performing LLM. The competition consisted of two stages: an open evaluation stage with publicly available tasks and a closed evaluation stage with unseen tasks, which allowed us to assess the generalizability of fine-tuned LLMs. Our results highlight the limitations of current benchmark-based evaluation schemes for generative models and demonstrate the need for more robust evaluation methods. Notably, the winning submissions utilized standard open-source libraries and focused primarily on data curation. To facilitate further research and promote reproducibility, we release all competition entries, Docker files, and evaluation infrastructure, providing a valuable resource for the community to explore fine-tuning, overfitting, and reproducibility in LLMs.
Interpretable Language Modeling via Induction-head Ngram Models
Kim, Eunji, Mantena, Sriya, Yang, Weiwei, Singh, Chandan, Yoon, Sungroh, Gao, Jianfeng
Recent large language models (LLMs) have excelled across a wide range of tasks, but their use in high-stakes and compute-limited settings has intensified the demand for interpretability and efficiency. We address this need by proposing Induction-head ngram models (Induction-Gram), a method that builds an efficient, interpretable LM by bolstering modern ngram models with a hand-engineered "induction head". This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions. This process enables Induction-Gram to provide ngram-level grounding for each generated token. Moreover, experiments show that this simple method significantly improves next-word prediction over baseline interpretable models (up to 26%p) and can be used to speed up LLM inference for large models through speculative decoding. We further study Induction-Gram in a natural-language neuroscience setting, where the goal is to predict the next fMRI response in a sequence. It again provides a significant improvement over interpretable models (20% relative increase in the correlation of predicted fMRI responses), potentially enabling deeper scientific investigation of language selectivity in the brain. The code is available at https://github.com/ejkim47/induction-gram.
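The core mechanism lends itself to a compact illustration. Below is a minimal sketch of the induction-head idea (illustrative only, not the released implementation; Induction-Gram's learned fuzzy-matching similarity metric is replaced here by exact suffix matching): find the longest recent suffix of the context that also occurs earlier in the context, and propose the token(s) that followed it.

```python
# Minimal sketch of an induction-head-style next-token proposal (illustrative
# only; Induction-Gram uses a learned neural similarity metric, approximated
# here by exact suffix matching).
def induction_head_propose(context, max_ngram=8):
    """Return candidate next tokens by matching the longest suffix of
    `context` against earlier positions in the same context."""
    for n in range(min(max_ngram, len(context) - 1), 0, -1):
        suffix = context[-n:]
        candidates = [
            context[i + n]
            for i in range(len(context) - n)
            if context[i:i + n] == suffix
        ]
        if candidates:
            return candidates  # each candidate is traceable to a concrete match
    return []

tokens = "the cat sat on the mat and the cat sat on".split()
print(induction_head_propose(tokens))  # -> ['the'], from the earlier "the cat sat on"
```

Because every proposal is tied to a concrete match position in the input, each suggested token comes with ngram-level grounding, which is the source of the method's interpretability.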
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
Sun, Chung-En, Liu, Xiaodong, Yang, Weiwei, Weng, Tsui-Wei, Cheng, Hao, San, Aidan, Galley, Michel, Gao, Jianfeng
Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where algorithmically crafted adversarial suffixes appended to harmful queries bypass safety alignment and trigger unintended responses. Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. Moreover, it exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyond improving jailbreak ability, ADV-LLM provides valuable insights for future safety alignment research through its ability to generate large datasets for studying LLM safety.
Fast and Reliable $N-k$ Contingency Screening with Input-Convex Neural Networks
Christianson, Nicolas, Cui, Wenqi, Low, Steven, Yang, Weiwei, Zhang, Baosen
Power system operators must ensure that dispatch decisions remain feasible in case of grid outages or contingencies to prevent cascading failures and ensure reliable operation. However, checking the feasibility of all $N - k$ contingencies -- every possible simultaneous failure of $k$ grid components -- is computationally intractable even for small $k$, requiring system operators to resort to heuristic screening methods. As uncertainty increases and system behaviors change, heuristic lists might not include all relevant contingencies, generating false negatives in which unsafe scenarios are misclassified as safe. In this work, we propose to use input-convex neural networks (ICNNs) for contingency screening. We show that ICNN reliability can be determined by solving a convex optimization problem, and by scaling model weights using this problem as a differentiable optimization layer during training, we can learn an ICNN classifier that is both data-driven and provably reliable. Namely, our method can ensure a zero false negative rate. We empirically validate this methodology in a case study on the IEEE 39-bus test network, observing that it yields substantial (10-20x) speedups while maintaining excellent classification accuracy.
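To make the model class concrete, here is a minimal sketch of a fully input-convex network forward pass (an illustration of the general ICNN architecture, not the paper's screening model or its reliability-calibration layer): non-negative weights on the hidden-to-hidden path combined with convex, non-decreasing activations make the scalar output convex in the input, which is what allows reliability to be certified by convex optimization.

```python
import numpy as np

# Sketch of a fully input-convex neural network (ICNN) forward pass.
# Convexity in x follows from non-negative weights on the hidden path
# and convex, non-decreasing activations (ReLU here).
rng = np.random.default_rng(0)

def icnn_forward(x, Wx, Wz, b):
    z = np.maximum(Wx[0] @ x + b[0], 0.0)          # first layer: no z-path yet
    for Wz_k, Wx_k, b_k in zip(Wz, Wx[1:], b[1:]):
        # np.abs enforces the non-negativity constraint on the z-path weights
        z = np.maximum(np.abs(Wz_k) @ z + Wx_k @ x + b_k, 0.0)
    return z.sum()  # scalar convex output, e.g. a constraint-violation score

dim, hidden = 4, 8
Wx = [rng.normal(size=(hidden, dim)) for _ in range(3)]
Wz = [rng.normal(size=(hidden, hidden)) for _ in range(2)]
b = [rng.normal(size=hidden) for _ in range(3)]

# Convexity check along a random segment: f(midpoint) <= average of endpoints.
x0, x1 = rng.normal(size=dim), rng.normal(size=dim)
f = lambda x: icnn_forward(x, Wx, Wz, b)
assert f(0.5 * (x0 + x1)) <= 0.5 * (f(x0) + f(x1)) + 1e-9
```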
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models
Jin, Xin, Larson, Jonathan, Yang, Weiwei, Lin, Zhiqiang
Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end, we present BinSum, a comprehensive benchmark and dataset of over 557K binary functions, and introduce a novel method for prompt synthesis and optimization. To more accurately gauge LLM performance, we also propose a new semantic similarity metric that surpasses traditional exact-match approaches. Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2, and Code Llama, reveals 10 pivotal insights. This evaluation generated 4 billion inference tokens and incurred a total expense of 11,418 US dollars and 873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential of LLMs in this field and the challenges yet to be overcome.
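As an illustration of scoring beyond exact match, the sketch below computes an embedding-based similarity between a generated and a reference summary using the sentence-transformers library. This is a stand-in assumption for illustration; the paper's BinSum metric may be defined differently.

```python
# Illustrative embedding-based semantic similarity between a generated
# summary and a reference, in contrast to exact-match scoring. Assumed
# instantiation, not necessarily the paper's metric.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generated: str, reference: str) -> float:
    emb = model.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

gen = "Allocates a buffer and copies the input string into it."
ref = "Copies the given string into newly allocated memory."
print(f"exact match: {gen == ref}")  # False, despite equivalent meaning
print(f"semantic similarity: {semantic_similarity(gen, ref):.2f}")  # high score
```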
A Statistical Turing Test for Generative Models
Helm, Hayden, Priebe, Carey E., Yang, Weiwei
The emergence of human-like abilities of AI systems for content generation in domains such as text, audio, and vision has prompted the development of classifiers to determine whether content originated from a human or a machine. Implicit in these efforts is an assumption that the generation properties of a human are different from those of the machine. In this work, we provide a framework in the language of statistical pattern recognition that quantifies the difference between the distributions of human and machine-generated content conditioned on an evaluation context. We describe current methods in the context of the framework and demonstrate how to use the framework to evaluate the progression of generative models toward human-like capabilities along many axes of analysis.
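One concrete way to realize such a comparison is a classifier two-sample test: if no classifier can separate human from machine content in a given evaluation context, the two conditional distributions are close. The toy sketch below (an illustration of this general idea, not the authors' construction) uses held-out detector accuracy as a proxy for the distance between distributions.

```python
# Toy classifier two-sample test: held-out accuracy near 0.5 means the
# human and machine distributions are hard to distinguish in this context.
# Illustrative sketch only, with made-up miniature data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

human_texts = ["the weather turned cold overnight", "she laughed at the joke",
               "dinner was quiet and pleasant", "he missed the last train home"]
machine_texts = ["the weather is cold in overnight", "she laugh at joke funny",
                 "dinner quiet was and pleasant", "he missing last train to home"]

X = TfidfVectorizer().fit_transform(human_texts + machine_texts)
y = np.array([0] * len(human_texts) + [1] * len(machine_texts))

# Accuracy well above 0.5 indicates the two distributions are separable.
acc = cross_val_score(LogisticRegression(), X, y, cv=4).mean()
print(f"detector accuracy: {acc:.2f} (0.5 = indistinguishable)")
```

As generative models progress toward human-like output, this detector accuracy should drift toward chance, giving a quantitative axis along which progression can be tracked.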
Approximately optimal domain adaptation with Fisher's Linear Discriminant Analysis
Helm, Hayden S., De Silva, Ashwin, Vogelstein, Joshua T., Priebe, Carey E., Yang, Weiwei
We propose a class of models based on Fisher's Linear Discriminant (FLD) for domain adaptation. The class entails a convex combination of two hypotheses: i) an average hypothesis representing previously encountered source tasks and ii) a hypothesis trained on a new target task. For a particular generative setting, we derive the expected risk of this combined hypothesis with respect to the target distribution and propose a computable approximation. This is then leveraged to estimate an optimal convex coefficient that exploits the bias-variance trade-off between source and target information to arrive at an optimal classifier for the target task. We study the effect of various generative parameter settings on the relative risks between the optimal hypothesis, hypothesis i), and hypothesis ii). Furthermore, we demonstrate the effectiveness of the proposed optimal classifier in several EEG- and ECG-based classification problems and argue that the optimal classifier can be computed without access to direct information from any of the individual source tasks, leading to the preservation of privacy. We conclude by discussing further applications, limitations, and potential future directions.
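The combination itself is simple to illustrate. The sketch below mixes a source-trained and a target-trained FLD and picks the convex coefficient with the lowest target error; it is illustrative only, selecting the coefficient on held-out target data rather than via the paper's analytic risk approximation.

```python
import numpy as np

# Illustrative sketch: convex combination of a source-trained and a
# target-trained FLD, with the coefficient alpha chosen on held-out
# target data (a stand-in for the paper's analytic risk estimate).
rng = np.random.default_rng(0)

def fld_fit(X, y):
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)  # pooled within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)               # FLD projection direction
    return w, -w @ (mu0 + mu1) / 2                   # direction and threshold

def make_task(n, shift):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + np.outer(y, [2.0, 0.0]) + shift
    return X, y

Xs, ys = make_task(1000, shift=0.0)   # abundant source data
Xt, yt = make_task(20, shift=0.5)     # scarce labeled target data
Xv, yv = make_task(200, shift=0.5)    # held-out target data

(ws, bs), (wt, bt) = fld_fit(Xs, ys), fld_fit(Xt, yt)

# Sweep alpha and keep the coefficient with the best target accuracy.
best = max(
    np.linspace(0, 1, 21),
    key=lambda a: np.mean(
        (Xv @ (a * ws + (1 - a) * wt) + a * bs + (1 - a) * bt > 0) == (yv == 1)
    ),
)
print(f"selected convex coefficient alpha = {best:.2f}")
```

With abundant source data and a modest distribution shift, an intermediate alpha typically wins, reflecting the bias-variance trade-off the paper formalizes.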
Efficient Reinforcement Learning Through Trajectory Generation
Cui, Wenqi, Huang, Linbin, Yang, Weiwei, Zhang, Baosen
A key barrier to using reinforcement learning (RL) in many real-world applications is the requirement of a large number of system interactions to learn a good control policy. Off-policy and offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data. However, their performance suffers from the lack of exploration and from distributional shifts in trajectories once controllers are updated. Moreover, most RL methods require that all states are directly observed, which is difficult to attain in many settings. To overcome these challenges, we propose a trajectory generation algorithm, which adaptively generates new trajectories as if the system were being operated and explored under the updated control policies. Motivated by the fundamental lemma for linear systems, and assuming sufficient excitation, we generate trajectories from linear combinations of historical trajectories. For linear feedback control, we prove that the algorithm generates trajectories with exactly the same distribution as if they were sampled from the real system using the updated control policy. In particular, the algorithm extends to systems where the states are not directly observed. Experiments show that the proposed method significantly reduces the amount of sampled data needed for RL algorithms.
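The generation step can be illustrated directly from the fundamental lemma: for a linear system driven by a persistently exciting input, every valid trajectory lies in the column span of a Hankel matrix built from historical data. The sketch below (a toy illustration with full state observation, not the paper's algorithm) verifies this span property numerically.

```python
import numpy as np

# Toy illustration of the fundamental lemma behind the trajectory generator:
# new trajectories of a linear system are linear combinations of columns of
# a Hankel matrix built from persistently exciting historical data.
rng = np.random.default_rng(0)

A, B = np.array([[0.9, 0.2], [0.0, 0.8]]), np.array([[0.0], [1.0]])

def rollout(x0, u_seq):
    xs = [x0]
    for u in u_seq:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs[:-1])

def hankel(data, L):
    # Stack all length-L windows of a (T, d) signal as columns.
    return np.column_stack([data[i:i + L].ravel() for i in range(len(data) - L + 1)])

# Historical data under a persistently exciting (random) input.
T, L = 60, 5
u_hist = rng.normal(size=(T, 1))
x_hist = rollout(np.zeros(2), u_hist)
H = np.vstack([hankel(u_hist, L), hankel(x_hist, L)])

# A fresh trajectory under a different input lies in the column span of H.
u_new = rng.normal(size=(L, 1))
x_new = rollout(np.zeros(2), u_new)
target = np.concatenate([u_new.ravel(), x_new.ravel()])
g, *_ = np.linalg.lstsq(H, target, rcond=None)
print("representation residual:", np.linalg.norm(H @ g - target))  # ~0
```

The paper builds on this property to synthesize trajectories consistent with an updated policy without touching the physical system.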
Prospective Learning: Back to the Future
Vogelstein, Joshua T., Verstynen, Timothy, Kording, Konrad P., Isik, Leyla, Krakauer, John W., Etienne-Cummings, Ralph, Ogburn, Elizabeth L., Priebe, Carey E., Burns, Randal, Kutten, Kwame, Knierim, James J., Potash, James B., Hartung, Thomas, Smirnova, Lena, Worley, Paul, Savonenko, Alena, Phillips, Ian, Miller, Michael I., Vidal, Rene, Sulam, Jeremias, Charles, Adam, Cowan, Noah J., Bichuch, Maxim, Venkataraman, Archana, Li, Chen, Thakor, Nitish, Kebschull, Justus M, Albert, Marilyn, Xu, Jinchong, Shuler, Marshall Hussain, Caffo, Brian, Ratnanather, Tilak, Geisa, Ali, Roh, Seung-Eon, Yezerets, Eva, Madhyastha, Meghana, How, Javier J., Tomita, Tyler M., Dey, Jayanta, Huang, Ningyuan, Shin, Jong M., Kinfu, Kaleab Alemayehu, Chaudhari, Pratik, Baker, Ben, Schapiro, Anna, Jayaraman, Dinesh, Eaton, Eric, Platt, Michael, Ungar, Lyle, Wehbe, Leila, Kepecs, Adam, Christensen, Amy, Osuagwu, Onyema, Brunton, Bing, Mensh, Brett, Muotri, Alysson R., Silva, Gabriel, Puppo, Francesca, Engert, Florian, Hillman, Elizabeth, Brown, Julia, White, Chris, Yang, Weiwei
Research on both natural intelligence (NI) and artificial intelligence (AI) generally assumes that the future resembles the past: intelligent agents or systems (what we call 'intelligence') observe and act on the world, then use this experience to act on future experiences of the same kind. We call this 'retrospective learning'. For example, an intelligence may see a set of pictures of objects, along with their names, and learn to name them. A retrospective learning intelligence would merely be able to name more pictures of the same objects. We argue that this is not what true intelligence is about. In many real-world problems, both NIs and AIs will have to learn for an uncertain future. Both must update their internal models to be useful for future tasks, such as naming fundamentally new objects and using these objects effectively in a new context or to achieve previously unencountered goals. This ability to learn for the future we call 'prospective learning'. We articulate four relevant factors that jointly define prospective learning. Continual learning enables intelligences to remember those aspects of the past which they believe will be most useful in the future. Prospective constraints (including biases and priors) help the intelligence find general solutions that will be applicable to future problems. Curiosity motivates taking actions that inform future decision making, including in previously unmet situations. Causal estimation enables learning the structure of relations that guide choosing actions for specific outcomes, even when the specific action-outcome contingencies have never been observed before. We argue that a paradigm shift from retrospective to prospective learning will enable the communities that study intelligence to unite and overcome existing bottlenecks to more effectively explain, augment, and engineer intelligences.
Learning without gradient descent encoded by the dynamics of a neurobiological model
George, Vivek Kurien, Morar, Vikash, Yang, Weiwei, Larson, Jonathan, Tower, Bryan, Mahajan, Shweti, Gupta, Arkin, White, Christopher, Silva, Gabriel A.
The success of state-of-the-art machine learning rests almost entirely on variations of gradient descent algorithms that minimize some version of a cost or loss function. A fundamental limitation, however, is the need to train these systems in either supervised or unsupervised ways by exposing them to typically large numbers of training examples. Here, we introduce a fundamentally new conceptual approach to machine learning that takes advantage of a neurobiologically derived model of dynamic signaling, constrained by the geometric structure of a network. We show that MNIST images can be uniquely encoded and classified by the dynamics of geometric networks with nearly state-of-the-art accuracy in an unsupervised way, and without the need for any training.