AITopics | Wallis, Phillip

Collaborating Authors

Wallis, Phillip

Backtracking for Safety

Sel, Bilgehan, Li, Dingcheng, Wallis, Phillip, Keshava, Vaishakh, Jin, Ming, Jonnalagadda, Siddhartha Reddy

arXiv.org Artificial Intelligence, Mar-11-2025

Large language models (LLMs) have demonstrated remarkable capabilities across various tasks, but ensuring their safety and alignment with human values remains crucial. Current safety alignment methods, such as supervised fine-tuning and reinforcement learning-based approaches, can be vulnerable to adversarial attacks and often result in shallow safety alignment, which primarily prevents harmful content in the initial tokens of the generated output. While methods like resetting can help recover from unsafe generations by discarding previous tokens and restarting the generation process, they are not well suited to nuanced safety violations, such as toxicity, that may arise within otherwise benign and lengthy generations. In this paper, we propose a novel backtracking method designed to address these limitations. Our method allows the model to revert to a safer generation state, not necessarily the beginning, when safety violations occur during generation. This approach enables targeted correction of problematic segments without discarding the entire generated text, thereby preserving efficiency. We demonstrate that our method dramatically reduces toxicity appearing during the generation process with minimal impact on efficiency.
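The control flow described in this abstract (revert to an earlier safe state rather than restart) can be sketched as follows. This is a minimal sketch under stated assumptions: generation proceeds token by token, a safety check can run on the partial output, and "safe states" are checkpoints taken at sentence boundaries. The callables `next_token` and `is_unsafe`, the period-based checkpointing, and the scripted toy model are hypothetical stand-ins for an LLM sampler and a toxicity classifier; the paper's actual detection and reversion mechanism may differ.

```python
from typing import Callable, List


def generate_with_backtracking(
    next_token: Callable[[List[str], int], str],
    is_unsafe: Callable[[List[str]], bool],
    max_tokens: int = 50,
    max_retries: int = 3,
    boundary: str = ".",
    eos: str = "<eos>",
) -> List[str]:
    """Generate tokens, reverting to the last safe checkpoint on a violation (hypothetical sketch)."""
    tokens: List[str] = []
    checkpoint = 0   # length of the most recent prefix verified as safe
    backtracks = 0   # total reversions, passed to the sampler so it can try something new
    retries = 0      # consecutive reversions to the same checkpoint
    while len(tokens) < max_tokens:
        tok = next_token(tokens, backtracks)
        if tok == eos:
            break
        tokens.append(tok)
        if is_unsafe(tokens):
            # Discard only the segment after the checkpoint, not the whole text.
            del tokens[checkpoint:]
            backtracks += 1
            retries += 1
            if retries > max_retries:
                break  # give up and return the last safe prefix
        elif tok == boundary:
            checkpoint = len(tokens)  # a safe segment just ended: record a new checkpoint
            retries = 0
    return tokens


# Hypothetical scripted "model": the first resample replaces the toxic second sentence.
FIRST = ["The", "weather", "is", "nice", "."]
BAD = ["You", "are", "an", "idiot", "."]
GOOD = ["Let's", "go", "outside", "."]


def toy_next_token(tokens: List[str], backtracks: int) -> str:
    if len(tokens) < len(FIRST):
        return FIRST[len(tokens)]
    second = BAD if backtracks == 0 else GOOD
    i = len(tokens) - len(FIRST)
    return second[i] if i < len(second) else "<eos>"


def toy_is_unsafe(tokens: List[str]) -> bool:
    # Stand-in for a toxicity classifier.
    return "idiot" in tokens


result = generate_with_backtracking(toy_next_token, toy_is_unsafe)
```

Here the first, benign sentence survives the violation; only the toxic second sentence is discarded and regenerated, which is the efficiency argument the abstract makes against resetting.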

large language model, machine learning, preprint arxiv

2503.08919

Country:

Asia (0.46)
North America > United States (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.48)
Education > Educational Setting (0.46)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

LoRA: Low-Rank Adaptation of Large Language Models

Hu, Edward J., Shen, Yelong, Wallis, Phillip, Allen-Zhu, Zeyuan, Li, Yuanzhi, Wang, Shean, Chen, Weizhu

arXiv.org Artificial Intelligence, Jun-17-2021

The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is extremely expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning. LoRA performs on par with or better than fine-tuning in model quality on both GPT-3 and GPT-2, despite having fewer trainable parameters, a higher training throughput, and no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptations, which sheds light on the efficacy of LoRA. We release our implementation in GPT-2 at https://github.com/microsoft/LoRA.
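The mechanism in this abstract can be sketched in plain Python for a single frozen weight matrix W0 of shape d×k. This is a minimal sketch, assuming the standard LoRA formulation: only the low-rank factors B (d×r) and A (r×k) are trained, the update is scaled by alpha/r, A gets a small random Gaussian initialization while B starts at zero (so the adapted layer initially equals the pre-trained one), and B·A can be merged into W0 after training, which is why no inference latency is added. The class `LoRALinear` and the helpers `matmul` and `param_counts` are hypothetical illustrations, not the API of the released implementation.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Computes y = W0 x + (alpha / r) * B A x with W0 frozen; only A and B would be trained."""

    def __init__(self, w0: Matrix, r: int, alpha: float = 1.0, seed: int = 0):
        d, k = len(w0), len(w0[0])  # W0 maps k-dim inputs to d-dim outputs
        rng = random.Random(seed)
        self.w0 = w0                                                          # frozen, d x k
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, random init
        self.b = [[0.0] * r for _ in range(d)]                                # d x r, zero init
        self.scale = alpha / r

    def forward(self, x: List[float]) -> List[float]:
        col = [[v] for v in x]                       # x as a k x 1 column
        base = matmul(self.w0, col)                  # frozen path
        update = matmul(self.b, matmul(self.a, col)) # low-rank path: B (A x)
        return [base[i][0] + self.scale * update[i][0] for i in range(len(base))]

    def trainable_params(self) -> int:
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)

    def merged_weight(self) -> Matrix:
        # Fold the update into W0 for deployment: same cost as the original layer.
        ba = matmul(self.b, self.a)
        return [[self.w0[i][j] + self.scale * ba[i][j] for j in range(len(self.w0[0]))]
                for i in range(len(self.w0))]


def param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one d x k matrix: (full fine-tuning, LoRA with rank r)."""
    return d * k, r * (d + k)
```

As a per-matrix check, `param_counts(12288, 12288, 4)` gives 150,994,944 trainable values for a full update versus 98,304 for LoRA (12288 being the GPT-3 175B hidden size, and r = 4 an arbitrary example rank). The paper's overall 10,000× figure depends on which matrices are adapted and is not reproduced by this single-matrix arithmetic.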

deep learning, lora, neural network

2106.09685

Country: North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Differential Equation Units: Learning Functional Forms of Activation Functions from Data

Torkamani, MohamadAli, Shankar, Shiv, Rooshenas, Amirmohammad, Wallis, Phillip

arXiv.org Machine Learning, Sep-6-2019

Most deep neural networks use simple, fixed activation functions, such as sigmoids or rectified linear units, regardless of domain or network structure. We introduce differential equation units (DEUs), an improvement to modern neural networks that enables each neuron to learn a particular nonlinear activation function from a family of solutions to an ordinary differential equation. Specifically, each neuron may change its functional form during training based on the behavior of the other parts of the network. We show that using neurons with DEU activation functions results in a more compact network capable of achieving comparable, if not superior, performance when compared to much larger networks.
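To illustrate what it means for an activation to be drawn "from a family of solutions to an ordinary differential equation", here is a minimal sketch. It assumes a homogeneous, constant-coefficient, second-order linear ODE a·y'' + b·y' + c·y = 0 whose coefficients (a, b, c) and initial conditions (y0, dy0) would play the role of a neuron's learnable parameters, and it evaluates the activation y(x) by fourth-order Runge-Kutta integration from t = 0 to t = x. The ODE form, the function name `ode_activation`, and the numerical evaluation are assumptions for illustration; the paper's exact ODE (which may include a forcing term) and how it evaluates and trains the solutions may differ.

```python
import math


def ode_activation(x: float, a: float, b: float, c: float,
                   y0: float, dy0: float, steps: int = 200) -> float:
    """Return y(x) for a*y'' + b*y' + c*y = 0 with y(0) = y0, y'(0) = dy0 (hypothetical DEU-style form)."""
    if a == 0:
        raise ValueError("a must be nonzero for a second-order ODE")
    h = x / steps    # negative when x < 0, i.e. integrate backward from t = 0
    y, v = y0, dy0   # state: value and first derivative

    def deriv(y: float, v: float):
        # First-order system: y' = v, v' = -(b*v + c*y) / a
        return v, -(b * v + c * y) / a

    for _ in range(steps):  # classic RK4 steps
        k1y, k1v = deriv(y, v)
        k2y, k2v = deriv(y + h / 2 * k1y, v + h / 2 * k1v)
        k3y, k3v = deriv(y + h / 2 * k2y, v + h / 2 * k2v)
        k4y, k4v = deriv(y + h * k3y, v + h * k3v)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y


# Different parameter settings give different functional forms from one family:
# (a, b, c) = (1, 0, 1), y0 = 0, dy0 = 1  -> approximately sin(x)
# (a, b, c) = (1, 0, -1), y0 = 0, dy0 = 1 -> approximately sinh(x)
# (a, b, c) = (1, 1, 0), y0 = 0, dy0 = 1  -> approximately 1 - exp(-x)
print(ode_activation(math.pi / 2, 1, 0, 1, 0, 1))
```

The point of the sketch is that a single parameterized rule covers oscillating, exponentially growing, and saturating shapes, so gradient updates to (a, b, c, y0, dy0) could move a neuron between them during training.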

activation function, deep learning, neural network

1909.03069

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Industry: Energy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Learning Compact Neural Networks Using Ordinary Differential Equations as Activation Functions

Torkamani, MohamadAli, Wallis, Phillip, Shankar, Shiv, Rooshenas, Amirmohammad

arXiv.org Machine Learning, May-18-2019

activation function, deep learning, neural network

1905.07685

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Learning Semantic Relationships from Medical Codes

Wallis, Phillip (Cambria Health Solutions) | Danaee, Padideh (Cambria Health Solutions)

AAAI Conferences, May-15-2019

We demonstrate the value of learning dense representations (embeddings) of collections of codes representing various domains of medical information. These embeddings are learned jointly using sparse representations of diagnoses, procedures, and prescriptions extracted from medical claims, in order to infer semantic relationships both within and between domains. We show that learning effective embeddings provides a rich representation of a patient's clinical state at a point in time, a mechanism for assigning robust clinical similarity between patients, and a data representation that is generally useful in modeling various health-care-related events, such as the next most likely event (i.e., a diagnosis, procedure, or prescription) or the likelihood of a specific event in the future (e.g., an emergency room visit). This paper showcases three methods: general embedding, task-specific embedding, and a combination of the two, which we call "super" embedding for the purpose of this paper.
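One use named in this abstract, assigning clinical similarity between patients, can be sketched as follows. This is a minimal sketch, assuming each patient is represented by the mean of the embeddings of the codes on their claims and that similarity is measured with cosine similarity. The code names and three-dimensional vectors are hypothetical toy values, not embeddings learned in the paper, and mean pooling is one simple choice rather than the authors' general, task-specific, or "super" embedding method.

```python
import math
from typing import Dict, List

# Hypothetical toy embeddings; real ones would be learned jointly from claims data.
EMBEDDINGS: Dict[str, List[float]] = {
    "dx:diabetes": [0.9, 0.1, 0.0],
    "rx:metformin": [0.8, 0.2, 0.1],
    "dx:asthma": [0.0, 0.9, 0.3],
    "rx:albuterol": [0.1, 0.8, 0.4],
}


def patient_vector(codes: List[str]) -> List[float]:
    """Mean-pool the embeddings of a patient's codes into one clinical-state vector."""
    if not codes:
        raise ValueError("a patient needs at least one code")
    vecs = [EMBEDDINGS[c] for c in codes]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


p1 = patient_vector(["dx:diabetes", "rx:metformin"])
p2 = patient_vector(["rx:metformin"])
p3 = patient_vector(["dx:asthma", "rx:albuterol"])
print(cosine(p1, p2), cosine(p1, p3))
```

Because codes from related domains (a diagnosis and its typical prescription) sit close together in the embedding space, patients who share no exact code can still be scored as clinically similar, which sparse code-matching alone cannot do.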

deep learning, neural network, representation

The Thirty-Second International FLAIRS Conference

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
