Collaborating Authors

 turek


Reversing Large Language Models for Efficient Training and Fine-Tuning

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, LLMs are often fine-tuned to address a specific task, starting from the weights of a pre-trained LLM that serves as a foundation model. In this work, we introduce memory-efficient, reversible architectures for LLMs, inspired by symmetric and symplectic differential equations, and investigate their theoretical properties. Unlike standard baseline architectures, which store all intermediate activations, the proposed models use time-reversible dynamics to retrieve hidden states during backpropagation, removing the need to store activations. This property drastically reduces memory consumption, so larger batch sizes can be processed with the same available memory, thereby improving throughput. In addition, we propose an efficient method for converting existing, non-reversible LLMs into reversible architectures through fine-tuning, which makes our approach practical for exploiting existing pre-trained models. Our results show comparable or improved performance on several datasets and benchmarks, across several LLMs, building a scalable and efficient path towards reducing the memory and computational costs associated with both training from scratch and fine-tuning of LLMs.
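
The idea of recovering hidden states instead of storing them can be illustrated with a generic additive-coupling layer, in the style of reversible residual networks. This is a minimal sketch under stated assumptions, not the architecture proposed in the paper: the functions `f` and `g` are hypothetical stand-ins for learned sub-layers, and the two-stream split of the hidden state is an assumption made for illustration.

```python
import math

def f(v):
    # Hypothetical stand-in for a learned sub-layer (assumption, not from the paper).
    return [math.tanh(0.5 * x) for x in v]

def g(v):
    # Second hypothetical sub-layer; it does not need to be invertible itself.
    return [math.sin(x) for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def forward(x1, x2):
    # Additive coupling: each stream is updated using only the other stream.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2

def inverse(y1, y2):
    # Undo the two updates in reverse order to recover the layer input.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2

def run_stack(x1, x2, depth):
    # Forward pass through `depth` layers, keeping only the final state.
    for _ in range(depth):
        x1, x2 = forward(x1, x2)
    return x1, x2

def recover_inputs(y1, y2, depth):
    # Walk back through the stack, recomputing earlier states from the output.
    for _ in range(depth):
        y1, y2 = inverse(y1, y2)
    return y1, y2
```

Calling `recover_inputs` on the output of `run_stack` with the same `depth` returns the original inputs up to floating-point round-off. A backward pass built on this kind of layer can recompute each layer's input when it is needed, rather than keeping every intermediate activation in memory.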


AI Weekly: DARPA seeks to better align AI with human intentions

#artificialintelligence

This week in AI, DARPA, the emerging technologies R&D agency of the U.S. Defense Department, launched a new program that aims to "align" AI systems with human decision-makers in domains where there isn't an agreed-upon right answer. Elsewhere, two prominent cofounders from LinkedIn and DeepMind, Reid Hoffman and Mustafa Suleyman, announced a new AI startup called Inflection AI that seeks to develop software that allows humans to talk to computers using everyday language. In a press release describing the new three-and-a-half-year program, DARPA says that the goal is to "evaluate and build trusted algorithmic decision-makers for mission-critical Department of Defense operations." Dubbed "In the Moment," or ITM, it focuses on the process of alignment -- building AI systems that accomplish what they're expected to accomplish.


EETimes - Bringing Common Sense to 'Brittle' AI Algorithms

#artificialintelligence

The ongoing recalibration of AI research and development underscores a fundamental tenet of machine learning: We must learn to crawl before we can walk. Thus far, AI hype has mostly talked the talk rather than walking the walk. Returning to what appear to be engineering first principles, U.S. research efforts are attempting to move beyond current "brittle" AI models that excel at only specific tasks. The goal is to develop more generalized models that can adapt to new situations much as humans do. Among those efforts is a Machine Common Sense program overseen by the Defense Advanced Research Projects Agency (DARPA) that seeks to imbue machine learning models with the kinds of commonplace reasoning displayed by some of the fastest learners on the planet: infants.


Seeking Artificial Common Sense

Communications of the ACM

Although artificial intelligence (AI) has made great strides in recent years, it still struggles to provide useful guidance about unstructured events in the physical or social world. In short, computer programs lack common sense. "Think of it as the tens of millions of rules of thumb about how the world works that are almost never explicitly communicated," said Doug Lenat of Cycorp, in Austin, TX. Beyond these implicit rules, though, commonsense systems need to make proper deductions from them and from other, explicit statements, he said. "If you are unable to do logical reasoning, then you don't have common sense."


Where Are The Deepfakes In This Presidential Election?

NPR Technology

So far, few deepfakes have been used this political season. It's not because they aren't a potential threat, but because simpler deceptive tactics are still effective at spreading misinformation. Despite people's fears, sophisticated, deceptive videos known as "deepfakes" haven't arrived this political season.


Researchers Use Artificial Intelligence to Fight Coronavirus

#artificialintelligence

The world's leading medical researchers are rushing to find a treatment for COVID-19 with the help of the most powerful and advanced supercomputers in the world. Researchers across the globe are submitting potential treatments and cures to the COVID-19 High Performance Computing Consortium. The consortium, using a network of supercomputers and laboratories, can run through simulations to narrow down or rule out drug compounds to use in a cure much faster than traditional methods. "It's a means by which one can begin to analyze tremendously complex or large problems," says Vice President of Technical Computing at IBM Cognitive Systems Dave Turek. "Pharmaceutical companies may have billions of compounds that could be potential drugs."


U.S. Defense Department Produces First Tools for Catching Deepfakes

#artificialintelligence

U.S. Defense Advanced Research Projects Agency (DARPA) researchers say they have created the first forensic tools for catching fake videos, known as "deepfakes," created with artificial intelligence (AI). The team says the most common technique for generating deepfakes involves using machine learning to graft one person's face onto another person's body. Matthew Turek, who leads DARPA's Media Forensics program, says his team discovered subtle cues in current images and videos manipulated by generative adversarial networks, which allowed them to detect alteration. For example, the researchers realized that faces in deepfakes rarely blink, and when they do, the eye movement is unnatural.


The Defense Department has produced the first tools for catching deepfakes

MIT Technology Review

The first forensics tools for catching revenge porn and fake news created with AI have been developed through a program run by the US Defense Department. Forensics experts have rushed to find ways of detecting videos synthesized and manipulated using machine learning because the technology makes it far easier to create convincing fake videos that could be used to sow disinformation or harass people. The most common technique for generating fake videos involves using machine learning to swap one person's face onto another's. The resulting videos, known as "deepfakes," are simple to make, and can be surprisingly realistic. Further tweaks, made by a skilled video editor, can make them seem even more real.


U.S. Angles to Retake Supercomputer Lead - EE Times

#artificialintelligence

The latest Top500 list of the world's fastest supercomputers turns the spotlight on China, which overtook the United States in the total number of ranked systems and which scored the top two fastest installations on the list. Rather than target systems that test well on the Top500's distributed-memory version of the Linpack benchmarks (High Performance Linpack), the companies aim to render those measurements irrelevant on their way to beating China to exascale computing. China captured not only first and second place in the ranking of the fastest installed systems, but also won the majority share of ranked installations and took the aggregate performance lead, according to the November 2017 Top500 list, which was the 50th one to be published since the ranking debuted in June 1993. According to the Top500 organization, "There is no system from the USA under the Top3. The number of systems installed in China increased to a new record high of 202, compared to 160 on the last list. China now clearly shows a substantially larger number of installations than the USA. China now is also pulling ahead of the USA in the performance category, with China holding 35.4% of the overall installed performance, while the USA is second, with 29.6%."


The democratization of the supercomputers

#artificialintelligence

You may wonder why a company would want to use supercomputers, or high-performance computing (HPC), to make shampoos. But that's exactly what a European cosmetics company did to get the right mix of materials that would make the shampoo smooth. "Nobody wants to buy shampoo that is not stable--one in which ingredients get segregated like in a salad dressing so you have to shake it before you can use it," said Dave Turek, who leads HPC (high performance computing) strategy at International Business Machines (IBM) Corp., which worked with the shampoo maker on devising a cost-efficient HPC solution. According to Turek, ordinarily, one would start the shampoo-making process by getting a laboratory to mix the ingredients (water, detergent, thickeners, etc.), testing them and checking the mixture's properties like how it "behaves, sitting on the shelf" of a bathroom. "This will teach you something about the ratio of materials, and then you do another set of experiments with another ratio and so on."