U.S. military funds AI tools to speed modeling of viral outbreaks
As SARS-CoV-2 radiated across the planet in 2020, epidemiologists scrambled to predict its spread and its deadly consequences. Often, they turned to models that not only simulate viral transmission and hospitalization rates but can also predict the effect of interventions: masks, vaccines, or travel bans. But in addition to being computationally intensive, models in epidemiology and other disciplines can be black boxes: millions of lines of legacy code subject to finicky tuning by operators at research organizations scattered around the world. They don't always provide clear guidance. "The models that are used are often kind of brittle and nonexplainable," says Erica Briscoe, who was a program manager for the Automating Scientific Knowledge Extraction and Modeling (ASKEM) project at the Defense Advanced Research Projects Agency (DARPA).
Token embeddings violate the manifold hypothesis
Robinson, Michael, Dey, Sourya, Chiang, Tony
Fully understanding the behavior of a large language model (LLM) requires understanding its input space. If this input space differs from our assumptions, our understanding of and conclusions about the LLM are likely flawed, regardless of its architecture. Here, we elucidate the structure of the token embeddings, the input domain for LLMs, both empirically and theoretically. We present a generalized and statistically testable model in which the neighborhood of each token splits into well-defined signal and noise dimensions. This model is based on a generalization of a manifold called a fiber bundle, so we denote our hypothesis test the "fiber bundle null." Failing to reject the null is uninformative, but rejecting it at a specific token indicates that the token has a statistically significant local structure, and so is of interest to us. By running our test over several open-source LLMs, each with unique token embeddings, we find that the null is frequently rejected, so the token subspace is provably not a fiber bundle and hence also not a manifold. As a consequence of our findings, when an LLM is presented with two semantically equivalent prompts, and one of them contains a token implicated by our test, that prompt will likely exhibit more output variability, in proportion to the local signal dimension of the token.
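The core idea of a neighborhood splitting into signal and noise dimensions can be illustrated with local PCA. The sketch below is a simplified stand-in, not the authors' fiber bundle test: it takes the k nearest neighbors of one embedding, and counts how many principal directions are needed to explain most of the local variance (signal), treating the rest as noise. The function name, synthetic data, and 95% variance threshold are illustrative assumptions.

```python
import numpy as np

def local_signal_noise_split(embeddings, idx, k=50, var_threshold=0.95):
    """Toy illustration: split the neighborhood of one token embedding
    into 'signal' directions (large local variance) and 'noise'
    directions (residual variance) via local PCA."""
    x = embeddings[idx]
    # distances from the chosen token to all tokens
    d = np.linalg.norm(embeddings - x, axis=1)
    neighbors = embeddings[np.argsort(d)[1:k + 1]]  # skip the token itself
    centered = neighbors - neighbors.mean(axis=0)
    # singular values give the local principal variances
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / np.sum(s ** 2)
    # smallest number of directions explaining var_threshold of variance
    signal_dim = int(np.searchsorted(np.cumsum(var), var_threshold) + 1)
    noise_dim = embeddings.shape[1] - signal_dim
    return signal_dim, noise_dim

# synthetic "embeddings": 1000 points near a 3-d subspace of a
# 32-d ambient space, plus small isotropic noise
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 32))
emb += 0.01 * rng.normal(size=(1000, 32))
sig, noise = local_signal_noise_split(emb, idx=0)
```

On data like this, the estimated signal dimension is small (near the planted dimension of 3) even though the ambient space has 32 dimensions, which is the intuition behind testing each token's neighborhood separately.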
Probing the topology of the space of tokens with structured prompts
Robinson, Michael, Dey, Sourya, Kushner, Taisa
The set of tokens T, when embedded within the latent space X of a large language model (LLM), can be thought of as a finite sample drawn from a distribution supported on a topological subspace of X. One can ask what the smallest (in the sense of inclusion) subspace and simplest (in terms of fewest free parameters) distribution can account for such a sample. Previous work [1] suggests that the smallest topological subspace from which tokens can be drawn is not a manifold, but has structure consistent with a stratified manifold. That paper relied upon knowing the token input embedding function T → X, which, given each token t ∈ T, ascribes a representation in X. Because embeddings preserve topological structure, in this paper we will study T by equating it with the image of the token input embedding function, thereby treating T both as the set of tokens and as a subspace of X. This subspace is called the token subspace of X. Usually X is taken to be Euclidean space R^d.
AI Cyber Risk Benchmark: Automated Exploitation Capabilities
Ristea, Dan, Mavroudis, Vasilios, Hicks, Chris
We introduce a new benchmark for assessing AI models' capabilities and risks in automated software exploitation, focusing on their ability to detect and exploit vulnerabilities in real-world software systems. Using DARPA's AI Cyber Challenge (AIxCC) framework and the Nginx challenge project, a deliberately modified version of the widely used Nginx web server, we evaluate several leading language models, including OpenAI's o1-preview and o1-mini, Anthropic's Claude-3.5-sonnet-20241022 and Claude-3.5-sonnet-20240620, Google DeepMind's Gemini-1.5-pro, and OpenAI's earlier GPT-4o model. Our findings reveal that these models vary significantly in their success rates and efficiency, with o1-preview achieving the highest success rate of 64.71 percent and o1-mini and Claude-3.5-sonnet-20241022 providing cost-effective but less successful alternatives. This benchmark establishes a foundation for systematically evaluating the AI cyber risk posed by automated exploitation tools.
The structure of the token space for large language models
Robinson, Michael, Dey, Sourya, Sweet, Shauna
Large language models encode the correlational structure present in natural language by fitting segments of utterances (tokens) into a high-dimensional ambient latent space upon which the models then operate. We assert that in order to develop a foundational, first-principles understanding of the behavior and limitations of large language models, it is crucial to understand the topological and geometric structure of this token subspace. In this article, we present estimators for the dimension and Ricci scalar curvature of the token subspace, and apply them to three open-source large language models of moderate size: GPT2, LLEMMA7B, and MISTRAL7B. In all three models, using these measurements, we find that the token subspace is not a manifold, but is instead a stratified manifold, where on each of the individual strata the Ricci curvature is significantly negative. We additionally find that the dimension and curvature correlate with the generative fluency of the models, which suggests that these findings have implications for model behavior.
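Intrinsic dimension can be estimated from a point cloud alone. The sketch below uses the generic Two-NN estimator (the ratio of second- to first-nearest-neighbor distances follows a Pareto law whose exponent is the intrinsic dimension); it is a common off-the-shelf technique, not the specific estimators from the paper, and the synthetic data and function name are illustrative assumptions.

```python
import numpy as np

def two_nn_dimension(points):
    """Rough intrinsic-dimension estimate via the Two-NN method:
    mu = r2/r1 (second- over first-nearest-neighbor distance) is
    Pareto-distributed with exponent equal to the intrinsic dimension;
    we return its maximum-likelihood estimate."""
    n = len(points)
    # full pairwise distance matrix (fine for small samples)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # exclude self-distances
    d.sort(axis=1)
    mu = d[:, 1] / d[:, 0]        # r2 / r1 for each point
    return n / np.sum(np.log(mu))

# points sampled from a 2-d patch linearly embedded in 10-d space
rng = np.random.default_rng(1)
pts = rng.uniform(size=(500, 2)) @ rng.normal(size=(2, 10))
dim = two_nn_dimension(pts)       # close to the planted dimension of 2
```

Estimators in this family are purely local, which is what allows a dimension (or curvature) to differ from stratum to stratum rather than being a single global number.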
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution
Koch, Bernard J., Peterson, David
Over the past decade, AI research has focused heavily on building ever-larger deep learning models. This approach has simultaneously unlocked incredible achievements in science and technology, and hindered AI from overcoming long-standing limitations with respect to explainability, ethical harms, and environmental efficiency. Drawing on qualitative interviews and computational analyses, our three-part history of AI research traces the creation of this "epistemic monoculture" back to a radical reconceptualization of scientific progress that began in the late 1980s. In the first era of AI research (1950s-late 1980s), researchers and patrons approached AI as a "basic" science that would advance through autonomous exploration and organic assessments of progress (e.g., peer-review, theoretical consensus). The failure of this approach led to a retrenchment of funding in the 1980s. Amid this "AI Winter," an intervention by the U.S. government reoriented the field towards measurable progress on tasks of military and commercial interest. A new evaluation system called "benchmarking" provided an objective way to quantify progress on tasks by focusing exclusively on increasing predictive accuracy on example datasets. Distilling science down to verifiable metrics clarified the roles of scientists, allowed the field to rapidly integrate talent, and provided clear signals of significance and progress. But history has also revealed a tradeoff to this streamlined approach to science: the consolidation around external interests and inherent conservatism of benchmarking has disincentivized exploration beyond scaling monoculture. In the discussion, we explain how AI's monoculture offers a compelling challenge to the belief that basic, exploration-driven research is needed for scientific progress. Implications for the spread of AI monoculture to other sciences in the era of generative AI are also discussed.
Pentagon announces competition to develop new AI programs, plug holes in national cyber defense
The Defense Advanced Research Projects Agency (DARPA) has announced a competition for companies to provide new artificial intelligence (AI) platforms to help identify and seal holes in national cybersecurity. "In the AI Cyber Challenge, our goal is to again create this kind of new ecosystem with a diverse set of creative cyber competitors, empowered by the country's top AI firms, all pointed at new ways to secure the software infrastructure that underlies our economy," DARPA Outreach told Fox News Digital. "Ultimately, we want to see the best and the brightest cybersecurity, computer science, program analysis and AI and machine learning from across industry and academia come together to participate in this challenge." DARPA announced the challenge at Black Hat USA 2023, calling the competition the AI Cyber Challenge (AIxCC), which will last two years and involve multiple rounds of qualification and competition for a $4 million prize.
The White House's 'AI Cyber Challenge' aims to crowdsource national security solutions
Our local and state level government systems are hacked and held ransom with disheartening regularity. At the Black Hat USA Conference in Las Vegas on Wednesday, the Biden Administration revealed its plans to better defend the nation's critical digital infrastructure: It's launching a DARPA-led challenge competition to build AI systems capable of proactively identifying and fixing software vulnerabilities. The "AI Cyber Challenge" (AIxCC) is a two-year development program open to competitors throughout the US. It's being hosted by DARPA in collaboration with Anthropic, Google, Microsoft and OpenAI. Those companies are providing both their expertise in the field and access to their AI technologies.
DARPA To Host Workshops For Trustworthy Artificial Intelligence - Potomac Officers Club
The Defense Advanced Research Projects Agency plans to conduct two workshops in 2023, aiming to convene academic, commercial and government experts to foster discussions on developing trustworthy artificial intelligence for national security purposes. DARPA's Information Innovation Office will host a virtual workshop from June 13 to 16 and an in-person workshop in Boston, Massachusetts, from July 31 to Aug. 2 as part of its AI Forward initiative. Each event will be limited to 100 attendees. Interested individuals are tasked with submitting an executive summary by Mar. According to DARPA, research efforts need to be directed toward foundational theory, engineering and human-AI teaming to delimit the scope of AI systems, ensure their real-world functionality and make them trustworthy partners for people.
Self-flying fighter jet takes off, fights against other aircraft and lands - without ANY human help
A modified F-16 fighter jet has successfully flown and fought another aircraft while being entirely controlled by artificial intelligence (AI). During test flights, the jet, known as 'X-62A' or 'VISTA', performed takeoffs, landings and combat manoeuvres without human intervention for a total of over 17 hours. They took place in December 2022 at the Edwards Air Force Base in California, USA, and showed that it is possible to completely hand over the reins to AI in battle. The algorithms which powered it were developed by the Defense Advanced Research Projects Agency (DARPA) - the research branch of the US Department of Defense. This marks the first time AI has been used on a tactical aircraft as, prior to this milestone, it had only been used in computer simulations of F-16 dogfights.