

Why the World's Best AI Systems Are Still So Bad at Pokémon

TIME - Tech

Pillay is an editorial fellow at TIME. Right now, live on Twitch, you can watch three of the world's smartest AI systems (GPT 5.2, Claude Opus 4.5, and Gemini 3 Pro) doing their best to beat classic Pokémon games. At least by human standards, they are not very good. The systems are slow, overconfident, and often confused.


HARNESS: Human-Agent Risk Navigation and Event Safety System for Proactive Hazard Forecasting in High-Risk DOE Environments

Elgedawy, Ran, Das, Sanjay, Seefried, Ethan, Wiggins, Gavin, Burchfield, Ryan, Hewit, Dana, Srinivasan, Sudarshan, Thomas, Todd, Balaprakash, Prasanna, Ghosal, Tirthankar

arXiv.org Artificial Intelligence

Operational safety at mission-critical work sites is a top priority given the complex and hazardous nature of daily tasks. This paper presents the Human-Agent Risk Navigation and Event Safety System (HARNESS), a modular AI framework designed to forecast hazardous events and analyze operational risks in U.S. Department of Energy (DOE) environments. HARNESS integrates Large Language Models (LLMs) with structured work data, historical event retrieval, and risk analysis to proactively identify potential hazards. A human-in-the-loop mechanism allows subject matter experts (SMEs) to refine predictions, creating an adaptive learning loop that enhances performance over time. By combining SME collaboration with iterative agentic reasoning, HARNESS improves the reliability and efficiency of predictive safety systems. Preliminary deployment shows promising results, with future work focusing on quantitative evaluation of accuracy, SME agreement, and decision latency reduction.


deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses

Androutsopoulos, Georgios, Bianchi, Antonio

arXiv.org Artificial Intelligence

Although Rust ensures memory safety by default, it also permits the use of unsafe code, which can introduce memory safety vulnerabilities if misused. Unfortunately, existing tools for detecting memory bugs in Rust typically exhibit limited detection capabilities, inadequately handle Rust-specific types, or rely heavily on manual intervention. To address these limitations, we present deepSURF, a tool that integrates static analysis with Large Language Model (LLM)-guided fuzzing harness generation to effectively identify memory safety vulnerabilities in Rust libraries, specifically targeting unsafe code. deepSURF introduces a novel approach for handling generics by substituting them with custom types and generating tailored implementations for the required traits, enabling the fuzzer to simulate user-defined behaviors within the fuzzed library. Additionally, deepSURF employs LLMs to augment fuzzing harnesses dynamically, facilitating exploration of complex API interactions and significantly increasing the likelihood of exposing memory safety vulnerabilities. We evaluated deepSURF on 63 real-world Rust crates, successfully rediscovering 30 known memory safety bugs and uncovering 12 previously-unknown vulnerabilities (out of which 11 have been assigned RustSec IDs and 3 have been patched), demonstrating clear improvements over state-of-the-art tools.
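The generics-substitution idea in deepSURF can be sketched as follows. This is a minimal illustration, not the tool's actual code: all names (`lib_max`, `FuzzedItem`, `harness`) are hypothetical, and the assumption is that a generic library API gets instantiated with a custom concrete type whose trait implementation is driven by fuzzer-controlled bytes, so the fuzzer can simulate user-defined behavior inside the library under test.

```rust
use std::cmp::Ordering;

// Stand-in for a generic API exported by the fuzzed crate.
fn lib_max<T: Ord>(items: &[T]) -> Option<&T> {
    items.iter().max()
}

// Custom type substituted for the generic parameter; its ordering
// comes entirely from fuzzer-controlled bytes.
#[derive(Debug, PartialEq, Eq)]
struct FuzzedItem {
    key: u8,
}

// Tailored implementations generated for the traits the API requires.
impl Ord for FuzzedItem {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key.cmp(&other.key)
    }
}
impl PartialOrd for FuzzedItem {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Harness entry point: in a real setup this body would sit inside a
// cargo-fuzz `fuzz_target!` macro; here it is a plain function for clarity.
fn harness(data: &[u8]) {
    let items: Vec<FuzzedItem> = data.iter().map(|&b| FuzzedItem { key: b }).collect();
    let _ = lib_max(&items);
}

fn main() {
    // Feed one example input through the harness.
    harness(&[3, 1, 4, 1, 5]);
}
```

In a real campaign the fuzzer, not a fixed byte slice, supplies `data`, and the substituted type's trait methods would be richer than a byte comparison, but the shape of the harness is the same.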


Could a self-monitoring system for criminals replace prisons one day?

New Scientist

Future Chronicles is our regular speculative look at inventions yet to come. In this latest installment, we journey to 2050, when technology had been developed so that criminals could be monitored at home. "It's no surprise that the first countries to abolish prisons were Scandinavian." In the 2020s, the US was spending an eye-watering $182 billion a year on locking up its citizens. No other country imprisoned as many people or spent as much in doing so.


Feline stressed: Experts urge cat owners NOT to take their pets out for walks on trendy harnesses - amid fears they leave kitties feeling 'scared'

Daily Mail - Science & tech

A cat charity has urged owners not to use trendy harnesses on their cats, amid fears they leave felines feeling 'scared'.


Orion: Fuzzing Workflow Automation

Bazalii, Max, Fleischer, Marius

arXiv.org Artificial Intelligence

Fuzz testing is one of the most effective techniques for finding software vulnerabilities. While modern fuzzers can generate inputs and monitor executions automatically, the overall workflow, from analyzing a codebase, to configuring harnesses, to triaging results, still requires substantial manual effort. Prior attempts focused on single stages such as harness synthesis or input minimization, leaving researchers to manually connect the pieces into a complete fuzzing campaign. We introduce Orion, a framework that automates the manual bottlenecks of fuzzing by integrating LLM reasoning with traditional tools, allowing campaigns to scale to settings where human effort alone was impractical. Orion uses LLMs for code reasoning and semantic guidance, while relying on deterministic tools for verification, iterative refinement, and tasks that require precision. Across our benchmark suite, Orion reduces human effort by 46-204x depending on the workflow stage, and we demonstrate its effectiveness through the discovery of two previously unknown vulnerabilities in the widely used open-source clib library.


PentestJudge: Judging Agent Behavior Against Operational Requirements

Caldwell, Shane, Harley, Max, Kouremetis, Michael, Abruzzo, Vincent, Pearce, Will

arXiv.org Artificial Intelligence

We introduce PentestJudge, a system for evaluating the operations of penetration testing agents. PentestJudge is a large language model (LLM)-as-judge with access to tools that allow it to consume arbitrary trajectories of agent states and tool call history to determine whether a security agent's actions meet certain operating criteria that would be impractical to evaluate programmatically. We develop rubrics that use a tree structure to hierarchically collapse the penetration testing task for a particular environment into smaller, simpler, and more manageable sub-tasks and criteria until each leaf node represents simple yes-or-no criteria for PentestJudge to evaluate. Task nodes are broken down into different categories related to operational objectives, operational security, and tradecraft. LLM-as-judge scores are compared to human domain experts as a ground-truth reference, allowing us to compare their relative performance with standard binary classification metrics, such as F1 scores. We evaluate several frontier and open-source models acting as judge agents, with the best model reaching an F1 score of 0.83. We find models that are better at tool-use perform more closely to human experts. By stratifying the F1 scores by requirement type, we find even models with similar overall scores struggle with different types of questions, suggesting certain models may be better judges of particular operating criteria. We find that weaker and cheaper models can judge the trajectories of pentests performed by stronger and more expensive models, suggesting verification may be easier than generation for the penetration testing task. We share this methodology to facilitate future research in understanding the ability of judges to holistically and scalably evaluate the process quality of AI-based information security agents so that they may be confidently used in sensitive production environments.
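The hierarchical rubric described above can be sketched as a small tree structure. This is a hypothetical illustration, not PentestJudge's actual code: the `Rubric` type, the example criteria, and the count-passed-leaves roll-up are all assumptions made for clarity (the abstract does not specify how leaf answers aggregate).

```rust
// A rubric tree: task nodes collapse into yes/no leaf criteria
// that the LLM judge answers.
enum Rubric {
    // A leaf criterion the judge answers yes (true) or no (false).
    Leaf { criterion: &'static str, passed: bool },
    // A task node broken down into smaller sub-rubrics.
    Node { name: &'static str, children: Vec<Rubric> },
}

// Roll up (passed, total) leaf counts over the whole tree.
fn score(r: &Rubric) -> (u32, u32) {
    match r {
        Rubric::Leaf { passed, .. } => (*passed as u32, 1),
        Rubric::Node { children, .. } => children
            .iter()
            .map(score)
            .fold((0, 0), |(p, t), (cp, ct)| (p + cp, t + ct)),
    }
}

fn main() {
    // Example sub-tree for one operational-security task node.
    let rubric = Rubric::Node {
        name: "operational security",
        children: vec![
            Rubric::Leaf { criterion: "avoided noisy port scans", passed: true },
            Rubric::Leaf { criterion: "cleaned up dropped tooling", passed: false },
        ],
    };
    let (passed, total) = score(&rubric);
    println!("{passed}/{total} criteria met");
}
```

Leaf answers judged this way can then be compared against human expert labels to compute the binary classification metrics (such as F1) that the paper reports.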


Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification

Pinckney, Nathaniel, Deng, Chenhui, Ho, Chia-Tung, Tsai, Yun-Da, Liu, Mingjie, Zhou, Wenfei, Khailany, Brucek, Ren, Haoxing

arXiv.org Artificial Intelligence

We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification. CVDP includes 783 problems across 13 task categories, covering RTL generation, verification, debugging, specification alignment, and technical Q&A authored by experienced hardware engineers. Problems are offered in both non-agentic and agentic formats. The benchmark introduces more realistic and challenging contexts than prior work, with state-of-the-art models achieving no more than 34% pass@1 on code generation. Agentic tasks, especially those involving RTL reuse and verification, are particularly difficult. Evaluation uses open-source tools and model scoring infrastructure, with comprehension tasks assessed via BLEU and LLM-based judging. CVDP reveals substantial gaps in current model capabilities, underscoring the need for continued research toward robust, real-world hardware design automation.


As the US and China lock horns, Malaysia hopes to harness an AI revolution

Al Jazeera

Kulim, Malaysia – When tech giant AT&S decided a few years ago that it needed to ramp up production to keep pace with the artificial intelligence (AI) boom, it did not look to its largest manufacturing facilities in China. The Austrian firm's plants in Chongqing and Shanghai – opened in 2022 and 2016, respectively – employ some 9,000 workers between them, churning out high-end components used in everything from consumer electronics to cars. But AT&S was at the same time coming to grips with the risks of concentrating production in one country. Like many tech firms grappling with the disruption of the COVID-19 pandemic and the trade war salvoes between the United States and China, AT&S decided it needed to diversify its supply chains. Malaysia quickly emerged at the top of the company's list of potential locations for its next plant.


KHAIT: K-9 Handler Artificial Intelligence Teaming for Collaborative Sensemaking

Wilchek, Matthew, Wang, Linhan, Dickinson, Sally, Feuerbacher, Erica, Luther, Kurt, Batarseh, Feras A.

arXiv.org Artificial Intelligence

In urban search and rescue (USAR) operations, communication between handlers and specially trained canines is crucial but often complicated by challenging environments and the specific behaviors canines are trained to exhibit when detecting a person. Since a USAR canine often works out of sight of the handler, the handler lacks awareness of the canine's location and situation, known as the 'sensemaking gap.' In this paper, we propose KHAIT, a novel approach to close the sensemaking gap and enhance USAR effectiveness by integrating object detection-based Artificial Intelligence (AI) and Augmented Reality (AR). Equipped with AI-powered cameras, edge computing, and AR headsets, KHAIT enables precise and rapid object detection from a canine's perspective, improving survivor localization. We evaluate this approach in a real-world USAR environment, demonstrating an average survival allocation time decrease of 22%, enhancing the speed and accuracy of operations.