AITopics | sherlock

Collaborating Authors

sherlock

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Neural Information Processing SystemsJun-13-2026, 07:58:39 GMT

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.46)

Add feedback

Mind the Gap: Missing Cyber Threat Coverage in NIDS Datasets for the Energy Sector

Tory, Adrita Rahman, Hasan, Khondokar Fida, Rahman, Md Saifur, Koroniotis, Nickolaos, Moni, Mohammad Ali

arXiv.org Artificial IntelligenceNov-4-2025

Network Intrusion Detection Systems (NIDS) developed using publicly available datasets predominantly focus on enterprise environments, raising concerns about their effectiveness for converged Information Technology (IT) and Operational Technology (OT) in energy infrastructures. This study evaluates the representativeness of five widely used datasets: CIC-IDS2017, SWaT, WADI, Sherlock, and CIC-Modbus2023 against network-detectable MITRE ATT&CK techniques extracted from documented energy sector incidents. Using a structured five-step analytical approach, this article successfully developed and performed a gap analysis that identified 94 network observable techniques from an initial pool of 274 ATT&CK techniques. Sherlock dataset exhibited the highest mean coverage (0.56), followed closely by CIC-IDS2017 (0.55), while SWaT and WADI recorded the lowest scores (0.38). Combining CIC-IDS2017, Sherlock, and CIC-Modbus2023 achieved an aggregate coverage of 92%, highlighting their complementary strengths. The analysis identifies critical gaps, particularly in lateral movement and industrial protocol manipulation, providing a clear pathway for dataset enhancement and more robust NIDS evaluation in hybrid IT/OT energy environments.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2511.0036

Country: Oceania > Australia (0.68)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)
Water & Waste Management > Water Management > Lifecycle > Treatment (0.47)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(2 more...)

Add feedback

Sherlock: Reliable and Efficient Agentic Workflow Execution

Ro, Yeonju, Qiu, Haoran, Goiri, Íñigo, Fonseca, Rodrigo, Bianchini, Ricardo, Akella, Aditya, Wang, Zhangyang, Erez, Mattan, Choukse, Esha

arXiv.org Artificial IntelligenceNov-4-2025

With the increasing adoption of large language models (LLM), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even amplify through subsequent stages, compounding the impact on the final output. Recent work proposes integrating verifiers that validate LLM output or actions, such as self-reflection, debate, or LLM-as-a-judge mechanisms. Yet, verifying every step introduces significant latency and cost overheads. In this work, we seek to answer three key questions: which nodes in a workflow are most error-prone and thus deserve costly verification, how to select the most appropriate verifier for each node, and how to use verification with minimal impact to latency? Our solution, Sherlock, addresses these using counterfactual analysis on agentic workflows to identify error-prone nodes and selectively attaching cost-optimal verifiers only where necessary. At runtime, Sherlock speculatively executes downstream tasks to reduce latency overhead, while verification runs in the background. If verification fails, execution is rolled back to the last verified output. Compared to the non-verifying baseline, Sherlock delivers an 18.3% accuracy gain on average across benchmarks. Sherlock reduces workflow execution time by up to 48.7% over non-speculative execution and lowers verification cost by 26.0% compared to the Monte Carlo search-based method, demonstrating that principled, fault-aware verification effectively balances efficiency and reliability in agentic workflows.

large language model, natural language, sherlock, (14 more...)

arXiv.org Artificial Intelligence

2511.0033

Genre:

Workflow (1.00)
Research Report (1.00)

Industry:

Information Technology (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Ding, Yi, Zhang, Ruqi

arXiv.org Artificial IntelligenceOct-24-2025

Reasoning Vision-Language Models (VLMs) have shown promising performance on complex multimodal tasks. However, they still face significant challenges: they are highly sensitive to reasoning errors, require large volumes of annotated data or accurate verifiers, and struggle to generalize beyond specific domains. To address these limitations, we explore self-correction as a strategy to enhance reasoning VLMs. We first conduct an in-depth analysis of reasoning VLMs' self-correction abilities and identify key gaps. Based on our findings, we introduce Sherlock, a self-correction and self-improvement training framework. Sherlock introduces a trajectory-level self-correction objective, a preference data construction method based on visual perturbation, and a dynamic $β$ for preference tuning. Once the model acquires self-correction capabilities using only 20k randomly sampled annotated data, it continues to self-improve without external supervision. Built on the Llama3.2-Vision-11B model, Sherlock achieves remarkable results across eight benchmarks, reaching an average accuracy of 64.1 with direct generation and 65.4 after self-correction. It outperforms LLaVA-CoT (63.2), Mulberry (63.9), and LlamaV-o1 (63.4) while using less than 20% of the annotated data.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.22651

Genre: Research Report > New Finding (0.66)

Industry: Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Lu, Nan, Hu, Yurong, Fang, Jiaquan, Liu, Yan, Dong, Rui, Wang, Yiming, Lin, Rui, Xu, Shaoyi

arXiv.org Artificial IntelligenceOct-14-2025

The growth of the e-commerce industry has intensified the adversarial dynamics between shadow economy actors and risk management teams. Companies often conduct risk investigations into suspicious cases to identify emerging fraud patterns, thereby enhancing both preemptive risk prevention and post-hoc governance. However, the sheer volume of case analyses imposes a substantial workload on risk management analysts, as each case requires the integration of long-term expert experience and meticulous scrutiny across multiple risk dimensions. Additionally, individual disparities among analysts hinder the establishment of uniform and high-standard workflows. To address these challenges, we propose the SHERLOCK framework, which leverages the reasoning capabilities of large language models (LLMs) to assist analysts in risk investigations. Our approach consists of three primary components: (1) extracting risk management knowledge from multi-modal data and constructing a domain knowledge base (KB), (2) building an intelligent platform guided by the data flywheel paradigm that integrates daily operations, expert annotations, and model evaluations, with iteratively fine-tuning for preference alignment, and (3) introducing a Reflect & Refine (R&R) module that collaborates with the domain KB to establish a rapid response mechanism for evolving risk patterns. Experiments conducted on the real-world transaction dataset from JD dot com demonstrate that our method significantly improves the precision of both factual alignment and risk localization within the LLM analysis results. Deployment of the SHERLOCK-based LLM system on JD dot com has substantially enhanced the efficiency of case investigation workflows for risk managers.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.08948

Country: Asia > China (0.16)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services > e-Commerce Services (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM

Ma, Junxiao, Wang, Jingjing, Luo, Jiamin, Yu, Peiying, Zhou, Guodong

arXiv.org Artificial IntelligenceFeb-26-2025

Prior studies on Video Anomaly Detection (VAD) mainly focus on detecting whether each video frame is abnormal or not in the video, which largely ignore the structured video semantic information (i.e., what, when, and where does the abnormal event happen). With this in mind, we propose a new chat-paradigm \textbf{M}ulti-scene Video Abnormal Event Extraction and Localization (M-VAE) task, aiming to extract the abnormal event quadruples (i.e., subject, event type, object, scene) and localize such event. Further, this paper believes that this new task faces two key challenges, i.e., global-local spatial modeling and global-local spatial balancing. To this end, this paper proposes a Global-local Spatial-sensitive Large Language Model (LLM) named Sherlock, i.e., acting like Sherlock Holmes to track down the criminal events, for this M-VAE task. Specifically, this model designs a Global-local Spatial-enhanced MoE (GSM) module and a Spatial Imbalance Regulator (SIR) to address the two challenges respectively. Extensive experiments on our M-VAE instruction dataset show the significant advantages of Sherlock over several advanced Video-LLMs. This justifies the importance of global-local spatial information for the M-VAE task and the effectiveness of Sherlock in capturing such information.

information, proceedings, spatial information, (17 more...)

arXiv.org Artificial Intelligence

2502.18863

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
Asia > China (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

T-KAER: Transparency-enhanced Knowledge-Augmented Entity Resolution Framework

Li, Lan, Fang, Liri, Liu, Yiren, Torvik, Vetle I., Ludaescher, Bertram

arXiv.org Artificial IntelligenceSep-30-2024

Entity resolution (ER) is the process of determining whether two representations refer to the same real-world entity and plays a crucial role in data curation and data cleaning. Recent studies have introduced the KAER framework, aiming to improve pre-trained language models by augmenting external knowledge. However, identifying and documenting the external knowledge that is being augmented and understanding its contribution to the model's predictions have received little to no attention in the research community. This paper addresses this gap by introducing T-KAER, the Transparency-enhanced Knowledge-Augmented Entity Resolution framework. To enhance transparency, three Transparency-related Questions (T-Qs) have been proposed: T-Q(1): What is the experimental process for matching results based on data inputs? T-Q(2): Which semantic information does KAER augment in the raw data inputs? T-Q(3): Which semantic information of the augmented data inputs influences the predictions? To address the T-Qs, T-KAER is designed to improve transparency by documenting the entity resolution processes in log files. In experiments, a citation dataset is used to demonstrate the transparency components of T-KAER. This demonstration showcases how T-KAER facilitates error analysis from both quantitative and qualitative perspectives, providing evidence on "what" semantic information is augmented and "why" the augmented knowledge influences predictions differently.

doduo, information, sherlock, (14 more...)

arXiv.org Artificial Intelligence

2410.00218

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Comprehending Semantic Types in JSON Data with Graph Neural Networks

Wei, Shuang, Mior, Michael J.

arXiv.org Artificial IntelligenceJul-24-2023

Semantic types are a more powerful and detailed way of describing data than atomic types such as strings or integers. They establish connections between columns and concepts from the real world, providing more nuanced and fine-grained information that can be useful for tasks such as automated data cleaning, schema matching, and data discovery. Existing deep learning models trained on large text corpora have been successful at performing single-column semantic type prediction for relational data. However, in this work, we propose an extension of the semantic type prediction problem to JSON data, labeling the types based on JSON Paths. Similar to columns in relational data, JSON Path is a query language that enables the navigation of complex JSON data structures by specifying the location and content of the elements. We use a graph neural network to comprehend the structural information within collections of JSON documents. Our model outperforms a state-of-the-art existing model in several cases. These results demonstrate the ability of our model to understand complex JSON data and its potential usage for JSON-related data processing tasks.

json data, semantic type, sherlock, (14 more...)

arXiv.org Artificial Intelligence

2307.12807

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > Monroe County > Rochester (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(12 more...)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sherlock.io: An Upgraded Machine Learning Monitoring System

#artificialintelligenceAug-26-2022, 00:45:32 GMT

In 2019, eBay started an initiative to upgrade the monitoring platform to handle increased monitoring signals. We decided to make these upgrades in order to cope with the vast number of queries our system encounters, which in turn revealed several engineering challenges to be overcome. In addition to ingestion, storage and query layer, we decided to upgrade the anomaly detection module because the anomaly detection results that were provided by the previous monitor platform to site SEC/SRE received complaints due to noise and inaccuracy. We revisited all of the use cases that have been used by the SEC/SRE. Business metrics These metrics, including listing number/minute, checkout number/minute and others are critical signals for eBay business.

algorithm, detection, sherlock, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.65)

Add feedback

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Hessel, Jack, Hwang, Jena D., Park, Jae Sung, Zellers, Rowan, Bhagavatula, Chandra, Rohrbach, Anna, Saenko, Kate, Choi, Yejin

arXiv.org Artificial IntelligenceJul-25-2022

Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In total, we collect 363K (clue, inference) pairs, which form a first-of-its-kind abductive visual reasoning dataset. Using our corpus, we test three complementary axes of abductive reasoning. We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes, and iii) compare plausible inferences to match human judgments on a newly-collected diagnostic corpus of 19K Likert-scale judgments. While we find that fine-tuning CLIP-RN50x64 with a multitask objective outperforms strong baselines, significant headroom exists between model performance and human agreement. Data, models, and leaderboard available at http://visualabduction.com/

inference, reasoning, sherlock, (16 more...)

arXiv.org Artificial Intelligence

2202.048

Country:

North America > United States > Ohio (0.04)
Oceania > Australia (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Government (0.93)
Leisure & Entertainment (0.93)
Transportation > Ground (0.46)
Law > Statutes (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback