Indictment of ex-Newsom aide hints at feds' probe into state's earlier investigation of video game giant
Dana Williamson, Gov. Gavin Newsom's former chief of staff, leaves the Robert T. Matsui United States Courthouse in Sacramento on Wednesday after being arrested in a federal public corruption probe involving multiple counts of bank and wire fraud. Newsom's former chief of staff and two political operatives face federal fraud and corruption charges, including the alleged misuse of campaign funds for luxury purchases.
- North America > United States > California > Los Angeles County > Los Angeles (0.06)
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Santa Monica (0.04)
- (4 more...)
- Leisure & Entertainment (1.00)
- Law > Litigation (1.00)
- Law > Criminal Law (1.00)
- (4 more...)
Automated Circuit Interpretation via Probe Prompting
Mechanistic interpretability aims to understand neural networks by identifying which learned features mediate specific behaviors. Attribution graphs reveal these feature pathways, but interpreting them requires extensive manual analysis -- a single prompt can take approximately 2 hours for an experienced circuit tracer. We present probe prompting, an automated pipeline that transforms attribution graphs into compact, interpretable subgraphs built from concept-aligned supernodes. Starting from a seed prompt and target logit, we select high-influence features, generate concept-targeted yet context-varying probes, and group features by cross-prompt activation signatures into Semantic, Relationship, and Say-X categories using transparent decision rules. Across five prompts including classic "capitals" circuits, probe-prompted subgraphs preserve high explanatory coverage while compressing complexity (Completeness 0.83, mean across circuits; Replacement 0.54). Compared to geometric clustering baselines, concept-aligned groups exhibit higher behavioral coherence: 2.3x higher peak-token consistency (0.425 vs 0.183) and 5.8x higher activation-pattern similarity (0.762 vs 0.130), despite lower geometric compactness. Entity-swap tests reveal a layerwise hierarchy: early-layer features transfer robustly (64% transfer rate, mean layer 6.3), while late-layer Say-X features specialize for output promotion (mean layer 16.4), supporting a backbone-and-specialization view of transformer computation. We release code (https://github.com/peppinob-ol/attribution-graph-probing), an interactive demo (https://huggingface.co/spaces/Peppinob/attribution-graph-probing), and minimal artifacts enabling immediate reproduction and community adoption.
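The grouping step described above, in which features with similar cross-prompt activation signatures are merged into supernodes, can be sketched as follows. The cosine measure, the greedy first-fit rule, the 0.8 threshold, and the toy signatures are illustrative assumptions of this sketch, not the paper's exact procedure or values.

```python
# Sketch: grouping features into supernodes by cross-prompt activation
# signatures. Threshold, greedy rule, and toy data are assumptions of this
# sketch, not the paper's actual decision rules.
import numpy as np

def activation_similarity(a, b):
    """Cosine similarity between two cross-prompt activation signatures."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_features(signatures, threshold=0.8):
    """Greedy grouping: a feature joins the first group whose anchor
    signature it matches above `threshold`, otherwise it starts a new group."""
    groups = []
    for i, sig in enumerate(signatures):
        for g in groups:
            if activation_similarity(sig, g["anchor"]) >= threshold:
                g["members"].append(i)
                break
        else:
            groups.append({"anchor": np.asarray(sig, float), "members": [i]})
    return groups

# Three features probed on four prompts: two share a signature, one differs.
sigs = [[1.0, 0.9, 0.0, 0.1],
        [0.9, 1.0, 0.1, 0.0],
        [0.0, 0.1, 1.0, 0.9]]
groups = group_features(sigs)
print([g["members"] for g in groups])  # → [[0, 1], [2]]
```

Category labels (Semantic, Relationship, Say-X) would then be assigned per group from where and on which tokens its members fire; that rule set is omitted here.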
- North America > United States > Oklahoma (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Sacramento County > Sacramento (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
New Mets pitcher Justin Garza credits video game MLB The Show for helping save career
Pitcher Justin Garza was thinking about quitting the game he loved during the COVID-19 pandemic in 2020 as he struggled in the minor leagues. Now, as Garza joins the New York Mets after a deal with the San Francisco Giants, he credits one thing with saving his career. Boston Red Sox starting pitcher Justin Garza, #63, reacts after giving up a home run to Minnesota Twins designated hitter Byron Buxton during the first inning at Target Field.
- North America > United States > New York (0.28)
- North America > United States > California > San Francisco County > San Francisco (0.28)
- North America > United States > Montana > Roosevelt County (0.26)
- (4 more...)
ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
Zhao, Xinjie, Gao, Fan, Yang, Rui, Chen, Yingjian, Wang, Yuyang, Zhu, Ying, Tang, Jiacheng, Li, Irene
Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it challenging to correct mistakes in multi-hop reasoning. This paper introduces ReAgent: a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms, enabling reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation, and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks indicate ReAgent's efficacy, yielding improvements of about 6% on average over baseline models.
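The backtracking idea the abstract describes can be sketched as a depth-first search in which an invalid hop is undone rather than accumulated into later steps. The toy fact base, validator, and candidate generator below are assumptions of this sketch, standing in for ReAgent's retrieval and validation agents.

```python
# Minimal sketch of reversible multi-hop reasoning with explicit backtracking.
# The fact base, validator, and candidate generator are toy stand-ins
# (assumptions of this sketch) for ReAgent's retrieval and validation agents.
def solve(start, candidates, validate, max_depth=4):
    """Depth-first search over reasoning hops; an invalid hop is popped
    (reversed) instead of poisoning the rest of the chain."""
    chain = [start]

    def recurse():
        if validate(chain, final=True):
            return True
        if len(chain) > max_depth:
            return False
        for step in candidates(chain):
            chain.append(step)            # take a hop
            if validate(chain, final=False) and recurse():
                return True
            chain.pop()                   # backtrack: reverse the hop
        return False

    return chain if recurse() else None

# Toy multi-hop question: which continent is Paris in?
facts = {"Paris": "France", "France": "Europe"}
candidates = lambda chain: [facts[chain[-1]]] if chain[-1] in facts else []
validate = lambda chain, final: chain[-1] == "Europe" if final else True
print(solve("Paris", candidates, validate))  # → ['Paris', 'France', 'Europe']
```

The reversibility lives entirely in the `chain.pop()`: a rejected intermediate conclusion leaves no trace, which is the property plain CoT lacks.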
- North America > United States > California > Los Angeles County > Los Angeles (0.32)
- North America > Canada (0.14)
- Europe > Spain (0.14)
- (3 more...)
- Health & Medicine (0.46)
- Leisure & Entertainment > Sports > Olympic Games (0.31)
In-Simulation Testing of Deep Learning Vision Models in Autonomous Robotic Manipulators
Humeniuk, Dmytro, Braiek, Houssem Ben, Reid, Thomas, Khomh, Foutse
Testing autonomous robotic manipulators is challenging due to the complex software interactions between vision and control components. A crucial element of modern robotic manipulators is the deep learning based object detection model. The creation and assessment of this model requires real-world data, which can be hard to label and collect, especially when the hardware setup is not available. Current techniques primarily focus on using synthetic data to train deep neural networks (DNNs) and identifying failures through offline or online simulation-based testing. However, the process of exploiting the identified failures to uncover design flaws early on, and of leveraging the optimized DNN within the simulation to accelerate the engineering of the DNN for real-world tasks, remains unclear. To address these challenges, we propose the MARTENS (Manipulator Robot Testing and Enhancement in Simulation) framework, which integrates a photorealistic NVIDIA Isaac Sim simulator with evolutionary search to identify critical scenarios, aiming to improve the deep learning vision model and uncover system design flaws. Evaluation on two industrial case studies demonstrated that MARTENS effectively reveals robotic manipulator system failures, detecting 25% to 50% more failures with greater diversity compared to random test generation. The model trained and repaired using the MARTENS approach achieved mean average precision (mAP) scores of 0.91 and 0.82 on real-world images with no prior retraining. Further fine-tuning on real-world images for a few epochs (fewer than 10) increased the mAP to 0.95 and 0.89 for the first and second use cases, respectively. In contrast, a model trained solely on real-world data achieved mAPs of 0.8 and 0.75 for use case 1 and use case 2 after more than 25 epochs.
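The evolutionary scenario search at the core of the framework can be sketched as a plain mutate-and-select loop. The one-parameter scene, the fitness function (a stand-in for a photorealistic simulator rollout), and the population and mutation settings are all assumptions of this sketch.

```python
# Sketch of evolutionary search for failure-inducing scenarios. The fitness
# function is a toy stand-in for an Isaac Sim rollout (an assumption of this
# sketch): detection confidence drops as lighting moves off nominal.
import random

def evolve(fitness, init, mutate, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)     # higher = closer to failure
        elite = pop[: pop_size // 2]            # keep the best half
        pop = elite + [mutate(rng.choice(elite), rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

def fitness(scene):
    """"Failure pressure": 1 - toy detector confidence, peaking at the
    lighting extremes and zero at the nominal value 0.5."""
    return 1.0 - max(0.0, 1.0 - abs(scene["light"] - 0.5) * 2)

best = evolve(fitness,
              init=lambda r: {"light": r.random()},
              mutate=lambda s, r: {"light": min(1.0, max(0.0, s["light"] + r.gauss(0, 0.1)))})
print(round(fitness(best), 2))  # near 1.0: a scenario the toy detector fails on
```

In the real framework the discovered scenarios feed back into training data for the repaired model, closing the test-and-enhance loop.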
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > California > Sacramento County > Sacramento (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Information Technology > Robotics & Automation (0.46)
- Transportation > Ground > Road (0.46)
Proof Automation with Large Language Models
Lu, Minghai, Delaware, Benjamin, Zhang, Tianyi
Interactive theorem provers such as Coq are powerful tools to formally guarantee the correctness of software. However, using these tools requires significant manual effort and expertise. While Large Language Models (LLMs) have shown promise in automatically generating informal proofs in natural language, they are less effective at generating formal proofs in interactive theorem provers. In this paper, we conduct a formative study to identify common mistakes made by LLMs when asked to generate formal proofs. By analyzing 520 proof generation errors made by GPT-3.5, we found that GPT-3.5 often identified the correct high-level structure of a proof, but struggled to get the lower-level details correct. Based on this insight, we propose PALM, a novel generate-then-repair approach that first prompts an LLM to generate an initial proof and then leverages targeted symbolic methods to iteratively repair low-level problems. We evaluate PALM on a large dataset that includes more than 10K theorems. Our results show that PALM significantly outperforms other state-of-the-art approaches, successfully proving 76.6% to 180.4% more theorems. Moreover, PALM proves 1270 theorems beyond the reach of existing approaches. We also demonstrate the generalizability of PALM across different LLMs.
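The generate-then-repair loop described above can be sketched as follows. The proof generator, the checker, and the single repair rule are mocked stand-ins (assumptions of this sketch); PALM's actual repair applies targeted symbolic methods to real Coq error messages.

```python
# Sketch of a generate-then-repair loop in the spirit of the description
# above. Generator, checker, and repair rules are toy mocks (assumptions
# of this sketch), not the paper's implementation.
def generate_then_repair(generate, check, repairs, max_rounds=3):
    """Ask the model for a proof, then iteratively apply symbolic repairs
    to the low-level errors the checker reports."""
    proof = generate()
    for _ in range(max_rounds):
        error = check(proof)
        if error is None:
            return proof                  # proof accepted by the checker
        fix = repairs.get(error)
        if fix is None:
            proof = generate()            # no rule for this error: regenerate
        else:
            proof = fix(proof)            # targeted symbolic repair
    return None

# Toy run: the checker rejects a missing terminator; a rule repairs it.
proof = generate_then_repair(
    generate=lambda: "intros. reflexivity",
    check=lambda p: None if p.endswith(".") else "missing-dot",
    repairs={"missing-dot": lambda p: p + "."},
)
print(proof)  # → intros. reflexivity.
```

The division of labor mirrors the paper's insight: the generator supplies the high-level structure, and cheap symbolic passes clean up low-level details.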
- North America > United States > California > Sacramento County > Sacramento (0.05)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- (5 more...)
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Yang, Chenyang, Hong, Yining, Lewis, Grace A., Wu, Tongshuang, Kästner, Christian
Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify relevant examples. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.
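The slicing interface the abstract describes can be sketched as follows, with a keyword heuristic standing in for the LLM annotator (an assumption of this sketch; in the real system the criterion is free-form text that the LLM interprets).

```python
# Sketch of semantic data slicing: an annotator labels each example against
# a user-defined criterion, and the slice is the subset labeled positive.
# The keyword heuristic below stands in for the LLM annotator (an assumption
# of this sketch).
def semantic_slice(dataset, criterion, annotate):
    """Return (slice, rest): examples the annotator judges to match `criterion`."""
    matched, rest = [], []
    for ex in dataset:
        (matched if annotate(ex, criterion) else rest).append(ex)
    return matched, rest

data = ["refund my order now", "love the new UI",
        "order never arrived", "great app"]
mock_llm = lambda ex, crit: any(w in ex for w in ("refund", "never arrived"))
complaints, other = semantic_slice(data, "customer complaints", mock_llm)
print(complaints)  # → ['refund my order now', 'order never arrived']
```

Checking per-slice error rates on such slices is then what surfaces systematic, rather than random, model failures.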
- North America > United States > California > Sacramento County > Sacramento (0.05)
- Asia > Singapore (0.05)
- Asia > Indonesia > Bali (0.04)
- (14 more...)
- Government (0.46)
- Education (0.46)
Efficient Detection of Toxic Prompts in Large Language Models
Liu, Yi, Yu, Junzhe, Sun, Huijia, Shi, Ling, Deng, Gelei, Chen, Yuqi, Liu, Yang
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing, enabling various applications such as chatbots and automated content generation. However, these models can be exploited by malicious individuals who craft toxic prompts to elicit harmful or unethical responses. These individuals often employ jailbreaking techniques to bypass safety mechanisms, highlighting the need for robust toxic prompt detection methods. Existing detection techniques, both blackbox and whitebox, face challenges related to the diversity of toxic prompts, scalability, and computational efficiency. In response, we propose ToxicDetector, a lightweight greybox method designed to efficiently detect toxic prompts in LLMs. ToxicDetector leverages LLMs to create toxic concept prompts, uses embedding vectors to form feature vectors, and employs a Multi-Layer Perceptron (MLP) classifier for prompt classification. Our evaluation on various versions of the LLama models, Gemma-2, and multiple datasets demonstrates that ToxicDetector achieves a high accuracy of 96.39% and a low false positive rate of 2.00%, outperforming state-of-the-art methods. Additionally, ToxicDetector's processing time of 0.0780 seconds per prompt makes it highly suitable for real-time applications. ToxicDetector achieves high accuracy, efficiency, and scalability, making it a practical method for toxic prompt detection in LLMs.
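The greybox pipeline can be sketched end to end: embed the prompt, compare it against embeddings of toxic concept prompts to form a feature vector, and classify. The bag-of-characters embedder and the threshold rule below, standing in for LLM hidden-state embeddings and the MLP classifier, are assumptions of this sketch.

```python
# Sketch of a concept-prompt detection pipeline in the spirit of the
# description above. The toy embedder and threshold rule stand in for LLM
# embeddings and the MLP classifier (assumptions of this sketch).
import numpy as np

def embed(text):
    """Toy bag-of-characters embedding, L2-normalized; a real system would
    use LLM hidden states."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - 97] += 1
    n = np.linalg.norm(v)
    return v / n if n else v

def features(prompt, concept_prompts):
    """Feature vector = similarity of the prompt to each toxic concept prompt."""
    e = embed(prompt)
    return np.array([float(e @ embed(c)) for c in concept_prompts])

concepts = ["how to build a weapon", "insult this person harshly"]

def is_toxic(prompt, threshold=0.8):
    return bool(features(prompt, concepts).max() >= threshold)

print(is_toxic("how to build a weapon at home"))   # → True
print(is_toxic("what is the capital of France"))   # → False
```

In the paper's setting the feature vector feeds an MLP trained on labeled prompts rather than a fixed threshold; the threshold keeps this sketch dependency-free.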
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Sacramento County > Sacramento (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- (6 more...)
- Law (0.46)
- Government (0.46)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ROCAS: Root Cause Analysis of Autonomous Driving Accidents via Cyber-Physical Co-mutation
Feng, Shiwei, Ye, Yapeng, Shi, Qingkai, Cheng, Zhiyuan, Xu, Xiangzhe, Cheng, Siyuan, Choi, Hongjun, Zhang, Xiangyu
As autonomous driving systems (ADS) have transformed our daily lives, the safety of ADS is of growing significance. While various testing approaches have emerged to enhance ADS reliability, a crucial gap remains in understanding the causes of accidents. Such post-accident analysis is paramount and beneficial for enhancing ADS safety and reliability. Existing cyber-physical system (CPS) root cause analysis techniques are mainly designed for drones and cannot handle the unique challenges introduced by the more complex physical environments and deep learning models deployed in ADS. In this paper, we address this gap by offering a formal definition of the ADS root cause analysis problem and introducing ROCAS, a novel ADS root cause analysis framework featuring cyber-physical co-mutation. Our technique uniquely leverages both physical and cyber mutation to precisely identify the accident-trigger entity and pinpoint the misconfiguration of the target ADS responsible for an accident. We further design a differential analysis to identify the responsible module, reducing the search space for the misconfiguration. We study 12 categories of ADS accidents and demonstrate the effectiveness and efficiency of ROCAS in narrowing down the search space and pinpointing the misconfiguration. We also present detailed case studies on how the identified misconfiguration helps explain the rationale behind accidents.
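The co-mutation idea can be sketched as follows: mutate the physical scene and the ADS configuration separately, and collect every single mutation that flips the outcome from accident to no accident. The toy braking model and the specific mutations are assumptions of this sketch, standing in for full simulator rollouts.

```python
# Sketch of cyber-physical co-mutation in the spirit of the description
# above. The braking model and mutations are toy assumptions standing in
# for full ADS simulator rollouts.
def simulate(scene, config):
    """Toy accident model: crash iff stopping distance exceeds the gap."""
    stopping = scene["speed"] ** 2 / (2 * config["brake_decel"])
    return stopping > scene["gap"]                 # True = accident

def co_mutate(simulate, scene, config, scene_muts, config_muts):
    """Return every single physical or cyber mutation that avoids the
    accident; these point at the trigger entity and the misconfiguration."""
    causes = []
    for name, mut in scene_muts.items():           # physical mutations
        if not simulate(mut(scene), config):
            causes.append(("physical", name))
    for name, mut in config_muts.items():          # cyber mutations
        if not simulate(scene, mut(config)):
            causes.append(("cyber", name))
    return causes

scene = {"speed": 20.0, "gap": 30.0}               # lead vehicle 30 m ahead
config = {"brake_decel": 4.0}                      # misconfigured: brakes too weak
assert simulate(scene, config)                     # the accident reproduces
causes = co_mutate(simulate, scene, config,
                   scene_muts={"slower_npc": lambda s: {**s, "speed": 10.0}},
                   config_muts={"stronger_brakes": lambda c: {**c, "brake_decel": 8.0}})
print(causes)  # → [('physical', 'slower_npc'), ('cyber', 'stronger_brakes')]
```

The paper's differential analysis then narrows which module owns the flagged configuration parameter; that step is omitted here.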
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Sacramento County > Sacramento (0.05)
- (15 more...)
- Transportation > Ground > Road (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Robotics & Automation (1.00)
- (2 more...)
B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests
Chen, Mouxiang, Liu, Zhongxin, Tao, He, Hong, Yusu, Lo, David, Xia, Xin, Sun, Jianling
Selecting the best code solution from multiple generated ones is an essential task in code generation, which can be achieved by using some reliable validators (e.g., developer-written test cases) for assistance. Since reliable test cases are not always available and can be expensive to build in practice, researchers propose to automatically generate test cases to assess code solutions. However, when both code solutions and test cases are plausible and not reliable, selecting the best solution becomes challenging. Although some heuristic strategies have been proposed to tackle this problem, they lack a strong theoretical guarantee, and it remains an open question whether an optimal selection strategy exists. Our work contributes in two ways. First, we show that within a Bayesian framework, the optimal selection strategy can be defined based on the posterior probability of the observed passing states between solutions and tests. The problem of identifying the best solution is then framed as an integer programming problem. Second, we propose an efficient approach for approximating this optimal (yet uncomputable) strategy, where the approximation error is bounded by the correctness of prior knowledge. We then incorporate effective prior knowledge tailored to code generation tasks. Both theoretical and empirical studies confirm that existing heuristics are limited in selecting the best solutions with plausible test cases. Our proposed approximated optimal strategy B4 significantly surpasses existing heuristics in selecting code solutions generated by large language models (LLMs) with LLM-generated tests, achieving a relative performance improvement of up to 50% over the strongest heuristic and 246% over random selection in the most challenging scenarios. Our code is publicly available at https://github.com/ZJU-CTAG/B4.
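The Bayesian view described above can be illustrated by scoring each solution with the posterior probability of its observed pass/fail vector under a simple independence model. The priors and pass probabilities below, and the independence assumption itself, are simplifications of this sketch, not B4's actual formulation.

```python
# Sketch: posterior scoring of candidate solutions from their pass/fail
# states on plausible (noisy) generated tests. The independence model and
# the probabilities are assumptions of this sketch, not B4's formulation.
import math

def posterior_scores(passes, p_pass_if_correct=0.9, p_pass_if_wrong=0.3, prior=0.5):
    """passes[s][t] = 1 if solution s passes generated test t.
    Returns P(solution correct | its pass vector) for each solution."""
    scores = []
    for row in passes:
        ll_c = sum(math.log(p_pass_if_correct if x else 1 - p_pass_if_correct) for x in row)
        ll_w = sum(math.log(p_pass_if_wrong if x else 1 - p_pass_if_wrong) for x in row)
        num = prior * math.exp(ll_c)                     # correct hypothesis
        den = num + (1 - prior) * math.exp(ll_w)         # + wrong hypothesis
        scores.append(num / den)
    return scores

# Three candidate solutions against four noisy generated tests.
passes = [[1, 1, 1, 0],   # passes most tests
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
scores = posterior_scores(passes)
best = scores.index(max(scores))
print(best)  # → 0
```

B4's full strategy additionally treats test correctness as latent and optimizes over joint hypotheses (the integer program); this per-solution factorization is the simplest version of the same posterior idea.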
- North America > United States > California > Sacramento County > Sacramento (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)