traceback


Universal Hirschberg for Width Bounded Dynamic Programs

arXiv.org Artificial Intelligence

Hirschberg's algorithm (1975) reduces the space complexity of the longest common subsequence problem from $O(N^2)$ to $O(N)$ via recursive midpoint bisection on a grid dynamic program (DP). We show that the underlying idea generalizes to a broad class of dynamic programs with local dependencies on directed acyclic graphs (DP DAGs). Modeling a DP as deterministic time evolution over a topologically ordered DAG with frontier width $\omega$ and bounded in-degree, and assuming a max-type semiring with deterministic tie-breaking, we prove that in a standard offline random-access model any such DP admits deterministic traceback in space $O(\omega\log T + (\log T)^{O(1)})$ cells over a fixed finite alphabet, where $T$ is the number of states. Our construction replaces backward dynamic programs with forward-only recomputation and organizes the time order into a height-compressed recursion tree whose nodes expose small "middle frontiers" across which every optimal path must pass. The framework yields near-optimal traceback bounds for asymmetric and banded sequence alignment, one-dimensional recurrences, and dynamic-programming formulations on graphs of bounded pathwidth. We also show that an $\Omega(\omega)$ space term (in bits) is unavoidable in forward single-pass models and discuss conjectured $\sqrt{T}$-type barriers in streaming settings, supporting the view that space-efficient traceback is a structural property of width-bounded DP DAGs rather than a peculiarity of grid-based algorithms.
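
For reference, the classic 1975 midpoint-bisection idea can be sketched in Python for LCS. This is a minimal illustration, not the paper's construction: it still runs a backward DP on the reversed bottom half to locate the split column, which is exactly the step the paper replaces with forward-only recomputation. The helper names `lcs_last_row` and `hirschberg_lcs` are hypothetical.

```python
def lcs_last_row(a: str, b: str) -> list[int]:
    """Return row[j] = LCS length of a and b[:j], keeping only O(len(b)) cells."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev


def hirschberg_lcs(a: str, b: str) -> str:
    """Recover one LCS of a and b by recursive midpoint bisection."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    n = len(b)
    # Forward pass: LCS lengths of the top half against every prefix of b.
    left = lcs_last_row(a[:mid], b)
    # Backward pass (on reversed strings): bottom half against every suffix of b.
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    # Some optimal path crosses the middle row at column k; pick the best k.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    # Slicing copies substrings for readability; a strictly O(len(b))-space
    # version would pass index ranges instead.
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

For example, `hirschberg_lcs("ABCBDAB", "BDCABA")` returns a common subsequence of length 4 while never storing more than two DP rows at a time per recursion level.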


Hear Your Code Fail, Voice-Assisted Debugging for Python

arXiv.org Artificial Intelligence

This staggering performance drain translates to roughly $61 billion in yearly financial losses throughout the worldwide software market, as quantified by the Standish Group's 2023 analysis of development workflows. The core inefficiency stems from the visual-only paradigm of traditional debugging, in which developers must manually parse dense technical stack traces while mentally reconstructing error context, a process requiring intense cognitive focus that fragments attention between code logic and exception diagnostics. Neuroergonomic research from MIT's Human-Computer Interaction Lab reveals that this context switching triggers measurable cognitive overload, increasing prefrontal cortex activation by 60% compared to focused coding tasks and ultimately leading to mental fatigue that compounds debugging errors. The accessibility limitations of conventional debugging tools create additional barriers for approximately 12.5% of professional developers with visual impairments (World Health Organization, 2024), who struggle with screen readers that poorly interpret technical tracebacks. As Sarah Parker, a blind Python developer at Microsoft, testified during the 2023 Accessible Tech Symposium: "NVDA reads exception blocks as disconnected fragments; I spend more time reassembling error narratives than solving actual problems."


UTrace: Poisoning Forensics for Private Collaborative Learning

arXiv.org Artificial Intelligence

Privacy-preserving machine learning (PPML) enables multiple data owners to contribute their data privately to a set of servers that run a secure multi-party computation (MPC) protocol to train a joint ML model. In these protocols, the input data remains private throughout the training process, and only the resulting model is made available. While this approach benefits privacy, it also exacerbates the risks of data poisoning, where compromised data owners induce undesirable model behavior by contributing malicious datasets. Existing MPC mechanisms can mitigate certain poisoning attacks, but these measures are not exhaustive. To complement existing poisoning defenses, we introduce UTrace: a framework for User-level Traceback of poisoning attacks in PPML. UTrace computes user responsibility scores using gradient similarity metrics aggregated across the most relevant samples in an owner's dataset. UTrace is effective at low poisoning rates and is resilient to poisoning attacks distributed across multiple data owners, unlike existing unlearning-based methods. We introduce methods for checkpointing gradients with low storage overhead, enabling traceback in the absence of data owners at deployment time. We also design several optimizations that reduce traceback time and communication in MPC. We provide a comprehensive evaluation of UTrace across four datasets from three data modalities (vision, text, and malware) and show its effectiveness against 10 poisoning attacks.
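
The abstract does not give UTrace's exact scoring formula, so the following Python sketch is only a plaintext toy under stated assumptions: each owner's score is the mean cosine similarity between a target gradient (for example, the gradient of a misbehaving test input) and that owner's top-`k` most similar per-sample gradients. The function names, the choice of cosine similarity, mean aggregation, and `k` are all hypothetical, and the MPC layer and gradient checkpointing are omitted entirely.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def responsibility_scores(
    owner_grads: Dict[str, List[Vector]], target_grad: Vector, k: int = 2
) -> Dict[str, float]:
    """Hypothetical per-owner score: mean similarity of the k most relevant samples."""
    scores: Dict[str, float] = {}
    for owner, grads in owner_grads.items():
        # Rank this owner's samples by how closely their gradient matches the target.
        sims = sorted((cosine(g, target_grad) for g in grads), reverse=True)
        top = sims[:k]
        scores[owner] = sum(top) / len(top) if top else 0.0
    return scores
```

Aggregating only the most relevant samples per owner, rather than averaging over the whole dataset, keeps a small poisoned subset from being diluted by many benign samples, which is one plausible reading of why the abstract emphasizes "the most relevant samples".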


Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets

arXiv.org Artificial Intelligence

Code search is an important task that has seen many developments in recent years. However, previous attempts have mostly considered the problem of searching for code by a text query. We argue that using a code snippet (and possibly an associated traceback) as a query and looking for answers with bugfixing instructions and code samples is a natural use case that is not covered by existing approaches. Moreover, existing datasets use comments extracted from code rather than full-text descriptions as text, making them unsuitable for this use case. We present a new SearchBySnippet dataset implementing the search-by-code use case based on StackOverflow data; in this setting, existing architectures fall short of a simple BM25 baseline even after fine-tuning. We present a new single-encoder model, SnippeR, that outperforms several strong baselines on the SearchBySnippet dataset with a Recall@10 of 0.451, and we propose the SearchBySnippet dataset and SnippeR as an important new benchmark for code search evaluation.
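
The finding that existing architectures trail a BM25 baseline is easier to appreciate with the scoring function in view. Below is a minimal Python sketch of textbook Okapi BM25 with a code snippet as the query; the identifier-based tokenizer, the defaults `k1=1.5` and `b=0.75`, and the `+1` inside the IDF logarithm (which keeps weights non-negative) are common conventions assumed here, not the configuration reported for SearchBySnippet.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase identifier-like tokens, so code and prose share a vocabulary.
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


class BM25:
    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]  # term frequencies per document
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        df: Counter = Counter()  # number of documents containing each term
        for c in self.docs:
            df.update(c.keys())
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.docs[i], self.lengths[i]
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / denom
        return s

    def rank(self, query: str, top_k: int = 10) -> List[int]:
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda p: (-p[0], p[1]))  # highest score first, ties by index
        return [i for _, i in scored[:top_k]]
```

Because the snippet's identifiers and exception names (such as `KeyError`) often appear verbatim in StackOverflow answers, pure lexical overlap of this kind can be a surprisingly strong signal; `rank(query, top_k=10)` is the list a Recall@10 evaluation would inspect.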