Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks on image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks, where attackers only have access to the final output text. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem. To solve this optimization problem efficiently, a three-stage process called Ask, Attend, Attack (AAA) is proposed to coordinate with the solver.
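The abstract frames the attack as a large-scale optimization over a decision oracle: an image goes in, only the output text comes back. As a hedged illustration of that setting only, and not the paper's AAA procedure, a generic decision-based targeted attack can be sketched as a random search that scores candidate perturbations by how well the returned caption matches the target text. The `caption_match` fitness and every hyperparameter below are assumptions for the sketch:

```python
import numpy as np

def caption_match(pred: str, target: str) -> float:
    """Fraction of target words present in the predicted caption:
    a crude stand-in for a real caption-similarity fitness."""
    target_words = target.lower().split()
    pred_words = set(pred.lower().split())
    return sum(w in pred_words for w in target_words) / len(target_words)

def decision_based_attack(image, model, target_text,
                          eps=8 / 255, iters=100, pop=8, sigma=0.05, seed=0):
    """Random-search attack: only model(image) -> text is ever observed.
    Keeps the best L-infinity-bounded perturbation found so far."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(image)
    best = caption_match(model(image), target_text)
    for _ in range(iters):
        # Sample a population of clipped candidate perturbations.
        cands = [np.clip(delta + sigma * rng.standard_normal(image.shape),
                         -eps, eps) for _ in range(pop)]
        scores = [caption_match(model(np.clip(image + d, 0.0, 1.0)), target_text)
                  for d in cands]
        i = int(np.argmax(scores))
        if scores[i] >= best:
            best, delta = scores[i], cands[i]
    return np.clip(image + delta, 0.0, 1.0), best
```

A real attack would replace `model` with a query to the victim image-to-text system and `caption_match` with a stronger text-similarity measure; the loop structure is what makes the attack "decision-based."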
Street Review: A Participatory AI-Based Framework for Assessing Streetscape Inclusivity
Mushkani, Rashid, Koseki, Shin
City streets, sidewalks, and public areas often serve as primary interaction points among diverse user groups, including residents, commuters, and visitors (Gehl, 2011). These spaces carry social, economic, and cultural significance that influences navigation and user experience (Mitrašinović & Mehta, 2021). Municipal governments and planning agencies recognize the importance of inclusive public spaces but face challenges in operationalizing inclusivity (Anttiroiko & De Jong, 2020). Traditional approaches may draw on universal design principles intended to accommodate a broad range of users, but these frameworks often take a one-size-fits-all approach that prioritizes physical accessibility over the social and cultural dimensions of public space use (Low, 2020). In multicultural cities, where multiple languages, cultures, and religious practices converge, these complexities become particularly evident (Fan et al., 2023; Litman, 2025; Salgado et al., 2021; Youngbloom et al., 2023). Research on inclusive design has provided valuable insights, but few methods combine qualitative depth with quantitative scale to understand inclusivity in urban contexts (Anttiroiko & De Jong, 2020; Mehta, 2019; Zamanifard et al., 2019). Ethnographic research and interviews offer detailed perspectives on lived experience, while computer vision and machine learning enable assessments at larger scales (Ibrahim et al., 2020). However, large-scale computational approaches often overlook intersectional dimensions (Zhu et al., 2025). This gap calls for integrated models that merge qualitative and quantitative methodologies.
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (6 more...)
- Law (0.93)
- Health & Medicine > Therapeutic Area (0.46)
Scalable Supervising Software Agents with Patch Reasoner
Xu, Junjielong, Tan, Boyin, Liu, Xiaoyuan, Peng, Chao, Gao, Pengfei, He, Pinjia
While large language model agents have advanced software engineering tasks, the unscalable nature of existing test-based supervision limits the potential gains from data scaling. The reason is twofold: (1) building and running test sandboxes is heavy and fragile, and (2) data with high-coverage tests is naturally rare and vulnerable to test hacking via edge cases. In this paper, we propose R4P, a patch-verifier model that provides scalable rewards for training and testing SWE agents via reasoning. We view patch verification as fundamentally a reasoning task, mirroring how human repository maintainers review patches without writing and running new reproduction tests. To obtain a sufficient reference signal and reduce the risk of reward hacking, R4P uses a group-wise objective for RL training, enabling it to verify multiple patches against each other's modifications and obtain a dense reward for stable training. R4P achieves 72.2% accuracy in verifying patches from SWE-bench-verified, surpassing OpenAI o3. To demonstrate R4P's practicality, we design and train a lightweight scaffold, Mini-SE, with pure reinforcement learning in which all rewards are derived from R4P. Mini-SE achieves 26.2% Pass@1 on SWE-bench-verified, a 10.0% improvement over the original Qwen3-32B, and this can be further improved to 32.8% by using R4P for test-time scaling. Furthermore, R4P verifies a patch within a second, 50x faster on average than running tests. The stable scaling curves of rewards and accuracy, along with this high efficiency, reflect R4P's practicality.
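The abstract does not spell out R4P's group-wise objective, but the general idea it gestures at, scoring each patch relative to the other patches in its group rather than against an absolute threshold, can be sketched as mean-centering (as in GRPO-style training). This is an illustrative assumption, not the paper's actual loss:

```python
def group_relative_rewards(verifier_scores):
    """Given the verifier's raw score for each patch in a group,
    return mean-centered rewards: each patch is judged against its
    peers, which yields a dense signal even when no patch is perfect."""
    mean = sum(verifier_scores) / len(verifier_scores)
    return [s - mean for s in verifier_scores]
```

Centering within the group also makes the reward harder to hack: inflating every score uniformly leaves the relative rewards unchanged.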
Reviewer # 1 2 > the computational complexity is not studied or evaluated so the practicality of this approach might look questionable
We would like to thank the reviewers for their time and helpful comments; we will clarify and fix the paper as suggested. Thank you for pointing this out. Note that the Batch-RL setup is constrained by the number of samples, not by computational complexity. There was a tradeoff between fully explaining the ideas and satisfying the page-limit constraints.
Towards Evaluation for Real-World LLM Unlearning
Miao, Ke, Hu, Yuke, Li, Xiaochen, Bao, Wenjie, Liu, Zhihao, Qin, Zhan, Ren, Kui
This paper analyzes the limitations of existing unlearning evaluation metrics in terms of practicality, exactness, and robustness in real-world LLM unlearning scenarios. To overcome these limitations, we propose a new metric called Distribution Correction-based Unlearning Evaluation (DCUE). It identifies core tokens and corrects distributional biases in their confidence scores using a validation set, and the evaluation results are quantified with the Kolmogorov-Smirnov test. Experimental results demonstrate that DCUE overcomes the limitations of existing metrics and can guide the design of more practical and reliable unlearning algorithms in the future.
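The abstract specifies only the final quantification step: comparing confidence-score distributions with a Kolmogorov-Smirnov test. How DCUE selects core tokens and corrects biases is not described, so the sketch below shows just that last step, using a hand-rolled two-sample KS statistic (the same statistic `scipy.stats.ks_2samp` reports); treating the validation-set scores as the comparison baseline is an assumption:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical
    gap between the empirical CDFs of samples a and b (0 = identical
    distributions, 1 = fully separated)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                     # all jump points of either ECDF
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```

Under this reading, a small statistic between the unlearned model's core-token confidences and the validation baseline would indicate more thorough unlearning.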
- North America > United States > Virginia (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Mushkani, Rashid, Berard, Hugo, Koseki, Shin
Cities are not monolithic; they are arenas of negotiation among groups that hold varying needs, values, and experiences. Conventional methods of urban assessment -- from standardized surveys to AI-driven evaluations -- frequently rely on a single consensus metric (e.g., an average measure of inclusivity or safety). Although such aggregations simplify design decisions, they risk obscuring the distinct perspectives of marginalized populations. In this paper, we present findings from a community-centered study in Montreal involving 35 residents with diverse demographic and social identities, particularly wheelchair users, seniors, and LGBTQIA2+ individuals. Using rating and ranking tasks on 20 urban sites, we observe that disagreements are systematic rather than random, reflecting structural inequalities, differing cultural values, and personal experiences of safety and accessibility. Based on these empirical insights, we propose negotiative alignment, an AI framework that treats disagreement as an essential input to be preserved, analyzed, and addressed. Negotiative alignment builds on pluralistic models by dynamically updating stakeholder preferences through multi-agent negotiation mechanisms, ensuring no single perspective is marginalized. We outline how this framework can be integrated into urban analytics -- and other decision-making contexts -- to retain minority viewpoints, adapt to changing stakeholder concerns, and enhance fairness and accountability. The study demonstrates that preserving and engaging with disagreement, rather than striving for an artificial consensus, can produce more equitable and responsive AI-driven outcomes in urban design.
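The paper's negotiative-alignment framework involves multi-agent negotiation, which the abstract does not detail; the simpler empirical point, that per-group spread exposes systematic disagreement a single consensus average would hide, can be sketched directly. The data and group labels below are hypothetical:

```python
from collections import defaultdict

def group_means(ratings):
    """ratings: iterable of (site, group, score) tuples.
    Returns {site: {group: mean score}}."""
    by_site = defaultdict(lambda: defaultdict(list))
    for site, group, score in ratings:
        by_site[site][group].append(score)
    return {site: {g: sum(v) / len(v) for g, v in groups.items()}
            for site, groups in by_site.items()}

def disagreement(ratings):
    """Per-site spread (max - min) of group mean ratings; a large
    spread flags sites where one consensus score would marginalize
    some groups' experience."""
    means = group_means(ratings)
    return {site: max(g.values()) - min(g.values())
            for site, g in means.items()}
```

Averaging over all raters would score both of two sites identically even when one is rated safe by every group and the other only by the majority; the spread makes that difference visible.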
- North America > Canada > Quebec > Montreal (0.26)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York (0.04)
- (7 more...)