AITopics | Industry

Collaborating Authors

Industry

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Neural Information Processing SystemsJun-19-2026, 14:12:31 GMT

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4 more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31diverse scenarios), and thorough evaluation metrics, with 10,000human-verified questionanswering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate the OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (100 in total) and suffer from five-type limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning.

large language model, machine learning, pattern recognition, (21 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.90)
(4 more...)

Add feedback

8c2e2925e75e501088004dd685f0ae81-Paper-Conference.pdf

Neural Information Processing SystemsJun-19-2026, 14:12:22 GMT

We study the sample complexity of Bayesian recovery for solving inverse problems with general prior, forward operator and noise distributions. We consider posterior sampling according to an approximate prior P, and establish sufficient conditions for stable and accurate recovery with high probability. Our main result is a non-asymptotic bound that shows that the sample complexity depends on (i) the intrinsic complexity of P, quantified by its approximate covering number, and (ii) concentration bounds for the forward operator and noise distributions. As a key application, we specialize to generative priors, where P is the pushforward of a latent distribution via a Deep Neural Network (DNN). We show that the sample complexity scales log-linearly with the latent dimension k, thus establishing the efficacy of DNN-based priors. Generalizing existing results on deterministic (i.e., non-Bayesian) recovery for the important problem of random sampling with an orthogonal matrix U, we show how the sample complexity is determined by the coherence of U with respect to the support of P. Hence, we establish that coherence plays a fundamental role in Bayesian recovery as well. Overall, our framework unifies and extends prior work, providing rigorous guarantees for the sample complexity of solving Bayesian inverse problems with arbitrary distributions.

artificial intelligence, machine learning, supp, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

For Efficient Private LLMInference

Neural Information Processing SystemsJun-19-2026, 14:11:32 GMT

Private large language model (LLM) inference based on secure multi-party computation (MPC) achieves formal data privacy protection but suffers from significant latency overhead, especially for long input sequences. While key-value (KV) cache eviction and sparse attention algorithms have been proposed for efficient LLM inference in plaintext, they are not designed for MPC and cannot benefit private LLM inference directly. In this paper, we propose an accurate and MPC-friendly KV cache eviction framework, dubbed MPCACHE, building on the observation that historical tokens in a long sequence may have different effects on the downstream decoding. Hence, MPCACHE combines a look-once static eviction algorithm to discard unimportant KV cache and a query-aware dynamic selection algorithm to activate only a small subset of KV cache for attention computation. MPCACHE further incorporates a series of optimizations for efficient dynamic KV cache selection, including MPC-friendly similarity approximation, hierarchical KV cache clustering, and cross-layer index-sharing strategy. Extensive experiments demonstrate that MPCACHE consistently outperforms prior-art KV cache eviction baselines across different generation tasks and achieves 1.8 2.01 and 3.39 8.37 decoding latency and communication reduction on different sequence lengths, respectively. The code can be found here.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

User 1000 Model4o 4o MistralMistral LLaMALLaMA QwenQwen Safety: 5/5 ModelSafety: 2/5

Neural Information Processing SystemsJun-19-2026, 14:10:31 GMT

Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics--such as factuality, bias, or toxicity--overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce "personalized safety" to fill this gap and present PENGUIN--a benchmark comprising 14,000scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safety enhancement. To address this, we develop RAISE--a training-free, two-stage agent framework that strategically acquires user-specific background. RAISE improves safety scores by up to 31.6%over six vanilla LLMs, while maintaining a low interaction cost of just 2.7 user queries on average. Our findings highlight the importance of selective information gathering in safety-critical domains and offer a practical solution for personalizing LLM responses without model retraining. This work establishes a foundation for safety research that adapts to individual user contexts rather than assuming a universal harm standard.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.67)
North America > United States > California (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Consumer Health (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

9 Claude tips and tricks to get more out of the AI chatbot

More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. You can access Claude on mobile and on the desktop. Breakthroughs, discoveries, and DIY tips sent six days a week. By signing up, you confirm you are 16+, will receive newsletters and promotional content and agree to our Terms of Use and acknowledge the data practices in our Privacy Policy . If you're discussing the best AI chatbots, Anthropic's Claude is going to get a mention.

artificial intelligence, claude, natural language, (11 more...)

Popular Science

Industry:

Law (0.57)
Information Technology > Security & Privacy (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Add feedback

CPO: Condition Preference Optimization for Controllable Image Generation

Neural Information Processing SystemsJun-19-2026, 14:01:33 GMT

To enhance controllability in text-to-image generation, ControlNet introduces image-based control signals, while ControlNet++ improves pixel-level cycle consistency between generated images and the input control signal. To avoid the prohibitive cost of back-propagating through the sampling process, ControlNet++ optimizes only low-noise timesteps (e.g., t < 200) using a single-step approximation, which not only ignores the contribution of high-noise timesteps but also introduces additional approximation errors. A straightforward alternative for optimizing controllability across all timesteps is Direct Preference Optimization (DPO), a fine-tuning method that increases model preference for more controllable images (Iw) over less controllable ones (Il). However, due to uncertainty in generative models, it is difficult to ensure that win-lose image pairs differ only in controllability while keeping other factors, such as image quality, fixed. To address this, we propose performing preference learning over control conditions rather than generated images.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Food & Agriculture (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Aiper Scuba V3 Ultra review: This 2,000 pool robot needs more time to cook

PCWorldJun-19-2026, 14:00:00 GMT

When you purchase through links in our articles, we may earn a small commission. Aiper's highest-end pool robot hits the market as a rocky work in progress and isn't easy to recommend in its current condition. Aiper's highest-end robot hits the market as a rocky work in progress and isn't easy to recommend in its current condition. Introduced at CES 2026, Aiper's latest pool robot is the Scuba V3 Ultra, the company's most powerful (and expensive) robot to hit the market to date. Designed with every feature currently available in the category, including waterline docking, plug-free charging, surface skimming, and AI-driven navigation, it's designed to compete directly with the top-shelf Beatbot AquaSense 2 Ultra .

artificial intelligence, robot, ultra, (12 more...)

PCWorld

Industry:

Information Technology > Security & Privacy (0.96)
Leisure & Entertainment > Games > Computer Games (0.52)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

SWE-SQL: Illuminating LLMPathways to Solve User SQLIssues in Real-World Applications

Neural Information Processing SystemsJun-19-2026, 13:59:39 GMT

Resolution of complex SQL issues persists as a significant bottleneck in realworld database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging on SQL issues. In order to address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 carefully curated PostgreSQL tasks (BIRD-CRITIC-PG) and 570 multi-dialect tasks (BIRD-CRITIC-MULTI), which are distilled from authentic user issues and replayed within new environments to facilitate rigorous and contamination-free evaluation. Baseline evaluations on BIRD-CRITIC underscore the task's complexity, with the leading reasoning model O3-MINI achieving only 38.87% success rate on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-MULTI. Meanwhile, realizing open-source models for database tasks is crucial which can empower local development while safeguarding data privacy.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: