AITopics

Country: Asia > Japan (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Neural Information Processing Systems, Apr-25-2026, 06:35:11 GMT

2bce32ed409f5ebcee2a7b417ad9beed-Supplemental.pdf

actor, artificial intelligence, machine learning, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Neural Information Processing Systems, Feb-12-2026, 11:58:01 GMT

SnAKe: Bayesian Optimization via Pathwise Exploration

We consider the problem … Figure 5: V of a SnAK… Table 1: the multiplied 103.

artificial intelligence, machine learning, optimization, (10 more...)

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Germany (0.05)
South America > Paraguay (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-11-2026, 16:32:29 GMT

Recommendation Models

Although synchronous AllReduce (AR) training is designed for higher training efficiency, asynchronous parameter-server (PS) training can be the better choice for training speed when there are stragglers (slow workers) in the shared cluster, especially under limited computing resources.
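Why stragglers favor asynchronous training can be seen with a toy throughput model. This is a minimal sketch under loudly stated assumptions, not the paper's method: each worker needs a fixed time per gradient, communication cost is ignored, synchronous (AllReduce-style) rounds wait for the slowest worker, and asynchronous (parameter-server-style) workers push gradients at their own pace. The function names are hypothetical.

```python
"""Toy model of synchronous vs. asynchronous data-parallel training.

Assumptions (not from the paper): worker i needs step_times[i] seconds per
gradient, communication is free, and we count gradients processed in a
fixed wall-clock budget.
"""

from typing import Sequence


def sync_gradients(step_times: Sequence[float], wall_time: float) -> int:
    """Synchronous (AllReduce-style): each round lasts as long as the slowest
    worker, and every worker contributes exactly one gradient per round."""
    rounds = int(wall_time // max(step_times))
    return rounds * len(step_times)


def async_gradients(step_times: Sequence[float], wall_time: float) -> int:
    """Asynchronous (parameter-server-style): fast workers never block on the
    straggler, so each worker contributes as many gradients as it can finish."""
    return sum(int(wall_time // t) for t in step_times)


if __name__ == "__main__":
    # Three healthy workers and one straggler that is 4x slower.
    times = [1.0, 1.0, 1.0, 4.0]
    print("sync gradients: ", sync_gradients(times, 100.0))
    print("async gradients:", async_gradients(times, 100.0))
```

With no straggler both counts are equal; with one 4x-slower worker the synchronous count collapses to the straggler's pace. The model deliberately omits the cost on the other side: asynchronous gradients are computed on stale parameters, which is why staleness appears among this paper's topics.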

artificial intelligence, machine learning, staleness, (17 more...)

Country:

Europe > Czechia > Prague (0.05)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-8-2026, 00:49:05 GMT

2bce32ed409f5ebcee2a7b417ad9beed-Supplemental.pdf

actor, gradient, rllib flow, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

arXiv.org Artificial Intelligence, Sep-23-2025

Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse

Duan, Zhekai, Zhang, Yuan, Geng, Shikai, Liu, Gaowen, Boedecker, Joschka, Lu, Chris Xiaoxuan

Embodied Chain-of-Thought (ECoT) reasoning enhances vision-language-action (VLA) models by improving performance and interpretability through intermediate reasoning steps. However, its sequential autoregressive token generation introduces significant inference latency, limiting real-time deployment. We propose Fast ECoT, an inference-time acceleration method that exploits the structured and repetitive nature of ECoT to (1) cache and reuse high-level reasoning across timesteps and (2) parallelise the generation of modular reasoning steps. Additionally, we introduce an asynchronous scheduler that decouples reasoning from action decoding, further boosting responsiveness. Fast ECoT requires no model changes or additional training and integrates easily into existing VLA pipelines. Experiments in both simulation (LIBERO) and real-world robot tasks show up to a 7.5% reduction in latency with comparable or improved task success rate and reasoning faithfulness, bringing ECoT policies closer to practical real-time deployment.
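The two inference-time ideas in this abstract, reusing high-level reasoning across timesteps and decoding independent reasoning steps in parallel, can be sketched as follows. Everything here is hypothetical: `fake_generate` stands in for a VLA model's autoregressive decoder, and the step names, the fixed refresh interval, and the assumption that the listed steps are mutually independent are illustrative choices, not the paper's actual design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def fake_generate(step: str, observation: str, context: str = "") -> str:
    """Hypothetical stand-in for autoregressively decoding one reasoning step."""
    return f"{step}({observation}|{context})"


class ThoughtCache:
    """Reuses cached high-level reasoning (e.g. a task plan) for
    `refresh_every` timesteps instead of regenerating it every step."""

    def __init__(self, refresh_every: int = 5) -> None:
        self.refresh_every = refresh_every
        self.plan: Optional[str] = None
        self.age = 0          # timesteps since the plan was last decoded
        self.generations = 0  # how many times the plan was actually decoded

    def get_plan(self, instruction: str) -> str:
        if self.plan is None or self.age >= self.refresh_every:
            self.plan = fake_generate("plan", instruction)
            self.age = 0
            self.generations += 1
        self.age += 1
        return self.plan


# Hypothetical step names, assumed independent of each other given the plan.
INDEPENDENT_STEPS: List[str] = ["subtask", "objects", "gripper"]


def reason(cache: ThoughtCache, instruction: str, observation: str,
           pool: ThreadPoolExecutor) -> Dict[str, str]:
    plan = cache.get_plan(instruction)
    # Submit all independent steps at once instead of decoding them in sequence.
    futures = {s: pool.submit(fake_generate, s, observation, plan)
               for s in INDEPENDENT_STEPS}
    thoughts = {"plan": plan}
    thoughts.update({s: f.result() for s, f in futures.items()})
    return thoughts


if __name__ == "__main__":
    cache = ThoughtCache(refresh_every=5)
    with ThreadPoolExecutor(max_workers=len(INDEPENDENT_STEPS)) as pool:
        for t in range(12):
            thoughts = reason(cache, "pick up the red block", f"obs{t}", pool)
    print(cache.generations)  # plan decoded at t = 0, 5, 10 -> 3
```

Threads are used only to make the control flow explicit; on a single accelerator one would more plausibly batch the independent steps into one forward pass.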

large language model, machine learning, natural language, (18 more...)

2506.07639

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

arXiv.org Artificial Intelligence, May-15-2025

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation

Cui, Yi

We introduce WebApp1K, a novel benchmark for evaluating large language models (LLMs) in test-driven development (TDD) tasks, where test cases serve as both prompt and verification for code generation. Unlike traditional approaches relying on natural language prompts, our benchmark emphasizes the ability of LLMs to interpret and implement functionality directly from test cases, reflecting real-world software development practices. Comprising 1000 diverse challenges across 20 application domains, the benchmark evaluates LLMs on their ability to generate compact, functional code under the constraints of context length and multi-feature complexity. Our findings highlight instruction following and in-context learning as critical capabilities for TDD success, surpassing the importance of general coding proficiency or pretraining knowledge. Through comprehensive evaluation of 19 frontier models, we reveal performance bottlenecks, such as instruction loss in long prompts, and provide a detailed error analysis spanning multiple root causes. This work underscores the practical value of TDD-specific benchmarks and lays the foundation for advancing LLM capabilities in rigorous, application-driven coding scenarios.
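The "tests as prompt" protocol uses the same test source twice: once as the model's prompt and once as the verifier for the code it returns. The sketch below is a hypothetical Python-only illustration of that loop; the benchmark itself targets web-app code, `fake_model` is an assumed stand-in for an LLM call, and `exec` runs code without any sandboxing, which would be unsafe for real model output.

```python
"""Toy 'tests as prompt' loop: one test source serves as prompt and verifier.

Assumptions: tests are plain Python asserts, fake_model is a hypothetical
LLM stand-in, and exec() is unsandboxed (never use it on untrusted code).
"""

TEST_SOURCE = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""


def build_prompt(test_source: str) -> str:
    """The tests themselves are the specification given to the model."""
    return ("Write the Python code that makes these tests pass. "
            "Return only code.\n\n" + test_source)


def fake_model(prompt: str) -> str:
    """Hypothetical LLM: returns a canned solution regardless of the prompt."""
    return "def add(a, b):\n    return a + b\n"


def passes(candidate: str, test_source: str) -> bool:
    """Verify the candidate by running the very tests used as the prompt."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)    # define the candidate's functions
        exec(test_source, namespace)  # run the tests against them
    except Exception:                 # syntax errors, failed asserts, crashes
        return False
    return True


if __name__ == "__main__":
    print(passes(fake_model(build_prompt(TEST_SOURCE)), TEST_SOURCE))
```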

benchmark, large language model, machine learning, (19 more...)

2505.09027

Country: North America > United States (0.04)

Genre: Research Report (0.72)

Industry:

Information Technology (1.00)
Media (0.94)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

arXiv.org Artificial Intelligence, Dec-18-2024

I0T: Embedding Standardization Method Towards Zero Modality Gap

An, Na Min, Kim, Eunki, Thorne, James, Shim, Hyunjung

Contrastive Language-Image Pretraining (CLIP) enables zero-shot inference in downstream tasks such as image-text retrieval and classification. However, recent works extending CLIP suffer from the modality gap, which arises when image and text embeddings are projected onto disparate manifolds, deviating from the intended objective of image-text contrastive learning. We find that this phenomenon is linked to modality-specific characteristics that each image or text encoder independently possesses, and we propose two methods to address the modality gap: (1) a post-hoc embedding standardization method, $\text{I0T}_{\text{post}}$, that reduces the modality gap to approximately zero, and (2) a trainable method, $\text{I0T}_{\text{async}}$, that alleviates the modality gap by adding two normalization layers for each encoder. Our I0T framework can significantly reduce the modality gap while preserving the original embedding representations of trained models with locked parameters. In practice, $\text{I0T}_{\text{post}}$ can serve as an explainable automatic evaluation metric alternative to the widely used CLIPScore (CLIP-S).
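A post-hoc standardization of this kind can be sketched in NumPy. This is a minimal sketch assuming that "standardization" means per-modality, per-dimension z-scoring followed by L2 re-normalization, and that the modality gap is measured as the distance between the centroids of the normalized image and text embeddings; the paper's exact procedure may differ. The embeddings are synthetic Gaussian data with an artificial offset, not real CLIP outputs, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project each row onto the unit hypersphere, as CLIP does before cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def modality_gap(img: np.ndarray, txt: np.ndarray) -> float:
    """Euclidean distance between centroids of normalized image and text embeddings."""
    return float(np.linalg.norm(l2_normalize(img).mean(axis=0)
                                - l2_normalize(txt).mean(axis=0)))


def standardize(emb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Assumed post-hoc step: remove this modality's mean embedding, scale each
    dimension by its std, then re-normalize. Encoder weights stay untouched."""
    z = (emb - emb.mean(axis=0)) / (emb.std(axis=0) + eps)
    return l2_normalize(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1000, 64
    img = rng.standard_normal((n, d))
    txt = rng.standard_normal((n, d))
    img[:, 0] += 3.0  # synthetic offset pushing the two modalities apart
    txt[:, 0] -= 3.0
    print("gap before:", modality_gap(img, txt))
    print("gap after: ", modality_gap(standardize(img), standardize(txt)))
```

Because each modality is centered with its own statistics, the centroids of both modalities are pulled toward the origin, which is the intuition behind driving the gap toward zero without retraining.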

artificial intelligence, machine learning, natural language, (16 more...)

2412.14384

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Sep-19-2024

A Case Study of Web App Coding with OpenAI Reasoning Models

Cui, Yi

This paper presents a case study of coding tasks performed by OpenAI's latest reasoning models, o1-preview and o1-mini, in comparison with other frontier models. The o1 models deliver state-of-the-art (SOTA) results on WebApp1K, a single-task benchmark. To go further, we introduce WebApp1K-Duo, a harder benchmark that doubles the number of tasks and test cases. On the new benchmark, the o1 models' performance declines significantly, falling behind Claude 3.5. Moreover, they consistently fail when confronted with atypical yet correct test cases, a trap that non-reasoning models occasionally avoid. We hypothesize that this performance variability stems from instruction comprehension. Specifically, the reasoning mechanism boosts performance when all expectations are captured but exacerbates errors when key expectations are missed, an effect potentially influenced by input length. We therefore argue that the coding success of reasoning models hinges on a top-notch base model and supervised fine-tuning (SFT) to ensure meticulous adherence to instructions.

async, benchmark, fetchmock, (15 more...)

2409.13773

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.53)

Industry: Information Technology > Software (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

arXiv.org Artificial Intelligence, Jul-30-2024

WebApp1K: A Practical Code-Generation Benchmark for Web App Development

Cui, Yi

We introduce WebApp1K, a practical code-generation benchmark that measures LLMs' ability to develop web apps. The benchmark aims to calibrate LLM output and help models progressively improve code correctness and functionality. It is lightweight and easy to run. We present the initial version of WebApp1K and share our findings from running it against the latest frontier LLMs. First, open-source LLMs deliver impressive performance, closely trailing GPT-4o and Claude 3.5. Second, model size correlates strongly with code correctness. Third, no prompting technique was found to lift performance either universally across all models or significantly for any single model.
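The abstract does not state how code correctness is scored. A common choice for code-generation benchmarks is pass@k, estimated from n generations per problem of which c pass, using the unbiased estimator introduced with the Codex evaluation (Chen et al., 2021). The sketch below assumes that metric purely for illustration; it is not confirmed here as WebApp1K's scoring rule, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations with c correct, passes all tests."""
    if n - c < k:
        return 1.0  # every size-k draw must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results for three problems, 10 generations each.
    results = [(10, 3), (10, 0), (10, 10)]
    print("pass@1:", benchmark_score(results, 1))
    print("pass@5:", benchmark_score(results, 5))
```

Computing the ratio of binomial coefficients, rather than simply averaging whether any of the first k samples passed, gives a lower-variance estimate from the same n generations.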

application, benchmark, llm, (14 more...)

2408.00019

Genre: Research Report (0.71)

Industry: Information Technology > Software (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)