testset
0b7f639ef28a9035a71f7e0c04c1d681-Supplemental-Conference.pdf
For DM, due to high memory requirements, we were only able to go up to a BatchEnsemble with an ensemble size of 8, while being able to use a batch size of only 32. In addition, for this baseline we used a GPU with more memory, as we were unable to fit the training on the standard 11 GB GPU used for the rest of our experiments. In the procedure of creating a Mixup [8] auxiliary dataset, we used a Beta distribution with α = 0.2. In Mixup augmentation, a value λ ∈ [0, 1] is sampled from a Beta distribution. We use a batch size of 64.
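The Mixup step described above can be sketched as follows. This is an illustrative implementation, not the authors' code: a mixing coefficient λ is drawn from Beta(α, α) with α = 0.2, and two examples and their labels are linearly interpolated.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup augmentation for one pair of examples.

    A coefficient lam in [0, 1] is sampled from Beta(alpha, alpha),
    then inputs and labels are interpolated with weight lam.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

With α = 0.2 the Beta distribution is strongly bimodal near 0 and 1, so most mixed examples stay close to one of the two originals.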
Do Generalisation Results Generalise?
Boglioni, Matteo, Sgobbi, Andrea, Tavernini, Gabriel, Rita, Francesco, Mosbach, Marius, Pimentel, Tiago
A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation performance, however, typically focuses on a single out-of-distribution dataset. This approach may fail to precisely evaluate the capabilities of the model, as the data shifts encountered once a model is deployed are much more diverse. In this work, we investigate whether OOD generalisation results generalise. More specifically, we evaluate a model's performance across multiple OOD testsets throughout a finetuning run; we then compute the partial correlation of performances across these testsets, regressing out in-domain performance. This allows us to assess how correlated generalisation performances are once in-domain performance is controlled for. Analysing OLMo2 and OPT, we observe no overarching trend in generalisation results: the existence of a positive or negative correlation between any two OOD testsets depends strongly on the specific choice of model analysed.
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Dominican Republic (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (12 more...)
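The partial-correlation analysis described in the abstract above can be sketched as follows (an illustrative implementation under standard definitions, not the authors' code): regress each OOD performance series on in-domain performance via least squares, then correlate the residuals.

```python
import numpy as np

def partial_correlation(a, b, control):
    """Pearson correlation of a and b after regressing out a control variable.

    a, b, control: 1-D arrays of performance values (e.g. one entry per
    finetuning checkpoint). The control series is regressed out of both
    a and b with ordinary least squares (including an intercept), and the
    correlation of the residuals is returned.
    """
    X = np.column_stack([np.ones_like(control), control])
    resid_a = a - X @ np.linalg.lstsq(X, a, rcond=None)[0]
    resid_b = b - X @ np.linalg.lstsq(X, b, rcond=None)[0]
    return np.corrcoef(resid_a, resid_b)[0, 1]
```

Two OOD testsets can look positively correlated simply because both track in-domain performance; regressing it out exposes whatever relation remains, which can even flip sign.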
MolMole: Molecule Mining from Scientific Literature
Research, LG AI, Chun, Sehyun, Kim, Jiye, Jo, Ahra, Jo, Yeonsik, Oh, Seungyul, Lee, Seungjun, Ryoo, Kwangrok, Lee, Jongmin, Kim, Seung Hwan, Kang, Byung Jun, Lee, Soonyoung, Park, Jun Ha, Moon, Chanwoo, Ham, Jiwon, Lee, Haein, Han, Heejae, Byun, Jaeseung, Do, Soojong, Ha, Minju, Kim, Dongyun, Bae, Kyunghoon, Lim, Woohyung, Lee, Edward Hwayoung, Park, Yongmin, Yu, Jeongsang, Jo, Gerrard Jeongwon, Hong, Yeonjung, Yoo, Kyungjae, Han, Sehui, Lee, Jaewan, Park, Changyoung, Jeon, Kijeong, Yi, Sihyuk
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a testset of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark testset will be publicly available, and the MolMole toolkit will be accessible soon through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at contact ddu@lgresearch.ai.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
LLVD: LSTM-based Explicit Motion Modeling in Latent Space for Blind Video Denoising
Rashid, Loay, Roheda, Siddharth, Unde, Amit
Video restoration plays a pivotal role in revitalizing degraded video content by rectifying imperfections caused by various degradations introduced during capturing (sensor noise, motion blur, etc.), saving/sharing (compression, resizing, etc.) and editing. This paper introduces a novel algorithm designed for scenarios where noise is introduced during video capture, aiming to enhance the visual quality of videos by reducing unwanted noise artifacts. We propose the Latent space LSTM Video Denoiser (LLVD), an end-to-end blind denoising model. LLVD uniquely combines spatial and temporal feature extraction, employing Long Short Term Memory (LSTM) within the encoded feature domain. This integration of LSTM layers is crucial for maintaining continuity and minimizing flicker in the restored video. Moreover, processing frames in the encoded feature domain significantly reduces computations, resulting in a very lightweight architecture. LLVD's blind nature makes it versatile for real, in-the-wild denoising scenarios where prior information about noise characteristics is not available. Experiments reveal that LLVD demonstrates excellent performance for both synthetic and captured noise. Specifically, LLVD surpasses the current State-Of-The-Art (SOTA) in RAW denoising by 0.3dB, while also achieving a 59% reduction in computational complexity.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Chen, Xingyu, Xu, Jiahao, Liang, Tian, He, Zhiwei, Pang, Jianhui, Yu, Dian, Song, Linfeng, Liu, Qiuzhi, Zhou, Mengfei, Zhang, Zhuosheng, Wang, Rui, Tu, Zhaopeng, Mi, Haitao, Yu, Dong
The remarkable performance of models like the OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference. These models employ extended chain-of-thought (CoT) processes, exploring multiple strategies to enhance problem-solving capabilities. However, a critical question remains: how can computational resources be scaled intelligently and efficiently during testing? This paper presents the first comprehensive study on the prevalent issue of overthinking in these models, where excessive computational resources are allocated for simple problems with minimal benefit. We introduce novel efficiency metrics from both outcome and process perspectives to evaluate the rational use of computational resources by o1-like models. Using a self-training paradigm, we propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy. Experimental results show that our approach successfully reduces computational overhead while preserving model performance across a range of testsets with varying difficulty levels, such as GSM8K, MATH500, GPQA, and AIME.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
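The abstract above does not spell out its efficiency metrics; one plausible instantiation of an outcome-oriented metric (a hypothetical sketch, not the paper's definition) charges a response for every token generated after the first correct solution within its chain of thought.

```python
def outcome_efficiency(solution_token_counts, solution_correct):
    """Hypothetical outcome-efficiency metric for a multi-solution response.

    solution_token_counts: tokens spent on each successive solution attempt.
    solution_correct: whether each attempt is correct.
    Returns the fraction of all generated tokens spent up to and including
    the first correct attempt (1.0 = no overthinking), or 0.0 if no
    attempt is correct.
    """
    total = sum(solution_token_counts)
    used = 0
    for n, ok in zip(solution_token_counts, solution_correct):
        used += n
        if ok:
            return used / total
    return 0.0
```

Under this sketch, a model that solves "2+3" in its first 10 tokens but keeps double-checking for 90 more scores 0.1, making the overthinking on easy problems directly measurable.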
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
Banerjee, Somnath, Layek, Sayan, Shrawgi, Hari, Mandal, Rajarshi, Halder, Avik, Kumar, Shanu, Basu, Sagnik, Agrawal, Parag, Hazra, Rima, Mukherjee, Animesh
As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
- Asia > North Korea (0.27)
- Europe > Russia (0.14)
- Asia > Russia (0.14)
- (33 more...)
- Research Report > New Finding (1.00)
- Personal > Interview (0.93)
- Media > News (1.00)
- Law > Criminal Law (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- (8 more...)
Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
Yang, Jiarui, Dai, Tao, Zhu, Yufei, Li, Naiqi, Li, Jinmin, Xia, Shutao
Diffusion models represent the state-of-the-art in generative modeling. Due to their high training costs, many works leverage pre-trained diffusion models' powerful representations for downstream tasks, such as face super-resolution (FSR), through fine-tuning or prior-based methods. However, relying solely on priors without supervised training makes it challenging to meet the pixel-level accuracy requirements of discrimination tasks. Although prior-based methods can achieve high fidelity and high-quality results, ensuring consistency remains a significant challenge. In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI). We introduce conditions and constraints on consistency by masking different sampling stages based on the structural characteristics of the face. Furthermore, we propose a condition Corrector (CRT) to establish a reciprocal posterior sampling process, enhancing FSR performance by mutual refinement of conditions and samples. DPI can balance consistency and diversity and can be seamlessly integrated into pre-trained models. In extensive experiments conducted on synthetic and real datasets, along with consistency validation in face recognition, DPI demonstrates superiority over SOTA FSR methods. The code is available at https://github.com/JerryYann/DPI.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
Speech Editing -- a Summary
Kässmann, Tobias, Liu, Yining, Liu, Danni
With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by altering the mel-spectrogram. Recent advancements, such as context-aware prosody correction and advanced attention mechanisms, have improved speech editing quality. This paper reviews state-of-the-art methods, compares key metrics, and examines widely used datasets. The aim is to highlight ongoing issues and inspire further research and innovation in speech editing.
- Information Technology (0.46)
- Media > Music (0.34)
- Information Technology > Artificial Intelligence > Speech (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Communications > Social Media (0.66)
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Mu, Feiteng, Jiang, Yong, Zhang, Liwen, Liu, Chu, Li, Wenjie, Xie, Pengjun, Huang, Fei
Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
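The cost-effective assignment described above can be sketched as follows. This is an illustrative decision rule, not the authors' method; the predictors `predict_perf` and `predict_cost` are hypothetical placeholders for whatever learned models estimate a tool's task performance and invocation cost.

```python
def select_tool(query, tools, predict_perf, predict_cost, cost_weight=1.0):
    """Assign a query to the homogeneous tool with the best predicted
    performance/cost trade-off.

    predict_perf(tool, query) and predict_cost(tool, query) are assumed
    callables returning predicted task performance and cost; cost_weight
    trades performance against cost.
    """
    return max(
        tools,
        key=lambda t: predict_perf(t, query) - cost_weight * predict_cost(t, query),
    )
```

With a high cost weight the rule routes easy queries to a cheap tool even when an expensive one would score slightly better; lowering the weight recovers pure performance-maximising selection.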