testset
0b7f639ef28a9035a71f7e0c04c1d681-Supplemental-Conference.pdf
For DM, due to high memory requirements, we were only able to go up to a BatchEnsemble with an ensemble size of 8, while being able to use a batch size of only 32. In addition, for this baseline we used a GPU with more memory, as we were unable to fit the training on the standard 11 GB GPU used for the rest of our experiments. In the procedure of creating a Mixup [8] auxiliary dataset, we used a Beta distribution with α = 0.2. In Mixup augmentation, a value λ ∈ [0, 1] is sampled from a Beta distribution. We use a batch size of 64.
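The Mixup step described above can be sketched as follows. This is an illustrative implementation, not the authors' code: a mixing coefficient λ is drawn from Beta(α, α) with α = 0.2, and two examples and their labels are linearly interpolated.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup augmentation for one pair of examples.

    A coefficient lam in [0, 1] is sampled from Beta(alpha, alpha),
    then inputs and labels are interpolated with weight lam.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

With α = 0.2 the Beta distribution is strongly bimodal near 0 and 1, so most mixed examples stay close to one of the two originals.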
Do Generalisation Results Generalise?
Boglioni, Matteo, Sgobbi, Andrea, Tavernini, Gabriel, Rita, Francesco, Mosbach, Marius, Pimentel, Tiago
A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation performance, however, typically focuses on a single out-of-distribution dataset. This approach may fail to precisely evaluate the capabilities of the model, as the data shifts encountered once a model is deployed are much more diverse. In this work, we investigate whether OOD generalisation results generalise. More specifically, we evaluate a model's performance across multiple OOD testsets throughout a finetuning run; we then compute the partial correlation of performances across these testsets, regressing out in-domain performance. This allows us to assess how correlated generalisation performances are once in-domain performance is controlled for. Analysing OLMo2 and OPT, we observe no overarching trend in generalisation results: the existence of a positive or negative correlation between any two OOD testsets depends strongly on the specific choice of model analysed.
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Dominican Republic (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (12 more...)
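The partial-correlation analysis described in the abstract above can be sketched as follows (an illustrative implementation under standard definitions, not the authors' code): regress each OOD performance series on in-domain performance via least squares, then correlate the residuals.

```python
import numpy as np

def partial_correlation(a, b, control):
    """Pearson correlation of a and b after regressing out a control variable.

    a, b, control: 1-D arrays of performance values (e.g. one entry per
    finetuning checkpoint). The control series is regressed out of both
    a and b with ordinary least squares (including an intercept), and the
    correlation of the residuals is returned.
    """
    X = np.column_stack([np.ones_like(control), control])
    resid_a = a - X @ np.linalg.lstsq(X, a, rcond=None)[0]
    resid_b = b - X @ np.linalg.lstsq(X, b, rcond=None)[0]
    return np.corrcoef(resid_a, resid_b)[0, 1]
```

Two OOD testsets can look positively correlated simply because both track in-domain performance; regressing it out exposes whatever relation remains, which can even flip sign.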
MolMole: Molecule Mining from Scientific Literature
Research, LG AI, Chun, Sehyun, Kim, Jiye, Jo, Ahra, Jo, Yeonsik, Oh, Seungyul, Lee, Seungjun, Ryoo, Kwangrok, Lee, Jongmin, Kim, Seung Hwan, Kang, Byung Jun, Lee, Soonyoung, Park, Jun Ha, Moon, Chanwoo, Ham, Jiwon, Lee, Haein, Han, Heejae, Byun, Jaeseung, Do, Soojong, Ha, Minju, Kim, Dongyun, Bae, Kyunghoon, Lim, Woohyung, Lee, Edward Hwayoung, Park, Yongmin, Yu, Jeongsang, Jo, Gerrard Jeongwon, Hong, Yeonjung, Yoo, Kyungjae, Han, Sehui, Lee, Jaewan, Park, Changyoung, Jeon, Kijeong, Yi, Sihyuk
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a testset of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark testset will be publicly available, and the MolMole toolkit will be accessible soon through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at contact ddu@lgresearch.ai.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
LLVD: LSTM-based Explicit Motion Modeling in Latent Space for Blind Video Denoising
Rashid, Loay, Roheda, Siddharth, Unde, Amit
Video restoration plays a pivotal role in revitalizing degraded video content by rectifying imperfections caused by various degradations introduced during capturing (sensor noise, motion blur, etc.), saving/sharing (compression, resizing, etc.) and editing. This paper introduces a novel algorithm designed for scenarios where noise is introduced during video capture, aiming to enhance the visual quality of videos by reducing unwanted noise artifacts. We propose the Latent space LSTM Video Denoiser (LLVD), an end-to-end blind denoising model. LLVD uniquely combines spatial and temporal feature extraction, employing Long Short Term Memory (LSTM) within the encoded feature domain. This integration of LSTM layers is crucial for maintaining continuity and minimizing flicker in the restored video. Moreover, processing frames in the encoded feature domain significantly reduces computations, resulting in a very lightweight architecture. LLVD's blind nature makes it versatile for real, in-the-wild denoising scenarios where prior information about noise characteristics is not available. Experiments reveal that LLVD demonstrates excellent performance for both synthetic and captured noise. Specifically, LLVD surpasses the current State-Of-The-Art (SOTA) in RAW denoising by 0.3dB, while also achieving a 59% reduction in computational complexity.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Chen, Xingyu, Xu, Jiahao, Liang, Tian, He, Zhiwei, Pang, Jianhui, Yu, Dian, Song, Linfeng, Liu, Qiuzhi, Zhou, Mengfei, Zhang, Zhuosheng, Wang, Rui, Tu, Zhaopeng, Mi, Haitao, Yu, Dong
The remarkable performance of models like the OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference. These models employ extended chain-of-thought (CoT) processes, exploring multiple strategies to enhance problem-solving capabilities. However, a critical question remains: how can computational resources be scaled intelligently and efficiently during testing? This paper presents the first comprehensive study on the prevalent issue of overthinking in these models, where excessive computational resources are allocated for simple problems with minimal benefit. We introduce novel efficiency metrics from both outcome and process perspectives to evaluate the rational use of computational resources by o1-like models. Using a self-training paradigm, we propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy. Experimental results show that our approach successfully reduces computational overhead while preserving model performance across a range of testsets with varying difficulty levels, such as GSM8K, MATH500, GPQA, and AIME.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
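The abstract above does not spell out its efficiency metrics; one plausible instantiation of an outcome-oriented metric (a hypothetical sketch, not the paper's definition) charges a response for every token generated after the first correct solution within its chain of thought.

```python
def outcome_efficiency(solution_token_counts, solution_correct):
    """Hypothetical outcome-efficiency metric for a multi-solution response.

    solution_token_counts: tokens spent on each successive solution attempt.
    solution_correct: whether each attempt is correct.
    Returns the fraction of all generated tokens spent up to and including
    the first correct attempt (1.0 = no overthinking), or 0.0 if no
    attempt is correct.
    """
    total = sum(solution_token_counts)
    used = 0
    for n, ok in zip(solution_token_counts, solution_correct):
        used += n
        if ok:
            return used / total
    return 0.0
```

Under this sketch, a model that solves "2+3" in its first 10 tokens but keeps double-checking for 90 more scores 0.1, making the overthinking on easy problems directly measurable.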
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
Banerjee, Somnath, Layek, Sayan, Shrawgi, Hari, Mandal, Rajarshi, Halder, Avik, Kumar, Shanu, Basu, Sagnik, Agrawal, Parag, Hazra, Rima, Mukherjee, Animesh
As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
- Asia > North Korea (0.27)
- Europe > Russia (0.14)
- Asia > Russia (0.14)
- (33 more...)
- Research Report > New Finding (1.00)
- Personal > Interview (0.93)
- Media > News (1.00)
- Law > Criminal Law (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- (8 more...)
Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
Yang, Jiarui, Dai, Tao, Zhu, Yufei, Li, Naiqi, Li, Jinmin, Xia, Shutao
Diffusion models represent the state-of-the-art in generative modeling. Due to their high training costs, many works leverage pre-trained diffusion models' powerful representations for downstream tasks, such as face super-resolution (FSR), through fine-tuning or prior-based methods. However, relying solely on priors without supervised training makes it challenging to meet the pixel-level accuracy requirements of discrimination tasks. Although prior-based methods can achieve high fidelity and high-quality results, ensuring consistency remains a significant challenge. In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI). We introduce conditions and constraints on consistency by masking different sampling stages based on the structural characteristics of the face. Furthermore, we propose a condition Corrector (CRT) to establish a reciprocal posterior sampling process, enhancing FSR performance by mutual refinement of conditions and samples. DPI can balance consistency and diversity and can be seamlessly integrated into pre-trained models. In extensive experiments conducted on synthetic and real datasets, along with consistency validation in face recognition, DPI demonstrates superiority over SOTA FSR methods. The code is available at https://github.com/JerryYann/DPI.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
Speech Editing -- a Summary
Kässmann, Tobias, Liu, Yining, Liu, Danni
With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by altering the mel-spectrogram. Recent advancements, such as context-aware prosody correction and advanced attention mechanisms, have improved speech editing quality. This paper reviews state-of-the-art methods, compares key metrics, and examines widely used datasets. The aim is to highlight ongoing issues and inspire further research and innovation in speech editing.
- Information Technology (0.46)
- Media > Music (0.34)
- Information Technology > Artificial Intelligence > Speech (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Communications > Social Media (0.66)
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Mu, Feiteng, Jiang, Yong, Zhang, Liwen, Liu, Chu, Li, Wenjie, Xie, Pengjun, Huang, Fei
Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
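The cost-effective assignment described above can be sketched as follows. This is an illustrative decision rule, not the authors' method; the predictors `predict_perf` and `predict_cost` are hypothetical placeholders for whatever learned models estimate a tool's task performance and invocation cost.

```python
def select_tool(query, tools, predict_perf, predict_cost, cost_weight=1.0):
    """Assign a query to the homogeneous tool with the best predicted
    performance/cost trade-off.

    predict_perf(tool, query) and predict_cost(tool, query) are assumed
    callables returning predicted task performance and cost; cost_weight
    trades performance against cost.
    """
    return max(
        tools,
        key=lambda t: predict_perf(t, query) - cost_weight * predict_cost(t, query),
    )
```

With a high cost weight the rule routes easy queries to a cheap tool even when an expensive one would score slightly better; lowering the weight recovers pure performance-maximising selection.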