AITopics | length normalization

Collaborating Authors

length normalization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e099c1c9699814af0be873a175361713-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 10:24:52 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

South America > Argentina (0.04)
North America > United States > Virginia (0.04)
South America > Uruguay (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SimPO: Simple Preference Optimization with a Reference-Free Reward Y u Meng

Neural Information Processing SystemsOct-10-2025, 19:12:16 GMT

Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further improving the algorithm's performance.

alpacaeval 2, arxiv preprint arxiv, simpo, (14 more...)

Neural Information Processing Systems

Country:

South America > Argentina (0.04)
North America > United States > Virginia (0.04)
South America > Uruguay (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

Peng, Shangpin, Wang, Weinong, Tian, Zhuotao, Yang, Senqiao, Wu, Xing, Xu, Haotian, Zhang, Chengquan, Isobe, Takashi, Hu, Baotian, Zhang, Min

arXiv.org Artificial IntelligenceAug-18-2025

Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model's evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model's learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.10054

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An approach to melodic segmentation and classification based on filtering with the Haar-wavelet

Velarde, Gissel, Weyde, Tillman, Meredith, David

arXiv.org Artificial IntelligenceApr-30-2025

We present a novel method of classification and segmentation of melodies in symbolic representation. The method is based on filtering pitch as a signal over time with the Haar-wavelet, and we evaluate it on two tasks. The filtered signal corresponds to a single-scale signal ws from the continuous Haar wavelet transform. The melodies are first segmented using local maxima or zero-crossings of w_s. The segments of w_s are then classified using the k-nearest neighbour algorithm with Euclidian and city-block distances. The method proves more effective than using unfiltered pitch signals and Gestalt-based segmentation when used to recognize the parent works of segments from Bach's Two-Part Inventions (BWV 772-786). When used to classify 360 Dutch folk tunes into 26 tune families, the performance of the method is comparable to the use of pitch signals, but not as good as that of string-matching methods based on multiple features.

data quality, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/09298215.2013.841713

2504.20822

Country:

North America > United States > New York (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

REFA: Reference Free Alignment for multi-preference optimization

Gupta, Taneesh, Madhavan, Rahul, Zhang, Xuchao, Bansal, Chetan, Rajmohan, Saravan

arXiv.org Artificial IntelligenceJan-9-2025

We introduce REFA, a family of reference-free alignment methods that optimize over multiple user preferences while enforcing fine-grained length control. Our approach integrates deviation-based weighting to emphasize high-quality responses more strongly, length normalization to prevent trivial short-response solutions, and an EOS-probability regularizer to mitigate dataset-induced brevity biases. Theoretically, we show that under the Uncertainty Reduction with Sequence Length Assertion (URSLA), naive length normalization can still incentivize length-based shortcuts. By contrast, REFA corrects these subtle incentives, guiding models toward genuinely more informative and higher-quality outputs. Empirically, REFA sets a new state-of-the-art among reference-free alignment methods, producing richer responses aligned more closely with human preferences. Compared to a base supervised fine-tuned (SFT) mistral-7b model that achieves 8.4% length-controlled win rate (LC-WR) and 6.2% win rate (WR), our best REFA configuration attains 21.62% LC-WR and 19.87% WR on the AlpacaEval v2 benchmark. This represents a substantial improvement over both the strongest multi-preference baseline, InfoNCA (16.82% LC-WR, 10.44% WR), and the strongest reference-free baseline, SimPO (20.01% LC-WR, 17.65% WR)

length normalization, optimization, probability, (14 more...)

arXiv.org Artificial Intelligence

2412.16378

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Zhao, Hanyang, Winata, Genta Indra, Das, Anirban, Zhang, Shi-Xiong, Yao, David D., Tang, Wenpin, Sahu, Sambit

arXiv.org Artificial IntelligenceOct-5-2024

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern which components genuinely enhance downstream performance. In this work, we propose RainbowPO, a unified framework that demystifies the effectiveness of existing DPO methods by categorizing their key components into seven broad directions. We integrate these components into a single cohesive objective, enhancing the performance of each individual element. Through extensive experiments, we demonstrate that RainbowPO outperforms existing DPO variants. Additionally, we provide insights to guide researchers in developing new DPO methods and assist practitioners in their implementations.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.04203

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)

Add feedback

Preference Alignment Improves Language Model-Based TTS

Tian, Jinchuan, Zhang, Chunlei, Shi, Jiatong, Zhang, Hao, Yu, Jianwei, Watanabe, Shinji, Yu, Dong

arXiv.org Artificial IntelligenceSep-18-2024

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer competitive performance to their counterparts. Further optimization can be achieved through preference alignment algorithms, which adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content. This study presents a thorough empirical evaluation of how preference alignment algorithms, particularly Direct Preference Optimization (DPO), enhance LM-based TTS. With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores, with the latter two metrics surpassing even human speech in certain evaluations. We also show preference alignment is applicable to low-resource scenarios and effectively generalized to out-of-domain applications.

arxiv preprint arxiv, language model, optimization, (12 more...)

arXiv.org Artificial Intelligence

2409.12403

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.50)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

SimPO: Simple Preference Optimization with a Reference-Free Reward

Meng, Yu, Xia, Mengzhou, Chen, Danqi

arXiv.org Artificial IntelligenceJul-8-2024

Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propose SimPO, a simpler yet more effective approach. The effectiveness of SimPO is attributed to a key design: using the average log probability of a sequence as the implicit reward. This reward formulation better aligns with model generation and eliminates the need for a reference model, making it more compute and memory efficient. Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further enhancing the algorithm's performance. We compare SimPO to DPO and its latest variants across various state-of-the-art training setups, including both base and instruction-tuned models like Mistral and Llama3. We evaluated on extensive instruction-following benchmarks, including AlpacaEval 2, MT-Bench, and the recent challenging Arena-Hard benchmark. Our results demonstrate that SimPO consistently and significantly outperforms existing approaches without substantially increasing response length. Specifically, SimPO outperforms DPO by up to 6.4 points on AlpacaEval 2 and by up to 7.5 points on Arena-Hard. Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 53.7 length-controlled win rate on AlpacaEval 2 -- surpassing Claude 3 Opus on the leaderboard, and a 36.5 win rate on Arena-Hard -- making it the strongest 8B open-source model.

alpacaeval 2, arxiv preprint arxiv, simpo, (14 more...)

arXiv.org Artificial Intelligence

2405.14734

Country:

South America > Argentina (0.04)
North America > United States > Virginia (0.04)
South America > Uruguay (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Citation-Based Summarization of Landmark Judgments

Bindal, Purnima, Kumar, Vikas, Bhatnagar, Vasudha, Sirohi, Parikshet, Siwal, Ashwini

arXiv.org Artificial IntelligenceJun-16-2024

Landmark judgments are of prime importance in the Common Law System because of their exceptional jurisprudence and frequent references in other judgments. In this work, we leverage contextual references available in citing judgments to create an extractive summary of the target judgment. We evaluate the proposed algorithm on two datasets curated from the judgments of Indian Courts and find the results promising.

judgment, judgment sentence, summarization, (16 more...)

arXiv.org Artificial Intelligence

2406.10824

Country:

Asia > India > NCT > Delhi (0.05)
Oceania > Australia (0.04)
North America > Canada (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)

Add feedback

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

Zeineldeen, Mohammad, Zeyer, Albert, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine LearningSep-15-2023

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transducer model that operates on chunks instead of frames, where EOC corresponds to the blank symbol. We further explore the remaining differences between a standard transducer and our model. Additionally, we examine relevant aspects such as long-form speech generalization, beam size, and length normalization. Through experiments on Librispeech and TED-LIUM-v2, and by concatenating consecutive sequences for long-form trials, we find that our streamable model maintains competitive performance compared to the non-streamable variant and generalizes very well to long-form speech.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2309.08436

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Add feedback