fluency
- North America > United States > New Hampshire (0.05)
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.31)
b125999bde7e80910cbdbd323087df8f-Supplemental-Conference.pdf
Foreachprompt, wecompare 6 pairs of models: Quark versus other baselines, as shown in Table 2. These agreement scores are moderate as result of subjectivity involved in ratings of text quality. PPLM (Plug and Play Language Model) uses one or more classifiers to control attributes of model generations. Figure 8: Screenshot of the mechanical turk interfaced used to gather human judgments for the sentimentevaluation. Unlikelihood represents a GPT-2 model fine-tuned with unlikelihoodobjective(Eqn.5)[79].
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
Samuel, David, Øvrelid, Lilja, Velldal, Erik, Kutuzov, Andrey
We propose a post-training method for lower-resource languages that preserves fluency of language models even when aligned by disfluent reward models. Preference-optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and language models capable of generating fluent synthetic data. Thus, in this work, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- (22 more...)
- Media > Music (0.50)
- Leisure & Entertainment (0.50)
A Definition of AGI
Hendrycks, Dan, Song, Dawn, Szegedy, Christian, Lee, Honglak, Gal, Yarin, Brynjolfsson, Erik, Li, Sharon, Zou, Andy, Levine, Lionel, Han, Bo, Fu, Jie, Liu, Ziwei, Shin, Jinwoo, Lee, Kimin, Mazeika, Mantas, Phan, Long, Ingebretsen, George, Khoja, Adam, Xie, Cihang, Salaudeen, Olawale, Hein, Matthias, Zhao, Kevin, Pan, Alexander, Duvenaud, David, Li, Bo, Omohundro, Steve, Alfour, Gabriel, Tegmark, Max, McGrew, Kevin, Marcus, Gary, Tallinn, Jaan, Schmidt, Eric, Bengio, Yoshua
The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains-including reasoning, memory, and perception-and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly "jagged" cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage. The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- (22 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Education (1.00)
- (2 more...)
On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts
Gulzar, Kashaf, Wagner, Dominik, Bayerl, Sebastian P., Hönig, Florian, Bocklet, Tobias, Riedhammer, Korbinian
Automatic transcription of stuttered speech remains a challenge, even for modern end-to-end (E2E) automatic speech recognition (ASR) frameworks. Dysfluencies and fluency-shaping artifacts are often overlooked, resulting in non-verbatim transcriptions with limited clinical and research value. We propose a parameter-efficient adaptation method to decode dysfluencies and fluency modifications as special tokens within transcriptions, evaluated on simulated (LibriStutter, English) and natural (KSoF, German) stuttered speech datasets. To mitigate ASR performance disparities and bias towards English, we introduce a multi-step fine-tuning strategy with language-adaptive pretraining. Tokenization analysis further highlights the tokenizer's English-centric bias, which poses challenges for improving performance on German data. Our findings demonstrate the effectiveness of lightweight adaptation techniques for dysfluency-aware ASR while exposing key limitations in multilingual E2E systems.
Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation
Hamedi, Parisa, Jelodar, Hamed, Bai, Samita, Meymani, Mohammad, Razavi-Far, Roozbeh, Ghorbani, Ali A.
Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance, yet systematic benchmarks for evaluating large language models on this problem remain scarce. In this work, we present the first comprehensive evaluation of five state-of-the-art large language models on assembly-to-source translation. We assess model performance using a diverse set of metrics capturing lexical similarity (BLEU, ROUGE, and METEOR), semantic alignment (BERTScore), fluency (Perplexity), and efficiency (time prediction). Our results reveal clear trade-offs: while certain models excel in text similarity metrics, others demonstrate lower perplexity or faster inference times. We further provide qualitative analyses of typical model successes and failure cases, highlighting challenges such as control flow recovery and identifier reconstruction. Taken together, our benchmark offers actionable insights into the strengths and limitations of current large language models for program translation, establishing a foundation for future research in combining accuracy with efficiency for real-world applications.