AITopics | sophia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Jean Kaddour, Oscar Key

Neural Information Processing Systems Feb-11-2026, 19:36:07 GMT

This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Robot and the Philosopher

The New Yorker Jan-10-2026, 11:00:00 GMT

In the age of A.I., we endlessly debate what consciousness looks like. Can a camera see things more clearly? Earlier that day, she'd been onstage at the conference I was attending and had been teased for a gesture that looked as though she were flipping off the audience. Now she was in the hotel lobby, in a black gown, holding court. She stepped in front of a bright-orange wall. I had brought an 85-mm. "What are your hopes for the future of humanity?" She wasn't keen to answer, but she responded to the camera.

artificial intelligence, consciousness, sophia, (8 more...)

The New Yorker

Country:

North America > United States > New York (0.05)
North America > United States > Florida > Broward County > Deerfield Beach (0.04)
North America > United States > District of Columbia > Washington (0.04)
(3 more...)

Industry:

Media > Photography (0.68)
Leisure & Entertainment (0.68)
Media > Film (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (0.66)

Gold-Switch: Training-Free Superposition of Slow- and Fast-Thinking LLMs

Lee, Jaeseong, Kwon, Dayoung, Hwang, Seung-won

arXiv.org Artificial Intelligence Oct-9-2025

Large Reasoning Models (LRMs) excel in structured tasks by emulating deliberate human reasoning but often suffer from overthinking, degrading performance and wasting resources. One possible baseline is to deploy both LLM and LRM, then route input by predicting whether it requires reasoning and may cause overthinking. However, deploying multiple models can be costly or impractical. We propose a superposed deployment strategy with a lightweight, training-free regulation to optimize inference by switching one model on and off. Instead of routing, we selectively unlearn from LRM at inference, scaling down computation while preserving reasoning. By analyzing the cumulative energy of singular values, we identify optimal low-rank projections to adjust reasoning just right.
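The "cumulative energy of singular values" criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes that energy means the sum of squared singular values and that a rank is chosen once a fixed fraction of that energy is covered. The function name `rank_for_energy` and the 0.9 threshold are hypothetical and are not taken from the paper, which may select ranks differently.

```python
def rank_for_energy(singular_values, threshold=0.9):
    """Return the smallest rank k whose top-k singular values
    hold at least `threshold` of the total squared-singular-value energy.

    Illustrative only: the threshold and the energy definition are assumptions.
    """
    # Energy of each component, largest first.
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        return 0
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= threshold:
            return k
    return len(energies)


# Energies are 9, 1, 1 (total 11): one component covers about 82%, two cover about 91%.
print(rank_for_energy([3.0, 1.0, 1.0], threshold=0.9))  # prints 2
```

A lower threshold keeps fewer components, which is the sense in which such a criterion trades fidelity against the size of the low-rank projection.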

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.0675

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Jean Kaddour, Oscar Key

Neural Information Processing Systems Oct-8-2025, 16:47:46 GMT

This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

Hu, Yue, Cao, Zanxia, Liu, Yingchao

arXiv.org Machine Learning Jul-29-2025

First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network training. DEO adapts the Dimer method to explore a broader region of the loss landscape, approximating the Hessian's smallest eigenvector without computing the full matrix. By periodically projecting the gradient onto the subspace orthogonal to the minimum curvature direction, DEO guides the optimizer away from saddle points and flat regions, enhancing training efficiency with non-stepwise updates. Preliminary experiments on a Transformer toy model show DEO achieves competitive performance compared to standard first-order methods, improving navigation of complex loss landscapes. Our work repurposes physics-inspired, first-order curvature estimation to enhance neural network training in high-dimensional spaces.
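The two ingredients named in this abstract, a gradient-only curvature estimate from two closely spaced points and a projection of the gradient orthogonal to a chosen direction, can be sketched as follows. This is a toy illustration on the saddle function f(x, y) = x^2 - y^2, not the DEO algorithm itself. The helper names, the step `eps`, and the choice of test function are assumptions, and the search for the minimum-curvature direction is omitted.

```python
def saddle_grad(point):
    """Gradient of the toy saddle f(x, y) = x**2 - y**2 (assumed example)."""
    x, y = point
    return (2.0 * x, -2.0 * y)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def dimer_curvature(grad_fn, point, direction, eps=1e-3):
    """Estimate the curvature along a unit `direction` from the gradients
    at two points placed symmetrically around `point` (central difference)."""
    plus = tuple(p + eps * d for p, d in zip(point, direction))
    minus = tuple(p - eps * d for p, d in zip(point, direction))
    diff = tuple(gp - gm for gp, gm in zip(grad_fn(plus), grad_fn(minus)))
    return dot(diff, direction) / (2.0 * eps)


def project_out(vector, direction):
    """Remove the component of `vector` along a unit `direction`."""
    coeff = dot(vector, direction)
    return tuple(v - coeff * d for v, d in zip(vector, direction))


point = (0.5, 0.5)
# Negative curvature along y, positive curvature along x.
print(dimer_curvature(saddle_grad, point, (0.0, 1.0)))
print(dimer_curvature(saddle_grad, point, (1.0, 0.0)))
# Gradient with its y-component (the negative-curvature direction) removed.
print(project_out(saddle_grad(point), (0.0, 1.0)))
```

Only gradient evaluations are used, so the cost per curvature probe is two extra backward passes rather than a Hessian computation.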

artificial intelligence, machine learning, optimizer, (17 more...)

arXiv.org Machine Learning

2507.19968

Country: Asia > China > Shandong Province (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Pre-Training LLMs on a budget: A comparison of three optimizers

Schlotthauer, Joel, Kroos, Christian, Hinze, Chris, Hangya, Viktor, Hahn, Luzian, Küch, Fabian

arXiv.org Artificial Intelligence Jul-23-2025

Optimizers play a decisive role in reducing pre-training times for LLMs and achieving better-performing models. In this study, we compare three major variants: the de-facto standard AdamW, the simpler Lion, developed through an evolutionary search, and the second-order optimizer Sophia. For better generalization, we train with two different base architectures and use a single- and a multiple-epoch approach while keeping the number of tokens constant. Using the Maximal Update Parametrization and smaller proxy models, we tune relevant hyperparameters separately for each combination of base architecture and optimizer. We found that while the results from all three optimizers were in approximately the same range, Sophia exhibited the lowest training and validation loss, Lion was fastest in terms of training GPU hours, but AdamW led to the best downstream evaluation results.
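To make the description of Lion as "simpler" concrete, here is a minimal sketch of one Lion update on plain Python lists: the step direction is the sign of an interpolation between momentum and gradient, and only a single momentum buffer is kept. The function name `lion_step` and the hyperparameter values are illustrative assumptions, not the settings tuned in this study.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update. Returns (new_params, new_momentum).

    Hyperparameter defaults are illustrative assumptions.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Update direction: sign of the interpolated momentum and gradient.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is added to the signed direction.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum buffer is refreshed with a second coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


params, momentum = lion_step([1.0, -1.0], [0.5, -0.2], [0.0, 0.0], lr=0.1)
print(params, momentum)
```

Because every coordinate moves by exactly `lr` (before weight decay), Lion needs no second-moment buffer, which is one plausible reason for lower per-step cost than AdamW.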

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.08472

Country:

Europe > Germany (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Xie, Tian, Gao, Zitian, Ren, Qingnan, Luo, Haoming, Hong, Yuqian, Dai, Bryan, Zhou, Joey, Qiu, Kai, Wu, Zhirong, Luo, Chong

arXiv.org Artificial Intelligence Feb-20-2025

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills, such as reflection, verification, and summarization, that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
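A rule-based format reward of the kind this abstract describes might look like the sketch below. The `<think>` and `<answer>` tag names, the reward values of 1.0 and -1.0, and the exact rules are assumptions made for illustration. The abstract does not specify them, so this should not be read as the paper's reward function.

```python
import re

# Assumed output layout: one reasoning block followed by one answer block.
_FORMAT = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)
_TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(output):
    """Return 1.0 if `output` follows the assumed layout, else -1.0."""
    # Each tag must appear exactly once, so repeated or missing blocks fail.
    if any(output.count(tag) != 1 for tag in _TAGS):
        return -1.0
    match = _FORMAT.fullmatch(output)
    if match is None:
        return -1.0
    reasoning, answer = match.group(1), match.group(2)
    # Empty reasoning is treated as a shortcut and penalized.
    if not reasoning.strip() or not answer.strip():
        return -1.0
    return 1.0


print(format_reward("<think>A cannot be a knave.</think><answer>A is a knight</answer>"))
print(format_reward("<answer>A is a knight</answer>"))
```

A check like this only scores structure. Answer correctness would need a separate rule, which the puzzles' "straightforward answer verification" makes possible.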

knight, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.14768

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization

Zhao, Huaqin, Li, Jiaxi, Pan, Yi, Liang, Shizhe, Yang, Xiaofeng, Liu, Wei, Li, Xiang, Dou, Fei, Liu, Tianming, Lu, Jin

arXiv.org Artificial Intelligence Nov-15-2024

Fine-tuning large language models (LLMs) poses significant memory challenges, as the back-propagation process demands extensive resources, especially with growing model sizes. Recent work, MeZO, addresses this issue using a zeroth-order (ZO) optimization method, which reduces memory consumption by matching the usage to the inference phase. To overcome this limitation, we introduce HELENE, a novel scalable and memory-efficient optimizer that integrates annealed A-GNB gradients with a diagonal Hessian estimation and layer-wise clipping, serving as a second-order pre-conditioner. This combination allows for faster and more stable convergence. Our theoretical analysis demonstrates that HELENE improves convergence rates, particularly for models with heterogeneous layer dimensions, by reducing the dependency on the total parameter space dimension. Furthermore, HELENE remains compatible with both full parameter tuning and parameter-efficient fine-tuning (PEFT), outperforming several state-of-the-art optimizers. The code will be released after review.

LLMs have demonstrated remarkable capabilities across various downstream tasks. Fine-tuning these models has become the standard approach for improving task-specific performance, in which first-order optimizers such as Stochastic Gradient Descent (SGD) (Robbins & Monro, 1951), Adam (Diederik, 2014) and AdamW (Hutter & Loshchilov, 2017) are widely used.
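The zeroth-order idea that this abstract builds on can be sketched with a two-point estimate: perturb the parameters along a random direction, evaluate the loss twice, and scale the direction by the finite difference. This needs only forward evaluations, which is why memory use can match inference. The sketch shows the generic estimator only, not HELENE's Hessian estimation, annealing, or clipping. The function name `zo_gradient` and the `eps` value are assumptions.

```python
import random


def zo_gradient(loss_fn, params, eps=1e-3, z=None, rng=random):
    """Two-point zeroth-order gradient estimate along direction `z`.

    If `z` is None, a standard normal direction is sampled from `rng`.
    """
    if z is None:
        z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Directional derivative estimated from two loss evaluations only.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]


def quadratic_loss(weights):
    """Toy loss used for illustration; its true gradient is 2 * weights."""
    return sum(w * w for w in weights)


# With a fixed direction along the first axis, the estimate recovers
# the first gradient component (about 2.0) and nothing for the second.
print(zo_gradient(quadratic_loss, [1.0, 2.0], z=[1.0, 0.0]))
```

A single random direction gives a noisy estimate, so practical methods average over steps or add preconditioning to speed up convergence.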

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.10696

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Automate or Assist? The Role of Computational Models in Identifying Gendered Discourse in US Capital Trial Transcripts

Wen-Yi, Andrea W, Adamson, Kathryn, Greenfield, Nathalie, Goldberg, Rachel, Babcock, Sandra, Mimno, David, Koenecke, Allison

arXiv.org Artificial Intelligence Jul-17-2024

The language used by US courtroom actors in criminal trials has long been studied for biases. However, systematic studies of bias in high-stakes court trials have been difficult, due to the nuanced nature of bias and the legal expertise required. New large language models offer the possibility to automate annotation, saving time and cost. But validating these approaches requires both high quantitative performance and an understanding of how automated methods fit in existing workflows, and what they really offer. In this paper we present a case study of adding an automated system to a complex and high-stakes problem: identifying gender-biased language in US capital trials for women defendants. Our team of experienced death-penalty lawyers and NLP technologists pursued a three-phase study: first annotating manually, then training and evaluating computational models, and finally comparing human annotations to model predictions. Unlike many typical NLP tasks, annotating for gender bias in months-long capital trials is a complicated task that involves many individual judgment calls. In contrast to standard arguments for automation that are based on efficiency and scalability, legal experts found the computational models most useful in challenging their personal bias in annotation and providing opportunities to refine and build consensus on rules for annotation. This suggests that seeking to replace experts with computational models is both unrealistic and undesirable. Rather, computational models offer valuable opportunities to assist the legal experts in annotation-based studies.

annotation, defendant, transcript, (15 more...)

arXiv.org Artificial Intelligence

2407.125

Country:

Asia > Singapore (0.04)
North America > United States > Texas (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Litigation (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Sylvester Stallone's daughters learned how to fight off a coyote, use pepper spray growing up: 'He is crazy'

FOX News Feb-20-2024, 23:42:03 GMT

Sylvester Stallone wants his daughters, Sistine, Scarlet and Sophia, to be ready for anything. In new clips from the second season of their Paramount reality series, "The Family Stallone," Stallone spoke about his two eldest daughters, Sophia and Sistine, moving to New York, calling it "traumatic" as he recalled his own experiences with robbery, car accidents, and more. "Since you guys have moved to New York, it's made me very uneasy. You know I'm paranoid anyway because I have a responsibility as a father to do everything I can," he told them early in the episode. The girls then joked about him being "the most paranoid person on the planet," with the youngest daughter Scarlet saying "he is crazy!"

artificial intelligence, machine learning, stallone, (17 more...)

FOX News

Country: North America > United States > New York (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)
