AITopics

2510.05109

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial Intelligence Dec-3-2025

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

Apertus, Project, Hernández-Cano, Alejandro, Hägele, Alexander, Huang, Allen Hao, Romanou, Angelika, Solergibert, Antoni-Joan, Pasztor, Barna, Messmer, Bettina, Garbaya, Dhia, Ďurech, Eduard Frank, Hakimi, Ido, Giraldo, Juan García, Ismayilzada, Mete, Foroutan, Negar, Moalla, Skander, Chen, Tiancheng, Sabolčec, Vinko, Xu, Yixuan, Aerni, Michael, AlKhamissi, Badr, Mariñas, Inés Altemir, Amani, Mohammad Hossein, Ansaripour, Matin, Badanin, Ilia, Benoit, Harold, Boros, Emanuela, Browning, Nicholas, Bösch, Fabian, Böther, Maximilian, Canova, Niklas, Challier, Camille, Charmillot, Clement, Coles, Jonathan, Deriu, Jan, Devos, Arnout, Drescher, Lukas, Dzenhaliou, Daniil, Ehrmann, Maud, Fan, Dongyang, Fan, Simin, Gao, Silin, Gila, Miguel, Grandury, María, Hashemi, Diba, Hoyle, Alexander, Jiang, Jiaming, Klein, Mark, Kucharavy, Andrei, Kucherenko, Anastasiia, Lübeck, Frederike, Machacek, Roman, Manitaras, Theofilos, Marfurt, Andreas, Matoba, Kyle, Matrenok, Simon, Mendonça, Henrique, Mohamed, Fawzi Roberto, Montariol, Syrielle, Mouchel, Luca, Najem-Meyer, Sven, Ni, Jingwei, Oliva, Gennaro, Pagliardini, Matteo, Palme, Elia, Panferov, Andrei, Paoletti, Léo, Passerini, Marco, Pavlov, Ivan, Poiroux, Auguste, Ponkshe, Kaustubh, Ranchin, Nathan, Rando, Javi, Sauser, Mathieu, Saydaliev, Jakhongir, Sayfiddinov, Muhammad Ali, Schneider, Marian, Schuppli, Stefano, Scialanga, Marco, Semenov, Andrei, Shridhar, Kumar, Singhal, Raghav, Sotnikova, Anna, Sternfeld, Alexander, Tarun, Ayush Kumar, Teiletche, Paul, Vamvas, Jannis, Yao, Xiaozhe, Zhao, Hao, Ilic, Alexander, Klimovic, Ana, Krause, Andreas, Gulcehre, Caglar, Rosenthal, David, Ash, Elliott, Tramèr, Florian, VandeVondele, Joost, Veraldi, Livio, Rajman, Martin, Schulthess, Thomas, Hoefler, Torsten, Bosselut, Antoine, Jaggi, Martin, Schlag, Imanol

We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting `robots.txt` exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
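The abstract names the Goldfish objective without describing it. As a rough illustration only, the sketch below follows the general idea usually associated with that objective: a pseudo-random subset of token positions is left out of the next-token loss, so no passage is ever supervised in full. The function names, the drop rate `k`, the hashing window `context`, and the hash choice are all hypothetical and are not taken from the Apertus training code.

```python
import hashlib

def goldfish_mask(token_ids, k=4, context=3):
    """Return one boolean per token: True means the token contributes to the loss.

    Hypothetical rule: hash the `context` preceding token ids and drop the
    current token when the hash is divisible by `k` (about 1 token in k).
    Hashing the local context makes the decision repeatable, so a duplicated
    passage gets the same positions dropped every time it is seen.
    """
    mask = []
    for i in range(len(token_ids)):
        if i < context:
            mask.append(True)  # too little context to hash; keep the token
            continue
        window = token_ids[i - context:i]
        digest = hashlib.sha256(",".join(map(str, window)).encode()).digest()
        mask.append(int.from_bytes(digest[:8], "big") % k != 0)
    return mask

def masked_nll(token_logprobs, mask):
    """Mean negative log-likelihood over the kept tokens only."""
    kept = [-lp for lp, keep in zip(token_logprobs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0

tokens = list(range(100, 140))          # stand-in token ids
mask = goldfish_mask(tokens)
loss = masked_nll([-1.0] * len(tokens), mask)
```

Because the mask depends only on the preceding tokens, it needs no stored state and can be recomputed inside a data loader.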

large language model, machine learning, natural language, (20 more...)

2509.14233

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.27)
North America > United States > California (0.27)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.67)

Industry:

Media (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Al Jazeera Dec-2-2025, 11:46:35 GMT

Russian tanker struck off Turkiye as Ukraine targets 'shadow fleet'

A Russian-flagged tanker in the Black Sea has reported being attacked off the Turkish coast, the third such vessel to have been targeted within a week. The Turkish Directorate General of Maritime Affairs said on Tuesday that the Midvolga-2 had reported coming under attack about 130km (80 miles) from land.

artificial intelligence, shadow fleet, turkiye, (17 more...)

Country:

Asia > Russia (0.59)
Atlantic Ocean > Black Sea (0.30)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.10)
(9 more...)

Industry:

Energy (0.74)
Government > Regional Government > Asia Government > Middle East Government > Republic of Türkiye Government (0.51)
Government > Regional Government > Europe Government > Russia Government (0.31)
Government > Regional Government > Asia Government > Russia Government (0.31)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.31)

Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation

Yao, Xingting, Hu, Qinghao, Zhou, Fei, Liu, Tielong, Li, Gang, Wang, Peisong, Cheng, Jian

Multi-bit spiking neural networks (SNNs) have recently become a hot research topic, pursuing energy-efficient and highly accurate AI. However, as more bits are involved, the associated memory and computation demands escalate out of proportion to the performance improvements. Based on the insight that layers differ in importance and that extra bits can be wasted or even interfere, this paper presents an adaptive bit allocation strategy for direct-trained SNNs, achieving fine-grained layer-wise allocation of memory and computation resources and thereby improving both the efficiency and the accuracy of SNNs. Specifically, we parametrize the temporal lengths and the bit widths of weights and spikes, and make them learnable and controllable through gradients. To address the challenges caused by changeable bit widths and temporal lengths, we propose the refined spiking neuron, which can handle different temporal lengths, enable the derivation of gradients for temporal lengths, and suit spike quantization better. In addition, we theoretically formulate the step-size mismatch problem of learnable bit widths, which can cause severe quantization errors in SNNs, and accordingly propose the step-size renewal mechanism to alleviate this issue. Experiments on various datasets, including the static CIFAR and ImageNet datasets and the dynamic CIFAR-DVS and DVS-GESTURE datasets, demonstrate that our methods can reduce the overall memory and computation cost while achieving higher accuracy. In particular, our SEWResNet-34 achieves a 2.69% accuracy gain and a 4.16× lower bit budget than the advanced baseline work on ImageNet. This work is open-sourced at https://github.com/Ikarosy/Towards-Efficient-and-Accurate-Spiking-Neural-Networks-via-Adaptive-Bit-Allocation.
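To make the step-size issue concrete, here is a minimal sketch using a generic symmetric uniform quantizer. It is not the paper's refined spiking neuron or its renewal mechanism: the quantizer, the example weights, and the rule of recomputing the step from the largest absolute value are assumptions chosen for illustration. It shows one plausible way a step size tuned for a wide bit width becomes a poor fit once the bit width shrinks.

```python
def quantize(values, bits, step):
    """Symmetric uniform quantization: snap to a grid of `step`, clip to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = round(v / step)
        q = max(-qmax, min(qmax, q))   # clip to the representable integer levels
        out.append(q * step)
    return out

def mean_abs_error(values, bits, step):
    """Average absolute quantization error for a given bit width and step size."""
    quantized = quantize(values, bits, step)
    return sum(abs(v - q) for v, q in zip(values, quantized)) / len(values)

weights = [-0.9, -0.4, -0.1, 0.05, 0.3, 0.8]   # hypothetical layer weights
max_abs = max(abs(w) for w in weights)

# Step size that fits an 8-bit grid, reused unchanged after dropping to 3 bits:
stale_step = max_abs / (2 ** 7 - 1)
stale_error = mean_abs_error(weights, bits=3, step=stale_step)

# Step size recomputed for the new 3-bit grid (illustrative renewal rule):
renewed_step = max_abs / (2 ** 2 - 1)
renewed_error = mean_abs_error(weights, bits=3, step=renewed_step)
```

With the stale step, almost every weight is clipped to a tiny range, so `stale_error` is much larger than `renewed_error`.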

artificial intelligence, machine learning, neural network, (17 more...)

doi: 10.1016/j.neunet.2025.108350

2506.23717

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium > Flanders > West Flanders > Bruges (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sawhney, Medha, Neog, Abhilash, Khurana, Mridul, Karpatne, Anuj

Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators

arXiv.org Machine Learning Dec-2-2025

Diffusion-based solvers for partial differential equations (PDEs) are often bottlenecked by slow gradient-based test-time optimization routines that use PDE residuals for loss guidance. They additionally suffer from optimization instabilities and are unable to dynamically adapt their inference scheme in the presence of noisy PDE residuals. To address these limitations, we introduce PRISMA (PDE Residual Informed Spectral Modulation with Attention), a conditional diffusion neural operator that embeds PDE residuals directly into the model's architecture via attention mechanisms in the spectral domain, enabling gradient-descent-free inference. We show that PRISMA achieves competitive accuracy at substantially lower inference cost than previous methods across five benchmark PDEs, especially with noisy observations, while using 10x to 100x fewer denoising steps, leading to 15x to 250x faster inference.

Given the ubiquitous presence of PDEs in almost every scientific discipline, there is a rapidly growing literature on using neural networks for solving PDEs (Raissi et al., 2019a; Lu et al., 2019). This includes seminal works in operator learning such as the Fourier Neural Operator (FNO; Li et al., 2020), which learns resolution-independent mappings between function spaces of input parameters a and solution fields u. However, a major limitation of these methods is their reliance on complete and clean observations of either a or u, a condition rarely met in real-world applications where data is inherently noisy and sparse. The rise of generative models has inspired another class of methods for solving PDEs by modeling the joint distribution of a and u using diffusion-based backbones (Huang et al., 2024; Yao et al., 2025; Lim et al., 2023; Shu et al., 2023; Bastek et al., 2024; Jacobsen et al., 2025).
These methods offer two key advantages over operator learning methods: (i) they generate full posterior distributions of a and/or u, enabling principled uncertainty quantification crucial for ill-posed inverse problems, and (ii) they naturally accommodate sparse observations during inference using likelihood-based and PDE residual-based loss guidance, termed diffusion posterior sampling or test-time optimization.

inference, pde residual, prisma, (14 more...)

2512.0137

Country: North America > United States > Virginia (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine Learning Dec-2-2025

DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

Andrae, Martin, Larsson, Erik, Takao, So, Landelius, Tomas, Lindsten, Fredrik

Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations and heuristic tuning (e.g., inflation and localization) to scale to high dimensions. While often successful, these approximations can make the methods unstable or inaccurate when the underlying distributions of states and observations depart significantly from Gaussianity. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior to assimilate observations via guidance-based conditional sampling while incorporating forecast information through a novel inverse-sampling step. This step maps the forecast ensemble into a latent space to provide initial conditions for the conditional sampling, allowing us to encode model dynamics into the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle.
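For readers unfamiliar with the classical baseline mentioned above, the following is a minimal sketch of one ensemble Kalman filter analysis step in its textbook stochastic (perturbed-observation) form. It is reduced to a scalar state that is observed directly, and it is not the DAISI algorithm. The function name, ensemble size, and noise levels are hypothetical.

```python
import random
import statistics

def enkf_update(forecast, y_obs, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1)."""
    prior_var = statistics.variance(forecast)   # sample variance of the forecast ensemble
    gain = prior_var / (prior_var + obs_var)    # scalar Kalman gain (Gaussian assumption)
    analysis = []
    for x in forecast:
        # Each member assimilates its own noisy copy of the observation
        y_perturbed = y_obs + rng.gauss(0.0, obs_var ** 0.5)
        analysis.append(x + gain * (y_perturbed - x))
    return analysis

rng = random.Random(0)
forecast = [rng.gauss(0.0, 1.0) for _ in range(200)]   # forecast ensemble around 0
analysis = enkf_update(forecast, y_obs=2.0, obs_var=0.25, rng=rng)
```

The update is a linear blend of forecast and observation weighted by their variances. That linear blend is the Gaussian approximation the abstract describes as breaking down for strongly non-Gaussian states or observations.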

daisi, data assimilation, inverse sampling, (9 more...)

2512.00252

Country:

Europe > Sweden (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Energy (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a $50,000 Kaggle Competition

Lin, Jerry, Hu, Zeyuan, Beucler, Tom, Frields, Katherine, Christensen, Hannah, Hannah, Walter, Heuer, Helge, Ukkonnen, Peter, Mansfield, Laura A., Zheng, Tian, Peng, Liran, Gupta, Ritwik, Gentine, Pierre, Al-Naher, Yusef, Duan, Mingjiang, Hattori, Kyo, Ji, Weiliang, Li, Chunhan, Matsuda, Kippei, Murakami, Naoki, Ron, Shlomo, Serlin, Marec, Song, Hongjian, Tanabe, Yuma, Yamamoto, Daisuke, Zhou, Jianyao, Pritchard, Mike

Subgrid machine-learning (ML) parameterizations have the potential to introduce a new generation of climate models that incorporate the effects of higher-resolution physics without incurring the prohibitive computational cost associated with more explicit physics-based simulations. However, important issues, ranging from online instability to inconsistent online performance, have limited their operational use for long-term climate projections. To more rapidly drive progress in solving these issues, domain scientists and machine learning researchers opened up the offline aspect of this problem to the broader machine learning and data science community with the release of ClimSim, a NeurIPS Datasets and Benchmarks publication, and an associated Kaggle competition. This paper reports on the downstream results of the Kaggle competition by coupling emulators inspired by the winning teams' architectures to an interactive climate model (including full cloud microphysics, a regime historically prone to online instability) and systematically evaluating their online performance. Our results demonstrate that online stability in the low-resolution, real-geography setting is reproducible across multiple diverse architectures, which we consider a key milestone. All tested architectures exhibit strikingly similar offline and online biases, though their responses to architecture-agnostic design choices (e.g., expanding the list of input variables) can differ significantly. Multiple Kaggle-inspired architectures achieve state-of-the-art (SOTA) results on certain metrics such as zonal mean bias patterns and global RMSE, indicating that crowdsourcing the essence of the offline problem is one path to improving online performance in hybrid physics-AI climate simulation.

artificial intelligence, machine learning, social media, (18 more...)

2511.20963

Country:

Europe (1.00)
North America > United States > California (0.46)
North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.67)
Energy (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.70)

RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

Garcia-Cobo, Guillermo, Igl, Maximilian, Karkus, Peter, Zhang, Zhejun, Watson, Michael, Chen, Yuxiao, Ivanovic, Boris, Pavone, Marco

Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader application domains, including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method, and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41% and reduces collisions by 54%.

artificial intelligence, machine learning, rollout, (16 more...)

2512.01993

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Jeong, Eunjeong, Perin, Giovanni, Yang, Howard H., Pappas, Nikolaos

Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning

Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: the computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. However, most existing Energy-Harvesting FL (EHFL) strategies fail to account for this reality, resulting in wasted energy due to redundant local computations. Efficient and proactive resource management therefore requires algorithms that predict the contribution of each local update. We propose a lightweight client scheduling framework using the Version Age of Information (VAoI), a semantics-aware metric that quantifies update timeliness and significance. Crucially, we overcome VAoI's typically prohibitive computational cost, which stems from computing a statistical distance over the entire parameter space, by introducing a feature-based proxy. This proxy estimates model redundancy using intermediate-layer extraction from a single forward pass, dramatically reducing computational complexity. Experiments conducted under extreme non-IID data distributions and scarce energy availability demonstrate superior learning performance while achieving energy reduction compared to existing baseline selection policies. Our framework establishes semantics-aware scheduling as a practical and vital solution for EHFL in realistic scenarios where training costs dominate transmission costs.

artificial intelligence, federated learning, machine learning, (14 more...)

2512.01983

Country: Europe (0.68)

Genre: Research Report (0.50)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Propp, Adrienne M., Perego, Mauro, Cyr, Eric C., Gruber, Anthony, Howard, Amanda A., Heinlein, Alexander, Stinis, Panos, Tartakovsky, Daniel M.

Domain-Decomposed Graph Neural Network Surrogate Modeling for Ice Sheets

Accurate yet efficient surrogate models are essential for large-scale simulations of partial differential equations (PDEs), particularly for uncertainty quantification (UQ) tasks that demand hundreds or thousands of evaluations. We develop a physics-inspired graph neural network (GNN) surrogate that operates directly on unstructured meshes and leverages the flexibility of graph attention. To improve both training efficiency and generalization properties of the model, we introduce a domain decomposition (DD) strategy that partitions the mesh into subdomains, trains local GNN surrogates in parallel, and aggregates their predictions. We then employ transfer learning to fine-tune models across subdomains, accelerating training and improving accuracy in data-limited settings. Applied to ice sheet simulations, our approach accurately predicts full-field velocities on high-resolution meshes, substantially reduces training time relative to training a single global surrogate model, and provides a ripe foundation for UQ objectives. Our results demonstrate that graph-based DD, combined with transfer learning, provides a scalable and reliable pathway for training GNN surrogates on massive PDE-governed systems, with broad potential for application beyond ice sheet dynamics.

artificial intelligence, machine learning, prediction, (17 more...)

2512.01888

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.86)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)