AITopics

Inverse problems, which involve estimating parameters from incomplete or noisy observations, arise in fields such as medical imaging, geophysics, and signal processing. These problems are often ill-posed, requiring regularization techniques to stabilize the solution. In this work, we employ $\textit{Stochastic Interpolation}$ (SI), a generative framework that integrates both deterministic and stochastic processes to map a simple reference distribution, such as a Gaussian, to the target distribution. Our method, $\textbf{DAWN-SI}$ ($\textbf{D}$ata-$\textbf{AW}$are and $\textbf{N}$oise-informed $\textbf{S}$tochastic $\textbf{I}$nterpolation), incorporates data and noise embeddings, allowing the model to access representations of the measured data explicitly and to account for noise in the observations, which makes it particularly robust when data are noisy or incomplete. By learning a time-dependent velocity field, SI not only provides accurate solutions but also enables uncertainty quantification by generating multiple plausible outcomes. Unlike pre-trained diffusion models, which may struggle in highly ill-posed settings, our approach is trained specifically for each inverse problem and adapts to varying noise levels. We validate the effectiveness and robustness of our method through extensive numerical experiments on tasks such as image deblurring and tomography.
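As a rough illustration of what "learning a time-dependent velocity field" can look like, the sketch below builds regression targets for one common choice, a linear interpolant between a Gaussian reference sample and a target sample. The abstract does not state which interpolant is used, and the data and noise embeddings are omitted here; all function names are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Linear interpolant x_t = (1 - t) * x0 + t * x1 (an assumed choice)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Time derivative of the linear interpolant: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def make_training_example(x1, rng):
    """Draw a Gaussian reference sample and a random time, and return
    (x_t, t, v) so that a model v_theta(x_t, t) can be regressed onto v."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)

def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Under these assumptions, calling `euler_sample` several times with different Gaussian draws for `x0` would yield several candidate solutions, which is one way the multiple plausible outcomes mentioned in the abstract could be produced.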

artificial intelligence, inverse problem, machine learning, (18 more...)

2412.04766

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

An Efficient Model Maintenance Approach for MLOps

Majidi, Forough, Khomh, Foutse, Li, Heng, Nikanjam, Amin

In recent years, many industries have incorporated machine learning (ML) models into their systems. Ideally, ML models should be trained on and applied to data from the same distribution. However, in many application areas the data evolves over time, leading to data and concept drift, which in turn degrades the performance of ML models. Therefore, keeping ML models up to date plays a critical role in the MLOps pipeline. Existing ML model maintenance approaches are often computationally intensive, costly, time-consuming, and model-dependent. We therefore propose an improved MLOps pipeline, a new model maintenance approach, and a Similarity Based Model Reuse (SimReuse) tool to address the challenges of ML model maintenance. In a preliminary study, we identify seasonal and recurrent distribution patterns in time series datasets. Recurrent distribution patterns enable us to reuse previously trained models for similar distributions in the future, thus avoiding frequent retraining. We then integrate the model reuse approach into the MLOps pipeline to obtain our improved pipeline. Furthermore, we develop SimReuse, a tool that implements the new components of our MLOps pipeline, storing models and reusing them for inference on future data segments with similar distributions. Our evaluation on four time series datasets demonstrates that our model reuse approach can maintain model performance while significantly reducing maintenance time and costs: it achieves ML performance comparable to the best baseline while being 15 times more efficient in terms of computation time and costs. Industries and practitioners can therefore use our approach and tool to maintain the performance of their deployed ML models while reducing maintenance costs.
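The reuse idea can be sketched as a small model store keyed by reference data segments. The abstract does not say which similarity measure SimReuse uses, so the two-sample Kolmogorov-Smirnov statistic and the threshold below are assumptions, and `ModelStore` is a hypothetical name rather than the tool's API.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap

class ModelStore:
    """Keeps (reference segment, model) pairs and returns a stored model
    when a new segment looks similar enough to a stored reference."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed cut-off on the KS statistic
        self.entries = []

    def add(self, segment, model):
        self.entries.append((list(segment), model))

    def find_reusable(self, segment):
        """Return the closest stored model within the threshold, else None
        (None signals that retraining would be needed)."""
        best_model, best_distance = None, self.threshold
        for reference, model in self.entries:
            distance = ks_statistic(reference, segment)
            if distance <= best_distance:
                best_model, best_distance = model, distance
        return best_model
```

A caller would train and `add` a model only when `find_reusable` returns `None`, which is where the savings in retraining time would come from.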

artificial intelligence, dataset, machine learning, (13 more...)

2412.04657

Country:

Europe > Switzerland > Zürich > Zürich (0.07)
North America > Canada > Quebec (0.04)
Oceania > Australia > New South Wales (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Services (0.46)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Monroy, Brayan, Bacca, Jorge, Tachella, Julián

Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise

arXiv.org Machine Learning, Dec-5-2024

Recorrupted-to-Recorrupted (R2R) has emerged as a methodology for training deep networks for image restoration in a self-supervised manner from noisy measurement data alone, demonstrating equivalence in expectation to the supervised squared loss in the case of Gaussian noise. However, its effectiveness with non-Gaussian noise remains unexplored. In this paper, we propose Generalized R2R (GR2R), which extends the R2R framework to a broader class of noise distributions: additive noise such as log-Rayleigh, and the natural exponential family, including Poisson and Gamma noise, which play a key role in many applications such as low-photon imaging and synthetic aperture radar. We show that the GR2R loss is an unbiased estimator of the supervised loss and that the popular Stein's unbiased risk estimator can be seen as a special case. A series of experiments with Gaussian, Poisson, and Gamma noise validates GR2R's performance, showing its effectiveness compared to other self-supervised methods.
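For the Gaussian case, the recorruption step can be sketched as follows. This is one commonly used form of the construction, written under the assumption that the noise standard deviation `sigma` is known; the function name and the `alpha` parameter are illustrative, and the non-Gaussian generalizations proposed in the paper are not shown.

```python
import random

def recorrupt_gaussian(y, sigma, alpha, rng):
    """Split a noisy measurement y = x + n, n ~ N(0, sigma^2), into a pair
    y1 = y + alpha * z and y2 = y - z / alpha with fresh z ~ N(0, sigma^2).

    The noise in y1 is n + alpha * z and the noise in y2 is n - z / alpha;
    their covariance is sigma^2 - sigma^2 = 0, so (being jointly Gaussian)
    they are independent. y1 can then serve as network input and y2 as target.
    """
    z = [rng.gauss(0.0, sigma) for _ in y]
    y1 = [yi + alpha * zi for yi, zi in zip(y, z)]
    y2 = [yi - zi / alpha for yi, zi in zip(y, z)]
    return y1, y2
```

Because the two noise terms are independent, a squared loss between a network's output on `y1` and the target `y2` differs in expectation from the supervised loss only by a constant, which is the equivalence the abstract refers to for Gaussian noise.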

artificial intelligence, machine learning, noise distribution, (15 more...)

2412.04648

Country:

South America > Colombia > Santander Department (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Lambert, Nathan, Morrison, Jacob, Pyatkin, Valentina, Huang, Shengyi, Ivison, Hamish, Brahman, Faeze, Miranda, Lester James V., Liu, Alisa, Dziri, Nouha, Lyu, Shane, Gu, Yuling, Malik, Saumya, Graf, Victoria, Hwang, Jena D., Yang, Jiangjiang, Bras, Ronan Le, Tafjord, Oyvind, Wilhelm, Chris, Soldaini, Luca, Smith, Noah A., Wang, Yizhong, Dasigi, Pradeep, Hajishirzi, Hannaneh

Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones. The underlying training data and recipes for post-training are simultaneously the most important pieces of the puzzle and the portion with the least transparency. To bridge this gap, we introduce Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques. Tulu 3, which builds on Llama 3.1 base models, achieves results surpassing the instruct versions of Llama 3.1, Qwen 2.5, Mistral, and even closed models such as GPT-4o-mini and Claude 3.5-Haiku. The training algorithms for our models include supervised finetuning (SFT), Direct Preference Optimization (DPO), and a novel method we call Reinforcement Learning with Verifiable Rewards (RLVR). With Tulu 3, we introduce a multi-task evaluation scheme for post-training recipes with development and unseen evaluations, standard benchmark implementations, and substantial decontamination of existing open datasets on said benchmarks. We conclude with analysis and discussion of training methods that did not reliably improve performance. In addition to the Tulu 3 model weights and demo, we release the complete recipe -- including datasets for diverse core skills, a robust toolkit for data curation and evaluation, the training code and infrastructure, and, most importantly, a detailed report for reproducing and further adapting the Tulu 3 approach to more domains.
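The "verifiable rewards" idea can be illustrated with a reward function that checks a completion against a known answer instead of querying a learned reward model. The sketch below is a toy under stated assumptions: the `Answer:` marker, exact string matching, and the reward value are all hypothetical choices, not details taken from the abstract.

```python
def verifiable_reward(completion, reference_answer, reward_value=1.0):
    """Return reward_value when the text after the last 'Answer:' marker
    matches the reference exactly (after stripping whitespace), else 0.0."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0
    predicted = completion.rsplit(marker, 1)[1].strip()
    if predicted == reference_answer.strip():
        return reward_value
    return 0.0
```

A reinforcement learning loop would then use this scalar in place of a reward-model score for prompts whose answers can be checked programmatically.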

large language model, machine learning, natural language, (20 more...)

2411.15124

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Middle East > Jordan (0.04)
(8 more...)

Genre:

Research Report > Promising Solution (0.47)
Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (0.67)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pinheiro, Tiago F. L. L., Sfair, Rafael, Ramon, Giovana

Machine learning approach for mapping the stable orbits around planets

Numerical N-body simulations are commonly used to explore stability regions around exoplanets, offering insights into the possible existence of satellites and ring systems. This study aims to utilize Machine Learning (ML) techniques to generate predictive maps of stable regions surrounding a hypothetical planet. The approach can also be extended to planet-satellite systems, planetary ring systems, and other similar configurations. A dataset was generated using $10^5$ numerical simulations, each incorporating nine orbital features for the planet and a test particle in a star-planet-test particle system. The simulations were classified as stable or unstable based on stability criteria, requiring particles to remain stable over a timespan equivalent to 10,000 orbital periods of the planet. Various ML algorithms were tested and fine-tuned through hyperparameter optimization to determine the most effective predictive model. Tree-based algorithms achieved comparable accuracy. The best-performing model, using the Extreme Gradient Boosting (XGBoost) algorithm, achieved an accuracy of 98.48%, with 94% recall and precision for stable particles and 99% for unstable particles. ML algorithms significantly reduce the computational time required for three-body simulations, operating approximately 100,000 times faster than traditional numerical methods. Predictive models can generate entire stability maps in less than a second, compared to the days required by numerical simulations. The results from the trained ML models will be made accessible through a public web interface, enabling broader scientific applications.
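The abstract reports precision and recall separately for the stable and unstable classes, which are different quantities from overall accuracy. A minimal sketch of how such per-class figures are computed from predicted labels is given below; the function name is hypothetical and it is not tied to the authors' pipeline.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall when the class `positive` is treated as the
    positive class; returns (precision, recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Calling it once with `positive="stable"` and once with `positive="unstable"` gives the two pairs of figures, so a classifier can score differently on each class even when its overall accuracy is high.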

algorithm, artificial intelligence, machine learning, (19 more...)

2412.04568

Country:

South America > Brazil > São Paulo (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)

arXiv.org Machine Learning, Dec-5-2024

Graph Classification Gaussian Processes via Hodgelet Spectral Features

Alain, Mathieu, Takao, So, Dong, Xiaowen, Rieck, Bastian, Noutahi, Emmanuel

The problem of classifying graphs is ubiquitous in machine learning. While it is standard to apply graph neural networks or graph kernel methods, Gaussian processes can be employed by transforming spatial features from the graph domain into spectral features in the Euclidean domain, and using them as the input points of classical kernels. However, this approach currently only takes into account features on vertices, whereas some graph datasets also support features on edges. In this work, we present a Gaussian process-based classification algorithm that can leverage vertex features, edge features, or both. Furthermore, we take advantage of the Hodge decomposition to better capture the intricate richness of vertex and edge features, which can be beneficial on diverse tasks.
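Spectral features on vertices and edges are typically derived from Laplacian operators built from the graph's incidence matrix. The sketch below constructs the vertex Laplacian and the lower part of the edge (Hodge) Laplacian for an oriented graph; it stops short of the eigendecomposition and of the paper's actual feature pipeline, and it ignores the triangle term that the full edge Laplacian would include. Function names are illustrative.

```python
def incidence_matrix(n_vertices, edges):
    """B1[v][e] is -1 at the tail and +1 at the head of oriented edge e."""
    b1 = [[0] * len(edges) for _ in range(n_vertices)]
    for e, (tail, head) in enumerate(edges):
        b1[tail][e] = -1
        b1[head][e] = 1
    return b1

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def laplacians(n_vertices, edges):
    """Return (L0, L1_down): L0 = B1 B1^T acts on vertex signals and
    L1_down = B1^T B1 acts on edge signals."""
    b1 = incidence_matrix(n_vertices, edges)
    return matmul(b1, transpose(b1)), matmul(transpose(b1), b1)
```

The eigenvectors of these matrices would supply the bases in which vertex and edge signals are expanded to obtain spectral coefficients.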

artificial intelligence, gaussian process, machine learning, (18 more...)

2410.10546

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Switzerland > Fribourg > Fribourg (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Gokce, Abdulkadir, Schrimpf, Martin

Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream

When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition (COR) behaviors and neural response patterns in the primate visual ventral stream (VVS). While recent machine learning advances suggest that scaling model size, dataset size, and compute resources improve task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate VVS by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and COR behaviors. We observe that while behavioral alignment continues to scale with larger models, neural alignment saturates. This observation remains true across model architectures and training datasets, even though models with stronger inductive bias and datasets with higher-quality images are more compute-efficient. Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit only poor alignment. Finally, we develop a scaling recipe, indicating that a greater proportion of compute should be allocated to data samples over model size. Our results suggest that while scaling alone might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream with current architectures and datasets, highlighting the need for novel strategies in building brain-like models.
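Scaling laws are usually summarized by fitting a simple parametric curve to (scale, score) pairs. The abstract does not give the functional form used, so the sketch below assumes a pure power law y = a * x^b fitted by least squares in log-log space; a pure power law cannot express the saturation reported for neural alignment, which would need an extra offset or ceiling term. The function name is hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).
    All xs and ys must be positive. Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b
```

With compute budgets as `xs` and a misalignment measure as `ys`, the fitted exponent `b` indicates how quickly the measure improves with scale under this assumed form.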

alignment, alignment score, dataset, (14 more...)

2411.05712

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Utah (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Approximate Top-$k$ for Increased Parallelism

Key, Oscar, Ribar, Luka, Cattaneo, Alberto, Hudlass-Galley, Luke, Orr, Douglas

We present an evaluation of bucketed approximate top-$k$ algorithms. Computing top-$k$ exactly suffers from limited parallelism, because the $k$ largest values must be aggregated along the vector, and is therefore not well suited to computation on highly parallel machine learning accelerators. By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the parallelism available by independently computing many smaller top-$k$ operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-$k$ to select the most important parameters or activations. We also release a fast bucketed top-$k$ implementation for PyTorch.
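The bucketing idea can be shown in a few lines of plain Python. This is a sequential sketch, not the released PyTorch implementation: it assumes contiguous buckets of equal size and an equal share of $k$ per bucket, both of which are design choices the paper explores, and the function name is hypothetical.

```python
import heapq

def bucketed_topk(values, k, n_buckets):
    """Approximate top-k: split values into n_buckets contiguous buckets and
    keep the k // n_buckets largest of each. Each bucket is independent of
    the others, so on parallel hardware the buckets could be processed
    concurrently."""
    if len(values) % n_buckets != 0 or k % n_buckets != 0:
        raise ValueError("len(values) and k must be divisible by n_buckets")
    bucket_size = len(values) // n_buckets
    per_bucket = k // n_buckets
    selected = []
    for start in range(0, len(values), bucket_size):
        bucket = values[start:start + bucket_size]
        selected.extend(heapq.nlargest(per_bucket, bucket))
    return sorted(selected, reverse=True)
```

For `values = [9, 8, 7, 1, 2, 3, 4, 5]` and `k = 2`, a single bucket returns the exact answer `[9, 8]`, while two buckets return `[9, 5]`: the second-largest value is missed because it shares a bucket with the largest, which is the accuracy cost traded for parallelism.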

algorithm, cost model, implementation, (17 more...)

2412.04358

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Texas > Harris County > Houston (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Theile, Mirco, Dirnberger, Lukas, Trumpp, Raphael, Caccamo, Marco, Sangiovanni-Vincentelli, Alberto L.

Action Mapping for Reinforcement Learning in Continuous Environments with Constraints

Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature has explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.
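The interface that action mapping provides can be illustrated with a toy one-dimensional example in which the policy outputs an unconstrained latent value and a separate map turns it into a feasible action. In the paper the map is learned from a feasibility model; here it is written in closed form purely for illustration, and the environment (a position on a track of length 10 with a step limit of 1) and all names are hypothetical.

```python
TRACK_LENGTH = 10.0
MAX_STEP = 1.0

def feasible_interval(position):
    """Toy feasibility model: steps must satisfy |a| <= MAX_STEP and keep
    the agent inside [0, TRACK_LENGTH]."""
    low = max(-MAX_STEP, -position)
    high = min(MAX_STEP, TRACK_LENGTH - position)
    return low, high

def action_map(position, latent):
    """Map a latent policy output in [-1, 1] onto the feasible interval,
    so every latent value corresponds to a feasible action."""
    latent = max(-1.0, min(1.0, latent))
    low, high = feasible_interval(position)
    return low + (latent + 1.0) / 2.0 * (high - low)

def is_feasible(position, action):
    return abs(action) <= MAX_STEP and 0.0 <= position + action <= TRACK_LENGTH
```

The policy then only has to rank latent values, since infeasible steps are unreachable by construction.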

agent, feasibility model, feasible action, (14 more...)

2412.04327

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

Dang, John, Singh, Shivalika, D'souza, Daniel, Ahmadian, Arash, Salamanca, Alejandro, Smith, Madeline, Peppin, Aidan, Hong, Sungjin, Govindassamy, Manoj, Zhao, Terrence, Kublik, Sandra, Amer, Meor, Aryabumi, Viraat, Campos, Jon Ander, Tan, Yi-Chern, Kocmi, Tom, Strub, Florian, Grinsztajn, Nathan, Flet-Berliac, Yannis, Locatelli, Acyr, Lin, Hangyu, Talupuru, Dwarak, Venkitesh, Bharat, Cairuz, David, Yang, Bowen, Chung, Tim, Ko, Wei-Yin, Shi, Sylvie Shang, Shukayev, Amir, Bae, Sammie, Piktus, Aleksandra, Castagné, Roman, Cruz-Salinas, Felipe, Kim, Eddie, Crawhall-Stein, Lucas, Morisot, Adrien, Roy, Sudip, Blunsom, Phil, Zhang, Ivan, Gomez, Aidan, Frosst, Nick, Fadaee, Marzieh, Ermis, Beyza, Üstün, Ahmet, Hooker, Sara

We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state-of-the-art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving up to a 76.6% win-rate. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with twice as many parameters, achieving a 54.0% win-rate. In this short technical report, we present extended evaluation results for the Aya Expanse model family and release their open-weights, together with a new multilingual evaluation dataset m-ArenaHard.
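Model merging is mentioned above without detail. One simple and widely used form is a weighted average of the parameters of several checkpoints that share an architecture, sketched below; whether Aya Expanse uses this or a different merging scheme is not stated in the abstract, so the function and its flat-list parameter format are assumptions for illustration.

```python
def merge_weights(state_dicts, coefficients=None):
    """Weighted average of parameter dictionaries with identical keys and
    lengths. Each value is a flat list of floats. Uniform weights are used
    when coefficients is None."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    if coefficients is None:
        coefficients = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(coefficients) != len(state_dicts):
        raise ValueError("one coefficient per checkpoint is required")
    merged = {}
    for name, first in state_dicts[0].items():
        merged[name] = [
            sum(c * sd[name][i] for c, sd in zip(coefficients, state_dicts))
            for i in range(len(first))
        ]
    return merged
```

In practice the checkpoints would be tensors from models fine-tuned on different data mixtures, and the coefficients would be tuned on held-out evaluations.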

arxiv, aya expanse model, language model, (12 more...)

2412.04261

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)