AITopics | South America

Collaborating Authors

South America

Applying the maximum entropy principle to neural networks enhances multi-species distribution models

Ryckewaert, Maxime, Marcos, Diego, Botella, Christophe, Servajean, Maximilien, Bonnet, Pierre, Joly, Alexis

arXiv.org Machine LearningJan-15-2025

The rapid expansion of citizen science initiatives has led to a significant growth of biodiversity databases, and particularly presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in a Species Distribution Model (SDM) is curtailed by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the entropy of a probability distribution across sites as a function of predefined transformations of variables, called features. In contrast, neural networks and deep learning have emerged as a promising technique for automatic feature extraction from complex input variables. Arbitrarily complex transformations of input variables can be learned from the data efficiently through backpropagation and stochastic gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses neural networks to automatically learn shared features among species, using the maximum entropy principle. To do so, it employs a normalised Poisson loss where for each species, presence probabilities across sites are modelled by a neural network. We evaluate DeepMaxent on a benchmark dataset known for its spatial sampling biases, using PO data for calibration and presence-absence (PA) data for validation across six regions with different biological groups and covariates. Our results indicate that DeepMaxent performs better than Maxent and other leading SDMs across all regions and taxonomic groups. The method performs particularly well in regions of uneven sampling, demonstrating substantial potential to increase SDM performances. In particular, our approach yields more accurate predictions than traditional single-species models, which opens up new possibilities for methodological enhancement.

batch size, correction, deepmaxent, (14 more...)

arXiv.org Machine Learning

2412.19217

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
Oceania > Australia > New South Wales (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management

Krishnamoorthy, Mahesh Vaijainthymala, Palavesam, Kuppusamy Vellamadam, Arcot, Siva Venkatesh, Kuppuswami, Rajarajeswari Chinniah

arXiv.org Artificial IntelligenceJan-14-2025

The exponential growth in the size and complexity of Large Language Models (LLMs) has introduced unprecedented challenges in their deployment and operational management. Traditional MLOps approaches often fail to efficiently handle the scale, resource requirements, and dynamic nature of these models. This research presents a novel framework that leverages Deep Neural Networks (DNNs) to optimize MLOps pipelines specifically for LLMs. Our approach introduces an intelligent system that automates deployment decisions, resource allocation, and pipeline optimization while maintaining optimal performance and cost efficiency. Through extensive experimentation across multiple cloud environments and deployment scenarios, we demonstrate significant improvements: 40% enhancement in resource utilization, 35% reduction in deployment latency, and 30% decrease in operational costs compared to traditional MLOps approaches. The framework's ability to adapt to varying workloads and automatically optimize deployment strategies represents a significant advancement in automated MLOps management for large-scale language models. Our framework introduces several novel components including a multi-stream neural architecture for processing heterogeneous operational metrics, an adaptive resource allocation system that continuously learns from deployment patterns, and a sophisticated deployment orchestration mechanism that automatically selects optimal strategies based on model characteristics and environmental conditions. The system demonstrates robust performance across various deployment scenarios, including multi-cloud environments, high-throughput production systems, and cost-sensitive deployments. Through rigorous evaluation using production workloads from multiple organizations, we validate our approach's effectiveness in reducing operational complexity while improving system reliability and cost efficiency.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.14802

Country:

South America (0.04)
Oceania > Australia (0.04)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automating Credit Card Limit Adjustments Using Machine Learning

Pestana, Diego

arXiv.org Artificial IntelligenceJan-14-2025

Venezuelan banks have historically made credit card limit adjustment decisions manually through committees. However, since the number of credit card holders in Venezuela is expected to increase in the upcoming months due to economic improvements, manual decisions are starting to become unfeasible. In this project, a machine learning model that uses cost-sensitive learning is proposed to automate the task of handing out credit card limit increases. To accomplish this, several neural network and XGBoost models are trained and compared, leveraging Venezolano de Credito's data and using grid search with 10-fold cross-validation. The proposed model is ultimately chosen due to its superior balance of accuracy, cost-effectiveness, and interpretability. The model's performance is evaluated against the committee's decisions using Cohen's kappa coefficient, showing an almost perfect agreement.

adjustment, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2501.10451

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > Venezuela > Capital District > Caracas (0.04)
(4 more...)

Genre: Research Report (0.65)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)

Add feedback

Weight Averaging for Out-of-Distribution Generalization and Few-Shot Domain Adaptation

Xu, Shijian

arXiv.org Artificial IntelligenceJan-14-2025

Empirical risk minimization (ERM) is not robust to changes in the distribution of data. When the distribution of test data is different from that of training data, the problem is known as out-of-distribution generalization. Recently, two techniques have been developed for addressing out-of-distribution generalization in computer vision: weight averaging (WA) and sharpness-aware minimization (SAM). WA involves training multiple models with different hyperparameters and then averaging the weights of these models, which can significantly improve out-of-distribution generalization performance. SAM optimizes a neural network to find minima in flat regions, which have been proven to perform well under distribution shifts. While these techniques have made great progress, there is still room for improvement and further exploration. In this thesis, we propose increasing the model diversity in WA explicitly by introducing gradient similarity as a loss regularizer to further improve out-of-distribution generalization performance. We also propose combining WA and SAM to solve the problem of few-shot domain adaptation. Our extensive experiments on digits datasets (MNIST, SVHN, USPS, MNIST-M) and other domain adaptation datasets (VLCS, PACS) show that combining WA and SAM leads to improved out-of-distribution generalization performance and significantly increases few-shot domain adaptation accuracy.

adaptation, domain adaptation, generalization, (14 more...)

arXiv.org Artificial Intelligence

2501.08361

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

UFGraphFR: An attempt at a federated recommendation system based on user text characteristics

Wang, Xudong

arXiv.org Artificial IntelligenceJan-14-2025

Federated learning has become an important research area in 'private computing' due to the 'useable invisibility' of data during training. Inspired by Federated learning, the federated recommendation system has gradually become a new recommendation service architecture that can protect users' privacy. The use of user diagrams to enhance federated recommendations is a promising topic. How to use user diagrams to enhance federated recommendations is a promising research topic. However, it's a great challenge to construct a user diagram without compromising privacy in a federated learning scenario. Inspired by the simple idea that similar users often have the same attribute characteristics, we propose a personalized federated recommendation algorithm based on the user relationship graph constructed by the user text characteristics(Graph Federation Recommendation System based on User Text description Features, UFGraphFR). The method uses the embedding layer weight of the user's text feature description to construct the user relationship graph. It introduces the Transformer mechanism to capture the sequence modeling of the user's historical interaction sequence. Without access to user history interactions and specific user attributes, the federal learning privacy protection of data 'useable invisibility' is embodied. Preliminary experiments on some benchmark datasets demonstrate the superior performance of UFGraphFR. Our experiments show that this model can protect user privacy to some extent without affecting the performance of the recommendation system. The code will be easily available on https://github.com/trueWangSyutung/UFGraphFR.

recommendation, recommendation system, ufgraphfr, (16 more...)

arXiv.org Artificial Intelligence

2501.08044

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

Tang, Chen, Lv, Bo, Zheng, Zifan, Yang, Bohao, Zhao, Kun, Liao, Ning, Wang, Xiaoxing, Xiong, Feiyu, Li, Zhiyu, Liu, Nayu, Jiang, Jingchi

arXiv.org Artificial IntelligenceJan-14-2025

Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving a question open about whether interconnecting these models could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation techniques (LoRA) and conduct extensive experiments on various benchmark datasets. The experimental results reveal that GRAPHMOE outperforms other LoRA based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advancements in enhancing the reasoning capabilities of language models.

architecture, arxiv preprint arxiv, preprint arxiv, (14 more...)

arXiv.org Artificial Intelligence

2501.0789

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)

Add feedback

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Lozano, Alejandro, Sun, Min Woo, Burgess, James, Chen, Liangyu, Nirschl, Jeffrey J, Gu, Jeffrey, Lopez, Ivan, Aklilu, Josiah, Katzer, Austin Wolfgang, Chiu, Collin, Rau, Anita, Wang, Xiaohan, Zhang, Yuhui, Song, Alfred Seunghoon, Tibshirani, Robert, Yeung-Levy, Serena

arXiv.org Artificial IntelligenceJan-14-2025

The development of vision-language models (VLMs) is driven by large-scale and diverse multimodal datasets. However, progress toward generalist biomedical VLMs is limited by the lack of annotated, publicly accessible datasets across biology and medicine. Existing efforts are restricted to narrow domains, missing the full diversity of biomedical knowledge encoded in scientific literature. To address this gap, we introduce BIOMEDICA, a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset. Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles. Metadata and expert-guided annotations are also provided. We demonstrate the utility and accessibility of our resource by releasing BMCA-CLIP, a suite of CLIP-style models continuously pre-trained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally. On average, our models achieve state-of-the-art performance across 40 tasks - spanning pathology, radiology, ophthalmology, dermatology, surgery, molecular biology, parasitology, and cell biology - excelling in zero-shot classification with a 6.56% average improvement (as high as 29.8% and 17.5% in dermatology and ophthalmology, respectively), and stronger image-text retrieval, all while using 10x less compute. To foster reproducibility and collaboration, we release our codebase and dataset for the broader research community.

caption, dataset, taxonomy, (13 more...)

arXiv.org Artificial Intelligence

2501.07171

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.92)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Pareto Set Learning for Multi-Objective Reinforcement Learning

Liu, Erlong, Wu, Yu-Chang, Huang, Xiaobin, Gao, Chengrui, Wang, Ren-Jian, Xue, Ke, Qian, Chao

arXiv.org Artificial IntelligenceJan-14-2025

Multi-objective decision-making problems have emerged in numerous real-world scenarios, such as video games, navigation and robotics. Considering the clear advantages of Reinforcement Learning (RL) in optimizing decision-making processes, researchers have delved into the development of Multi-Objective RL (MORL) methods for solving multi-objective decision problems. However, previous methods either cannot obtain the entire Pareto front, or employ only a single policy network for all the preferences over multiple objectives, which may not produce personalized solutions for each preference. To address these limitations, we propose a novel decomposition-based framework for MORL, Pareto Set Learning for MORL (PSL-MORL), that harnesses the generation capability of hypernetwork to produce the parameters of the policy network for each decomposition weight, generating relatively distinct policies for various scalarized subproblems with high efficiency. PSL-MORL is a general framework, which is compatible for any RL algorithm. The theoretical result guarantees the superiority of the model capacity of PSL-MORL and the optimality of the obtained policy network. Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators.

algorithm, policy network, psl-morl, (14 more...)

arXiv.org Artificial Intelligence

2501.06773

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Hu, Xinshuo, Shan, Zifei, Zhao, Xinping, Sun, Zetian, Liu, Zhenyu, Li, Dongfang, Ye, Shaolin, Wei, Xinyuan, Chen, Qian, Hu, Baotian, Wang, Haofen, Yu, Jun, Zhang, Min

arXiv.org Artificial IntelligenceJan-14-2025

As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial. Despite the growing number of general embedding models, prior work often overlooks the critical role of training data quality. In this work, we introduce KaLM-Embedding, a general multilingual embedding model that leverages a large quantity of cleaner, more diverse, and domain-specific training data. Our model has been trained with key techniques proven to enhance performance: (1) persona-based synthetic data to create diversified examples distilled from LLMs, (2) ranking consistency filtering to remove less informative samples, and (3) semi-homogeneous task batch sampling to improve training efficacy. Departing from traditional BERT-like architectures, we adopt Qwen2-0.5B as the pre-trained model, facilitating the adaptation of auto-regressive language models for general embedding tasks. Extensive evaluations of the MTEB benchmark across multiple languages show that our model outperforms others of comparable size, setting a new standard for multilingual embedding models with less than 1B parameters.

computational linguistic, proceedings, query, (13 more...)

arXiv.org Artificial Intelligence

2501.01028

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.04)
(36 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Chernyshev, Konstantin, Polshkov, Vitaliy, Artemova, Ekaterina, Myasnikov, Alex, Stepanov, Vlad, Miasnikov, Alexei, Tilga, Sergei

arXiv.org Artificial IntelligenceJan-14-2025

The current evaluation of mathematical skills in LLMs is limited, as existing benchmarks are either relatively small, primarily focus on elementary and highschool problems, or lack diversity in topics. Additionally, the inclusion of visual elements in tasks remains largely under-explored. To address these gaps, we introduce U-MATH, a novel benchmark of 1,100 unpublished open-ended university-level problems sourced from teaching materials. It is balanced across six core subjects, with 20% of multimodal problems. Given the open-ended nature of U-MATH problems, we employ an LLM to judge the correctness of generated solutions. To this end, we release µ-MATH, a dataset to evaluate the LLMs' capabilities in judging solutions. The evaluation of general domain, math-specific, and multimodal LLMs highlights the challenges presented by U-MATH. Our findings reveal that LLMs achieve a maximum accuracy of only 63% on text-based tasks, with even lower 45% on visual problems. The solution assessment proves challenging for LLMs, with the best LLM judge having an F1-score of 80% on µ-MATH. Mathematical reasoning is a fundamental domain for assessing the true capabilities of Large Language Models (LLMs) to reason (Ahn et al., 2024). While existing benchmarks like GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021) provide valuable insights, they primarily focus on schoollevel mathematics. This leaves a significant gap in understanding how LLMs perform on more advanced, university-level problems.

benchmark, evaluation, qwen2, (15 more...)

arXiv.org Artificial Intelligence

2412.03205

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback