AITopics

2501.01059

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

arXiv.org Artificial IntelligenceJan-2-2025

Tensor-Based Foundations of Ordinary Least Squares and Neural Network Regression Models

Algarte, Roberto Dias

This article introduces a novel approach to the mathematical development of Ordinary Least Squares and Neural Network regression models, diverging from traditional methods in current Machine Learning literature. By leveraging Tensor Analysis and fundamental matrix computations, the theoretical foundations of both models are meticulously detailed and extended to their complete algorithmic forms. The study culminates in the presentation of three algorithms, including a streamlined version of the Backpropagation Algorithm for Neural Networks, illustrating the benefits of this new mathematical approach. The following sections of this article require some important mathematical concepts and notations that need to be addressed here in this section. However, we assume the reader to be proficient on the basic and intermediate topics of Linear Algebra and Calculus, which will not be covered.

equality, scalar, tensor, (15 more...)

2411.12873

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Goel, Samarth, Lee, Reagan J., Ramchandran, Kannan

Quantifying Positional Biases in Text Embedding Models

arXiv.org Artificial IntelligenceJan-1-2025

Embedding models are crucial for tasks in Information Retrieval (IR) and semantic similarity measurement, yet their handling of longer texts and associated positional biases remains underexplored. In this study, we investigate the impact of content position and input size on text embeddings. Our experiments reveal that embedding models, irrespective of their positional encoding mechanisms, disproportionately prioritize the beginning of an input. Ablation studies demonstrate that insertion of irrelevant text or removal at the start of a document reduces cosine similarity between altered and original embeddings by up to 12.3% more than ablations at the end. Regression analysis further confirms this bias, with sentence importance declining as position moves further from the start, even with with content-agnosticity. We hypothesize that this effect arises from pre-processing strategies and chosen positional encoding techniques. These findings quantify the sensitivity of retrieval systems and suggest a new lens towards embedding model robustness.

2412.15241

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > Nepal (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Marini, Niccolò, Pampaloni, Leonardo, Di Martino, Filippo, Verdecchia, Roberto, Vicario, Enrico

Green AI: Which Programming Language Consumes the Most?

arXiv.org Artificial IntelligenceDec-31-2024

AI is demanding an evergrowing portion of environmental resources. Despite their potential impact on AI environmental sustainability, the role that programming languages play in AI (in)efficiency is to date still unknown. With this study, we aim to understand the impact that programming languages can have on AI environmental sustainability. To achieve our goal, we conduct a controlled empirical experiment by considering five programming languages (C++, Java, Python, MATLAB, and R), seven AI algorithms (KNN, SVC, AdaBoost, decision tree, logistic regression, naive bayses, and random forest), three popular datasets, and the training and inference phases. The collected results show that programming languages have a considerable impact on AI environmental sustainability. Compiled and semi-compiled languages (C++, Java) consistently consume less than interpreted languages (Python, MATLAB, R), which require up to 54x more energy. Some languages are cumulatively more efficient in training, while others in inference. Which programming language consumes the most highly depends on the algorithm considered. Ultimately, algorithm implementation might be the most determining factor in Green AI, regardless of the language used. As conclusion, while making AI more environmentally sustainable is paramount, a trade-off between energy efficiency and implementation ease should always be considered. Green AI can be achieved without the need of completely disrupting the development practices and technologies currently in place.

artificial intelligence, machine learning, programming language, (17 more...)

2501.14776

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Energy (0.53)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness

Kato, Masahiro

This study proposes a debiasing method for smooth nonparametric estimators. While machine learning techniques such as random forests and neural networks have demonstrated strong predictive performance, their theoretical properties remain relatively underexplored. Specifically, many modern algorithms lack assurances of pointwise asymptotic normality and uniform convergence, which are critical for statistical inference and robustness under covariate shift and have been well-established for classical methods like Nadaraya-Watson regression. To address this, we introduce a model-free debiasing method that guarantees these properties for smooth estimators derived from any nonparametric regression approach. By adding a correction term that estimates the conditional expected residual of the original estimator, or equivalently, its estimation error, we obtain a debiased estimator with proven pointwise asymptotic normality, and uniform convergence. These properties enable statistical inference and enhance robustness to covariate shift, making the method broadly applicable to a wide range of nonparametric regression problems.

convergence, estimator, statistics, (15 more...)

2412.20173

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Sahtout, Mohammad Omar, Wang, Haiyan, Ghimire, Santosh

Different thresholding methods on Nearest Shrunken Centroid algorithm

This article considers the impact of different thresholding methods to the Nearest Shrunken Centroid algorithm, which is popularly referred as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy but in the price of retaining too many features. When applied to microarray human cancers, PAM selected 2611 features on average from 10 multi-class datasets. Such a large number of features make it difficult to perform follow up study. One reason behind this problem is the soft thresholding, which is known to produce biased parameter estimate in regression analysis. In this article, we extend the PAM algorithm with two other thresholding methods, hard and order thresholding, and a deep search algorithm to achieve better thresholding parameter estimate. The modified algorithms are extensively tested and compared to the original one based on real data and Monte Carlo studies. In general, the modification not only gave better cancer status prediction accuracy, but also resulted in more parsimonious models with significantly smaller number of features.

algorithm, informative gene, test error, (15 more...)

doi: 10.1080/03610918.2022.2047201

2501.00632

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Kim, Taejoon, Wang, Haiyan

Matrix factorization and prediction for high dimensional co-occurrence count data via shared parameter alternating zero inflated Gamma model

algorithm, kim & wang matrix factorization, wang matrix factorization and prediction, (9 more...)

High-dimensional sparse matrix data frequently arise in various applications. A notable example is the weighted word-word co-occurrence count data, which summarizes the weighted frequency of word pairs appearing within the same context window. This type of data typically contains highly skewed non-negative values with an abundance of zeros. Another example is the co-occurrence of item-item or user-item pairs in e-commerce, which also generates high-dimensional data. The objective is to utilize this data to predict the relevance between items or users. In this paper, we assume that items or users can be represented by unknown dense vectors. The model treats the co-occurrence counts as arising from zero-inflated Gamma random variables and employs cosine similarity between the unknown vectors to summarize item-item relevance. The unknown values are estimated using the shared parameter alternating zero-inflated Gamma regression models (SA-ZIG). Both canonical link and log link models are considered. Two parameter updating schemes are proposed, along with an algorithm to estimate the unknown parameters. Convergence analysis is presented analytically. Numerical studies demonstrate that the SA-ZIG using Fisher scoring without learning rate adjustment may fail to fi nd the maximum likelihood estimate. However, the SA-ZIG with learning rate adjustment performs satisfactorily in our simulation studies.

2501.00628

Country:

South America > Brazil (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Kansas (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Information Technology > Services > e-Commerce Services (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Kim, Taejoon, Wang, Haiyan

Global dense vector representations for words or items using shared parameter alternating Tweedie model

algorithm, kim & wang matrix factorization, wang matrix factorization and prediction, (10 more...)

In this article, we present a model for analyzing the cooccurrence count data derived from practical fields such as user-item or item-item data from online shopping platform, cooccurring word-word pairs in sequences of texts. Such data contain important information for developing recommender systems or studying relevance of items or words from non-numerical sources. Different from traditional regression models, there are no observations for covariates. Additionally, the cooccurrence matrix is typically of so high dimension that it does not fit into a computer's memory for modeling. We extract numerical data by defining windows of cooccurrence using weighted count on the continuous scale. Positive probability mass is allowed for zero observations. We present Shared parameter Alternating Tweedie (SA-Tweedie) model and an algorithm to estimate the parameters. We introduce a learning rate adjustment used along with the Fisher scoring method in the inner loop to help the algorithm stay on track of optimizing direction. Gradient descent with Adam update was also considered as an alternative method for the estimation. Simulation studies and an application showed that our algorithm with Fisher scoring and learning rate adjustment outperforms the other two methods. Pseudo-likelihood approach with alternating parameter update was also studied. Numerical studies showed that the pseudo-likelihood approach is not suitable in our shared parameter alternating regression models with unobserved covariates.

2501.00623

Country:

North America > United States > Kansas (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe (0.04)
(2 more...)

Genre: Research Report > New Finding (0.65)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-30-2024

Advanced Displacement Magnitude Prediction in Multi-Material Architected Lattice Structure Beams Using Physics Informed Neural Network Architecture

Mishra, Akshansh

This paper proposes an innovative method for predicting deformation in architected lattice structures that combines Physics-Informed Neural Networks (PINNs) with finite element analysis. A thorough study was carried out on FCC-based lattice beams utilizing five different materials (Structural Steel, AA6061, AA7075, Ti6Al4V, and Inconel 718) under varied edge loads (1000-10000 N). The PINN model blends data-driven learning with physics-based limitations via a proprietary loss function, resulting in much higher prediction accuracy than linear regression. PINN outperforms linear regression, achieving greater R-square (0.7923 vs 0.5686) and lower error metrics (MSE: 0.00017417 vs 0.00036187). Among the materials examined, AA6061 had the highest displacement sensitivity (0.1014 mm at maximum load), while Inconel718 had better structural stability.

artificial intelligence, deep learning, machine learning, (15 more...)

2501.03254

Country:

Europe > Italy > Lombardy > Milan (0.04)
Asia > India > Tripura (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

Lorenzoni, Giuliano, Portugal, Ivens, Alencar, Paulo, Cowan, Donald

Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT

arXiv.org Artificial IntelligenceDec-30-2024

This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics due to hyperparameter configurations, showing trade-offs among performance metrics. For example, a higher learning rate reduces loss in relative analysis (p=0.027) but challenges accuracy improvements. Meanwhile, batch size significantly impacts accuracy and F1-score in absolute regression (p=0.028 and p=0.005) but has limited influence on loss optimization (p=0.170). The interaction between epochs and batch size maximizes F1-score (p=0.001), underscoring the importance of hyperparameter interplay. These findings highlight the need for fine-tuning strategies addressing non-linear hyperparameter interactions to balance performance across metrics. Such variability and metric trade-offs are relevant for tasks beyond text classification, including NLP and computer vision. This analysis informs fine-tuning strategies for large language models and promotes adaptive designs for broader model applicability.

information retrieval, large language model, machine learning, (21 more...)

2501.00241

Country: North America > Canada > Ontario > Waterloo Region > Waterloo (0.15)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.82)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.77)
(2 more...)