

multiple linear regression


A Self-Evolving AI Agent System for Climate Science

Guo, Zijie, Wang, Jiong, Ling, Fenghua, Wei, Wangxu, Yue, Xiaoyu, Jiang, Zhe, Xu, Wanghan, Luo, Jing-Jia, Cheng, Lijing, Ham, Yoo-Geun, Song, Fengfei, Gentine, Pierre, Yamagata, Toshio, Fei, Ben, Zhang, Wenlong, Gu, Xinyu, Li, Chao, Wang, Yaqiang, Chen, Tao, Ouyang, Wanli, Zhou, Bowen, Bai, Lei

arXiv.org Artificial Intelligence

Scientific progress in Earth science depends on integrating data across the planet's interconnected spheres. However, the accelerating volume and fragmentation of multi-sphere knowledge and data have surpassed human analytical capacity. This creates a major bottleneck for discovery, especially in climate science. To address this challenge, we introduce EarthLink, the first self-evolving AI agent system designed as an interactive "copilot" for Earth scientists. Through natural language interaction, EarthLink automates the entire research workflow by integrating planning, code execution, data analysis, and physical reasoning into a unified process that directly addresses this limitation. Beyond efficiency, it exhibits human-like cross-disciplinary analytical ability and achieves proficiency comparable to a junior researcher in expert evaluations on core large-scale climate tasks, including model-observation comparison and climate change understanding. When tasked with an open scientific problem, specifically the discovery of precursors of the Atlantic Niño, EarthLink autonomously developed a research strategy, identified sources of predictability, verified its hypotheses with available data, and proposed a physically consistent mechanism. These emerging capabilities enable a new human-AI research paradigm. Scientists can focus on value and result judgments, while AI systems handle complex data analysis and knowledge integration. This accelerates the pace and breadth of discovery in Earth sciences. The system is accessible at our website https://earthlink.intern-ai.org.cn.


Prediction of Highway Traffic Flow Based on Artificial Intelligence Algorithms Using California Traffic Data

Lee, Junseong, Cho, Jaegwan, Cho, Yoonju, Choi, Seoyoon, Shin, Yejin

arXiv.org Artificial Intelligence

This study presents a machine learning-based traffic flow prediction model intended to help address traffic congestion, currently one of the most pressing problems worldwide. The study employed Multiple Linear Regression (MLR) and Random Forest (RF) algorithms and analyzed data collection intervals ranging from 30 seconds to 15 minutes. With R, MAE, and RMSE as performance metrics, the analysis showed that both the MLR and RF models performed best with 10-minute data collection intervals. These findings are expected to contribute to future traffic congestion solutions and more efficient traffic management.
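The abstract does not say how the raw detector data were aggregated or how the metrics were computed. The minimal sketch below assumes that raw 30-second vehicle counts are summed into fixed blocks (20 × 30 s = 10 min) and that R is the Pearson correlation between observed and predicted flow. The helper names `aggregate` and `metrics` and all numbers are hypothetical.

```python
import math

def aggregate(counts, window):
    """Sum consecutive raw counts into blocks of `window` samples.

    With 30-second samples, window=20 yields 10-minute totals.
    An incomplete trailing block is dropped.
    """
    n_blocks = len(counts) // window
    return [sum(counts[i * window:(i + 1) * window]) for i in range(n_blocks)]

def metrics(y_true, y_pred):
    """Return (R, MAE, RMSE), with R taken as the Pearson correlation."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p), mae, rmse

# Made-up example: 40 thirty-second counts (20 minutes) -> two 10-minute totals.
raw_counts = [3, 5, 4, 6] * 10
flow_10min = aggregate(raw_counts, window=20)  # -> [90, 90]
r, mae, rmse = metrics([100, 120, 140], [110, 118, 150])
```

Comparing intervals then amounts to calling `aggregate` with different window sizes, refitting each model, and comparing the resulting metrics.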


Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study

Mohammadagha, Mohsen, Najafi, Mohammad, Kaushal, Vinayak, Jibreen, Ahmad Mahmoud Ahmad

arXiv.org Artificial Intelligence

The aging sewer infrastructure in the U.S., covering 2.1 million kilometers, faces growing structural problems, resulting in around 75,000 sanitary sewer overflows each year that present serious economic, environmental, and public health hazards. Conventional inspection techniques and deterministic models do not account for the unpredictable nature of sewer deterioration, whereas probabilistic methods depend on extensive historical data, which is frequently lacking or incomplete. This research aims to improve the accuracy of sewer pipeline condition prediction using two machine learning models, artificial neural networks (ANNs) and multiple linear regression (MLR), that integrate factors such as pipe age, material, diameter, environmental influences, and PACP ratings. The ANNs used ReLU activation functions and Adam optimization, whereas MLR applied regularization to address multicollinearity; both models were assessed with metrics such as RMSE, MAE, and R². The findings indicated that ANNs outperformed MLR, attaining an R² of 0.9066 compared with MLR's 0.8474, and successfully modeled nonlinear relationships while preserving generalization. MLR, on the other hand, offered better interpretability by pinpointing significant predictors such as residual buildup. Overall, the analysis identified pipe length, age, and diameter as key predictors of pipeline degradation, while depth, soil type, and segment showed minimal influence. Future studies should prioritize hybrid models that combine the accuracy of ANNs with the interpretability of MLR, incorporating advanced methods such as SHAP analysis and transfer learning to improve scalability in infrastructure management and promote environmental sustainability.
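The abstract says only that MLR "applied regularization to address multicollinearity". One common choice is ridge (L2) regression. The NumPy sketch below assumes that choice, standardizes the predictors so the penalty treats them equally, and reports the three metrics named above. The function names and the synthetic data are hypothetical and are not taken from the Wisconsin study.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized predictors.

    Centering makes the unpenalized intercept equal to mean(y).
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    y_mean = y.mean()
    A = Z.T @ Z + alpha * np.eye(Z.shape[1])
    beta = np.linalg.solve(A, Z.T @ (y - y_mean))
    return mu, sigma, y_mean, beta

def predict(model, X):
    mu, sigma, y_mean, beta = model
    return y_mean + ((X - mu) / sigma) @ beta

def scores(y, y_hat):
    """Return (RMSE, MAE, R^2)."""
    resid = y - y_hat
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return rmse, mae, r2

# Synthetic data: three hypothetical predictors, the second nearly
# collinear with the first.
rng = np.random.default_rng(0)
age = rng.uniform(0, 80, 200)
X = np.column_stack([age, age + rng.normal(0, 1, 200), rng.uniform(0.2, 1.5, 200)])
y = 0.04 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 200)
model = fit_ridge(X, y, alpha=5.0)
rmse, mae, r2 = scores(y, predict(model, X))
```

Increasing `alpha` shrinks the coefficients toward zero, which stabilizes estimates for correlated predictors at the cost of some bias; `alpha=0` reduces to ordinary least squares.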


Evaluating authenticity and quality of image captions via sentiment and semantic analyses

Krotov, Aleksei, Tebo, Alison, Picart, Dylan K., Algave, Aaron Dean

arXiv.org Artificial Intelligence

The growth of deep learning (DL) relies heavily on huge amounts of labelled data for tasks such as natural language processing and computer vision. In image-to-text or image-to-image pipelines specifically, a model may inadvertently learn opinion (sentiment) from human-generated image captions. Learning may also be affected by the variety and diversity of the provided captions. Because labelling large datasets has largely relied on crowd-sourcing or data-worker pools, evaluating the quality of such training data is crucial. This study proposes an evaluation method focused on sentiment and semantic richness. The method was applied to the COCO-MS dataset, comprising approximately 150K images with segmented objects and corresponding crowd-sourced captions. We employed pre-trained models (Twitter-RoBERTa-base and BERT-base) to extract sentiment scores and the variability of semantic embeddings from captions. The relation of the sentiment score and semantic variability to object categories was examined using multiple linear regression. Results indicate that while most captions were neutral, about 6% of the captions exhibited strong sentiment influenced by specific object categories. Semantic variability of within-image captions remained low and uncorrelated with object categories. Fewer than 1.5% of model-generated captions showed strong sentiment, which was not influenced by object categories and did not correlate with the sentiment of the corresponding human-generated captions. This research demonstrates an approach to assessing the quality of crowd- or worker-sourced captions informed by image content.
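The abstract does not describe how object categories entered the regression. A minimal sketch, assuming each image's categories are encoded as 0/1 indicator columns and that a sentiment score per caption has already been extracted (for example by a sentiment model, not shown here). The function name, categories, and scores are hypothetical.

```python
import numpy as np

def category_regression(image_categories, sentiment, vocab):
    """Least-squares fit of sentiment ~ intercept + one 0/1 indicator per category.

    image_categories: one set of category names per caption's image.
    Returns a dict mapping "intercept" and each category to its coefficient.
    """
    X = np.array([[1.0] + [1.0 if c in cats else 0.0 for c in vocab]
                  for cats in image_categories])
    y = np.asarray(sentiment, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", *vocab], coef.tolist()))

# Made-up scores; in a real pipeline they would come from a sentiment model.
cats = [set(), {"dog"}, {"car"}, {"dog", "car"}, {"dog"}]
sentiment_scores = [0.1, 0.5, -0.1, 0.3, 0.5]
print(category_regression(cats, sentiment_scores, vocab=["dog", "car"]))
```

Each coefficient estimates how much the presence of that category shifts the expected sentiment score, holding the other categories fixed.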


Forecasting Cryptocurrency Staking Rewards

Gupta, Sauren, Katharaki, Apoorva Hathi, Xu, Yifan, Krishnamachari, Bhaskar, Gupta, Rajarshi

arXiv.org Artificial Intelligence

This research explores the relatively unexplored problem of predicting cryptocurrency staking rewards, offering potential insights to researchers and investors. We investigate two predictive methods: a) a straightforward sliding-window average, and b) linear regression models based on historical data. The findings show that ETH staking rewards can be forecast with an RMSE within 0.7% and 1.1% of the mean value for 1-day and 7-day look-aheads respectively, using a 7-day sliding-window average. We also observe varying prediction accuracy across other cryptocurrencies, including SOL, XTZ, ATOM, and MATIC. For XTZ and ATOM, linear regression outperforms the sliding-window average for short-term prediction. The results underscore the generally stable and predictable nature of staking rewards for most assets, with MATIC a notable exception.
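A minimal sketch of the sliding-window baseline and of expressing RMSE as a percentage of the mean. It assumes that a k-day look-ahead forecasts the value k days after the last day in the window, which the abstract does not spell out; the function names and the reward series are hypothetical.

```python
import math

def sliding_window_forecast(series, window=7, horizon=1):
    """Forecast the value `horizon` days after each window as the window mean.

    Returns paired lists (actuals, forecasts).
    """
    actuals, forecasts = [], []
    for t in range(window, len(series) - horizon + 1):
        forecasts.append(sum(series[t - window:t]) / window)
        actuals.append(series[t + horizon - 1])
    return actuals, forecasts

def rmse_pct_of_mean(actuals, forecasts):
    """RMSE expressed as a percentage of the mean actual value."""
    n = len(actuals)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / n)
    return 100.0 * rmse / (sum(actuals) / n)

# Made-up daily reward rates (percent per year) for 30 days.
daily_rates = [4.0 + 0.1 * (i % 3) for i in range(30)]
for h in (1, 7):
    a, f = sliding_window_forecast(daily_rates, window=7, horizon=h)
    print(h, round(rmse_pct_of_mean(a, f), 2))
```

Normalizing by the mean makes errors comparable across assets whose reward rates differ in scale.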


A Data-Driven Approach to Positioning Grab Bars in the Sagittal Plane for Elderly Persons

Bolli, Roberto Jr., Asada, H. Harry

arXiv.org Artificial Intelligence

The placement of grab bars for elderly users is based largely on ADA building codes and does not reflect the large differences in height, mobility, and muscle power between individuals. The goal of this study is to determine whether an elderly user's preferred handlebar pose correlates with demographic indicators, self-rated mobility for tasks requiring postural change, and biomechanical markers. For simplicity, we consider only the case where the handlebar is positioned directly in front of the user, as this confines the relevant body kinematics to a 2D sagittal plane. Previous eldercare devices have been built to position a handlebar in various poses in space. Our work augments these devices and adds to the body of knowledge by assessing how the handlebar should be positioned based on data from actual elderly people rather than simulations.


Multiple Linear Regression in R - Lituptech Digital

#artificialintelligence

We are going to learn how to implement a Multiple Linear Regression model in R. This is a bit more complex than Simple Linear Regression, but it's going to be very practical and fun. Multiple Linear Regression is a data science technique that uses several explanatory variables to predict the outcome of a response variable. A multiple linear regression model attempts to model the relationship between two or more explanatory variables (independent variables) and a response variable (dependent variable) by fitting a linear equation to observed data. Each observation pairs a set of values of the independent variables x1, x2, ..., xp with a value of the dependent variable y.
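In standard notation, the model described above and its ordinary least-squares estimate can be written as follows. This assumes n observations, an n × (p+1) design matrix X whose first column is all ones, and an invertible X^T X; the estimate is the same one R's `lm()` returns for a formula such as `y ~ x1 + x2`.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n

\hat{\boldsymbol{\beta}} = \left(X^\top X\right)^{-1} X^\top \mathbf{y}
```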


Feature Transformation for Multiple Linear Regression in Python

#artificialintelligence

Data processing and transformation is an iterative process, and in a way it can never be 'perfect'. As we gain more understanding of the dataset, such as the relationships between the target variable and the features, or the business context, we think of new ways to handle them. Recently I started working on media mix models and some predictive models that use multiple linear regression. In this post, I will introduce the thought process and different ways to handle variables for modeling purposes. I will use the King County house price data set (a modified version, for more fun) as an example.
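Two transformations that commonly come up with house-price data are log-transforming a right-skewed target and one-hot encoding a categorical column. The sketch below is a hedged illustration of both in plain Python, not the post's own code; the column names (`price`, `sqft_living`, `zipcode`), the helper `transform_rows`, and the rows are hypothetical.

```python
import math

def transform_rows(rows, categorical="zipcode", target="price"):
    """Return (features, log_target) pairs.

    The right-skewed target is log-transformed, and the categorical column is
    one-hot encoded with its first (sorted) level dropped so the dummies are
    not perfectly collinear with the regression intercept.
    """
    levels = sorted({row[categorical] for row in rows})
    out = []
    for row in rows:
        feats = {k: v for k, v in row.items() if k not in (categorical, target)}
        for level in levels[1:]:
            feats[f"{categorical}_{level}"] = 1 if row[categorical] == level else 0
        out.append((feats, math.log(row[target])))
    return out

# Made-up rows in the spirit of the King County data.
rows = [
    {"price": 250_000, "sqft_living": 1200, "zipcode": "98001"},
    {"price": 900_000, "sqft_living": 2800, "zipcode": "98004"},
    {"price": 400_000, "sqft_living": 1700, "zipcode": "98001"},
]
for feats, log_price in transform_rows(rows):
    print(feats, round(log_price, 3))
```

With a log target, a regression coefficient is read approximately as a proportional change in price per unit change of the feature.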


Linear regression in detail. Linear regression is a statistical…

#artificialintelligence

Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It is a widely used technique for predicting the outcome of a continuous variable, and it is especially useful when you have a large amount of data. In this blog post, we will discuss the theory behind linear regression, how to perform it in practice, and some of its applications. The basic idea behind linear regression is to find the line that best fits a set of data points. The line is represented by the equation y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the y-intercept.
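The "line that best fits" is usually chosen by ordinary least squares, which minimizes the sum of squared vertical distances between the points and the line. A minimal pure-Python sketch of the closed-form slope and intercept, with `fit_line` a hypothetical helper name:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

The slope is the covariance of x and y divided by the variance of x, and the fitted line always passes through the point (mean of x, mean of y).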


REGRESSION -- HOW, WHY, AND WHEN? – Towards AI

#artificialintelligence

Originally published on Towards AI. As we previously saw, the supervised part of machine learning is separated into two categories, and of those two we have already ventured into classification and the many algorithms used for it.