statistical technique


Towards provable probabilistic safety for scalable embodied AI systems

He, Linxuan, Jia, Qing-Shan, Li, Ang, Sang, Hongyan, Wang, Ling, Lu, Jiwen, Zhang, Tao, Zhou, Jie, Zhang, Yi, Wang, Yisen, Wei, Peng, Wang, Zhongyuan, Liu, Henry X., Feng, Shuo

arXiv.org Artificial Intelligence

Embodied AI systems, comprising AI models and physical plants, are increasingly prevalent across various applications. Due to the rarity of system failures, ensuring their safety in complex operating environments remains a major challenge, which severely hinders their large-scale deployment in safety-critical domains, such as autonomous vehicles, medical devices, and robotics. While achieving provable deterministic safety--verifying system safety across all possible scenarios--remains theoretically ideal, the rarity and complexity of corner cases make this approach impractical for scalable embodied AI systems. Instead, empirical safety evaluation is employed as an alternative, but the absence of provable guarantees imposes significant limitations. To address these issues, we argue for a paradigm shift to provable probabilistic safety that integrates provable guarantees with progressive achievement toward a probabilistic safety boundary on overall system performance. The new paradigm better leverages statistical methods to enhance feasibility and scalability, and a well-defined probabilistic safety boundary enables embodied AI systems to be deployed at scale. In this Perspective, we outline a roadmap for provable probabilistic safety, along with corresponding challenges and potential solutions. By bridging the gap between theoretical safety assurance and practical deployment, this Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
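As a toy illustration of how a probabilistic safety boundary could be certified from empirical trials (this is not the authors' framework, and `zero_failure_upper_bound` is a hypothetical helper name), the sketch below computes a one-sided Clopper-Pearson upper bound on the per-trial failure probability when no failures are observed:

```python
def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-trial failure probability when
    zero failures are observed in `trials` independent test scenarios
    (the k = 0 case of the Clopper-Pearson interval)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Solve (1 - p)^trials = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)

# Example: 1,000 simulated scenarios, all of them safe.
bound = zero_failure_upper_bound(1000)
# With 95% confidence, the true failure rate is below roughly 0.3%.
```

The sketch also shows why rarity of failures is the central obstacle: halving the certified failure-rate bound requires roughly doubling the number of failure-free trials.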


Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications

Arab, Issar, Benitez, Rodrigo

arXiv.org Artificial Intelligence

Time series forecasting is essential for operational intelligence in the hospitality industry, and particularly challenging in large-scale, distributed systems. This study evaluates the performance of statistical, machine learning (ML), deep learning, and foundation models in forecasting hourly sales over a 14-day horizon using real-world data from a network of thousands of restaurants across Germany. The forecasting solution includes features such as weather conditions, calendar events, and time-of-day patterns. Results demonstrate the strong performance of ML-based meta-models and highlight the emerging potential of foundation models like Chronos and TimesFM, which deliver competitive performance with minimal feature engineering, leveraging only the pre-trained model (zero-shot inference). Additionally, a hybrid PySpark-Pandas approach proves to be a robust solution for achieving horizontal scalability in large-scale deployments.
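Benchmarks like this are typically anchored by simple statistical baselines; the sketch below (a hypothetical illustration, not the paper's solution) implements a seasonal-naive forecast that predicts each hour of a 14-day horizon with the value observed at the same hour one week earlier:

```python
HOURS_PER_WEEK = 24 * 7

def seasonal_naive_forecast(history, horizon_hours):
    """Forecast each future hour with the value from the same hour one
    week earlier -- a standard baseline for hourly sales series."""
    if len(history) < HOURS_PER_WEEK:
        raise ValueError("need at least one full week of hourly history")
    forecast = []
    for step in range(horizon_hours):
        # Index of the same hour one week before the forecasted hour;
        # later forecast steps may reference earlier forecast steps.
        ref = len(history) + step - HOURS_PER_WEEK
        if ref < len(history):
            forecast.append(history[ref])
        else:
            forecast.append(forecast[ref - len(history)])
    return forecast

# Two weeks of synthetic hourly sales with a purely weekly pattern:
history = [(h % HOURS_PER_WEEK) * 1.0 for h in range(2 * HOURS_PER_WEEK)]
fc = seasonal_naive_forecast(history, 14 * 24)  # 14-day horizon, as in the study
```

On a perfectly weekly-periodic series this baseline is exact, which is what makes it a useful floor when judging ML meta-models and zero-shot foundation models.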


A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer

Neural Information Processing Systems

Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control on overfitting while retaining the ability to discover complex features in the artificial data.


Uncovering the Essence of Principal Component Analysis: A Comprehensive Guide

#artificialintelligence

Principal component analysis (PCA) is a popular statistical technique for reducing the dimensionality of a dataset while preserving important patterns and relationships in the data. At its core, PCA is a linear transformation method that projects the data onto a lower-dimensional space, revealing the underlying structure of the data. But what exactly is PCA and how does it work? In this article, we'll delve into the fundamentals of PCA and explore its applications in a variety of fields, including machine learning, data visualization, and image processing. We'll also discuss some of the key challenges and limitations of using PCA, and provide practical tips for implementing it in your own analyses.
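The linear transformation described above can be sketched in a few lines. The example below (assuming NumPy, and not tied to any particular PCA library) computes the principal components via the SVD of the centered data matrix:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the
    centered data matrix (columns of Vt.T are the directions of
    maximal variance, in decreasing order)."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

# Toy data: 2-D points spread mostly along the direction (1, 1).
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t]) + rng.normal(scale=0.1, size=(200, 2))
scores, components, var = pca(X, n_components=1)
```

Here the first component recovers the (1, 1) direction (up to sign), and a single dimension captures almost all of the variance, which is exactly the "lower-dimensional space" the article refers to.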


Mastering the Art of Linear Regression: A Comprehensive Guide

#artificialintelligence

Linear regression is a statistical technique for modeling the relationship between a dependent variable and one or more independent variables. At its core, linear regression is a method for predicting a numerical outcome based on a set of input variables. But what exactly is linear regression and how does it work? In this article, we'll delve into the fundamentals of linear regression and explore its applications in a variety of fields, including economics, finance, and machine learning. We'll also discuss some of the key challenges and limitations of using linear regression, and provide practical tips for implementing it in your own analyses.
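For the single-predictor case, the core computation fits in a few lines of standard-library Python; the snippet below derives the ordinary least squares slope and intercept from first principles (the data are made up for illustration):

```python
def simple_ols(x, y):
    """Ordinary least squares fit y ~ a + b*x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: advertising spend (x) vs. sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.1, 5.9, 8.2, 9.8]
intercept, slope = simple_ols(x, y)
# slope is about 1.95, intercept about 0.17: each unit of spend
# adds roughly 1.95 units of predicted sales.
```

Multiple regression generalizes the same idea by solving the normal equations over several input variables.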


4 Techniques to Handle Missing values in Time Series Data

#artificialintelligence

Real-world data often contain missing values, and time-series datasets are no exception. Missing values may result from data corruption or from a failure to record observations at a given time. Time series models work with complete data, so missing values must be imputed before modeling or analysis. Simply dropping missing observations is usually inappropriate, as it can destroy the correlation between adjacent observations.
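One common alternative to dropping rows is linear interpolation between the nearest observed neighbours, which preserves the local trend of adjacent observations. A minimal standard-library sketch (with a hypothetical `interpolate_missing` helper; libraries like pandas offer the same via `Series.interpolate`):

```python
def interpolate_missing(series):
    """Fill None gaps by linear interpolation between the nearest
    observed neighbours; leading/trailing gaps are back-/forward-filled."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, v in enumerate(filled):
        if v is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:      # leading gap: copy the first observation
            filled[i] = filled[nxt]
        elif nxt is None:     # trailing gap: carry the last observation forward
            filled[i] = filled[prev]
        else:                 # interior gap: linear interpolation
            frac = (i - prev) / (nxt - prev)
            filled[i] = filled[prev] + frac * (filled[nxt] - filled[prev])
    return filled

daily_sales = [10.0, None, None, 16.0, None, 20.0]
filled = interpolate_missing(daily_sales)
# filled is approximately [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Note that interpolation assumes the series varies smoothly between observations; for strongly seasonal data, seasonal-aware imputation is usually preferable.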


How to Become a Data Scientist in 2022?

#artificialintelligence

Data Science offers lucrative career opportunities in this day and age. Data scientists produce actionable business insights from data and implement mathematical algorithms to solve complex business problems. In fact, Amazon product recommendations, Netflix movie suggestions, and Google Maps traffic predictions are prime examples of data scientists' work that we use every day. Data scientists' algorithms help many companies generate more revenue and enhance the customer experience of their products and services. For these reasons, many people aspire to become data scientists.


Artificial Intelligence versus Machine Learning: Explained

#artificialintelligence

Machine learning and artificial intelligence are often used interchangeably, but they are two very different things. Artificial Intelligence (AI) is a broad field that has been around for decades; it is the technology behind Siri and Alexa. Machine Learning (ML), on the other hand, is a subset of AI that uses statistical techniques to allow computers to learn without being explicitly programmed with rules or instructions. Machine learning and artificial intelligence are therefore not interchangeable terms; rather, they are related fields of computer science that both use mathematics and statistics to create intelligent behavior in machines. To better understand the difference, think of it this way: traditional AI is a toaster programmed with fixed rules for making toast; ML is the toaster learning, from your past choices, how dark you like your toast and when you want it.


No-Code Analytics – The Best Introduction to Data Science

#artificialintelligence

Although reading books and watching lectures are great ways to learn analytics, it is best to learn by doing. However, getting started can be tricky with languages such as Python and R if you do not have a coding background. Not only do you need to know what you are doing in terms of analytical procedures, but you also need to understand the nuances of programming languages, which adds to the list of things to learn just to get started. Therefore, the best middle ground between knowledge acquisition (books, videos, etc.) and conducting advanced analytics (Python, R, etc.) is open-source no-code analytics software. Such tools support both knowledge acquisition and actual analysis: documentation is built into the software, and you can carry out relatively complex tasks with only mouse clicks.


5 Free Books to Learn Statistics for Data Science

#artificialintelligence

Statistics is a fundamental skill that data scientists use every day. It is the branch of mathematics that allows us to collect, describe, interpret, visualise, and make inferences about data. Data scientists will use it for data analysis, experiment design, and statistical modelling. Statistics is also essential for machine learning. We will use statistics to understand the data prior to training a model.