AITopics | pjq

Collaborating Authors

pjq

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analyze Additive and Interaction Effects via Collaborative Trees

Chi, Chien-Ming

arXiv.org Machine LearningMay-19-2024

We present Collaborative Trees, a novel tree model designed for regression prediction, along with its bagging version, which aims to analyze complex statistical associations between features and uncover potential patterns inherent in the data. We decompose the mean decrease in impurity from the proposed tree model to analyze the additive and interaction effects of features on the response variable. Additionally, we introduce network diagrams to visually depict how each feature contributes additively to the response and how pairs of features contribute interaction effects. Through a detailed demonstration using an embryo growth dataset, we illustrate how the new statistical tools aid data analysis, both visually and numerically. Moreover, we delve into critical aspects of tree modeling, such as prediction performance, inference stability, and bias in feature importance measures, leveraging real datasets and simulation experiments for comprehensive discussions. On the theory side, we show that Collaborative Trees, built upon a ``sum of trees'' approach with our own innovative tree model regularization, exhibit characteristics akin to matching pursuit, under the assumption of high-dimensional independent binary input features (or one-hot feature groups). This newfound link sheds light on the superior capability of our tree model in estimating additive effects of features, a crucial factor for accurate interaction effect estimation.

collaborative tree, erpfpxqq 2, feature group, (14 more...)

arXiv.org Machine Learning

2405.11477

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(2 more...)

Add feedback

The Phase Transition Phenomenon of Shuffled Regression

Zhang, Hang, Li, Ping

arXiv.org Machine LearningOct-31-2023

We study the phase transition phenomenon inherent in the shuffled (permuted) regression problem, which has found numerous applications in databases, privacy, data analysis, etc. In this study, we aim to precisely identify the locations of the phase transition points by leveraging techniques from message passing (MP). In our analysis, we first transform the permutation recovery problem into a probabilistic graphical model. We then leverage the analytical tools rooted in the message passing (MP) algorithm and derive an equation to track the convergence of the MP algorithm. By linking this equation to the branching random walk process, we are able to characterize the impact of the signal-to-noise-ratio ($\snr$) on the permutation recovery. Depending on whether the signal is given or not, we separately investigate the oracle case and the non-oracle case. The bottleneck in identifying the phase transition regimes lies in deriving closed-form formulas for the corresponding critical points, but only in rare scenarios can one obtain such precise expressions. To tackle this technical challenge, this study proposes the Gaussian approximation method, which allows us to obtain the closed-form formulas in almost all scenarios. In the oracle case, our method can fairly accurately predict the phase transition $\snr$. In the non-oracle case, our algorithm can predict the maximum allowed number of permuted rows and uncover its dependency on the sample number.

artificial intelligence, machine learning, trpmq, (16 more...)

arXiv.org Machine Learning

2310.20438

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Functional Linear Regression of Cumulative Distribution Functions

Zhang, Qian, Makur, Anuran, Azizzadenesheli, Kamyar

arXiv.org Artificial IntelligenceDec-20-2022

The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2205.14545

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Optimal high-dimensional and nonparametric distributed testing under communication constraints

Szabó, Botond, Vuursteen, Lasse, van Zanten, Harry

arXiv.org Machine LearningFeb-2-2022

We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$- and infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms reaching the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols that have access to shared randomness can perform strictly better in some regimes than those that do not. Furthermore, we show that consistent nonparametric distributed testing is always possible, even with as little as $1$-bit of communication and the corresponding test outperforms the best local test using only the information available at a single local machine.

inequality, pjq, protocol, (15 more...)

arXiv.org Machine Learning

2202.00968

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Communications > Networks > Sensor Networks (0.45)

Add feedback