 Regression


Multidimensional Knowledge Graph Embeddings for International Trade Flow Analysis

arXiv.org Artificial Intelligence

Understanding the complex dynamics of high-dimensional, contingent, and strongly nonlinear economic data, often shaped by multiplicative processes, poses significant challenges for traditional regression methods, which offer limited capacity to capture the structural changes such data exhibit. To address this, we propose leveraging knowledge graph embeddings for economic trade data, in particular to predict international trade relationships. We implement KonecoKG, a knowledge graph representation of economic trade data with multidimensional relationships, using SDM-RDFizer, and transform the relationships into a knowledge graph embedding using AmpliGraph.
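The sketch below makes the embedding step concrete. It scores candidate trade links with a TransE-style function, which is one common knowledge graph embedding model. Under this model a triple (head, relation, tail) is plausible when the vector head + relation lands near tail.

- The sketch does not use AmpliGraph's API.
- It is not necessarily the model KonecoKG uses.
- The country codes, the `exportsTo` relation and the 2-dimensional embedding values are all hypothetical.
- A multidimensional trade relationship is flattened here into a single relation.

```python
import math

# Hypothetical 2-D embeddings; a trained model would learn these from the graph's triples.
entity_emb = {
    "DEU": [0.9, 0.1],
    "FRA": [1.0, 0.6],
    "JPN": [-0.8, 0.3],
}
relation_emb = {"exportsTo": [0.1, 0.5]}


def transe_score(head, relation, tail):
    """TransE plausibility: -||h + r - t||_2, so higher means more plausible."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank candidate tails for the link-prediction query (head, relation, ?)."""
    candidates = [e for e in entity_emb if e != head]
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


print(rank_tails("DEU", "exportsTo"))  # ['FRA', 'JPN']
```

In practice the embeddings would be learned from the graph's triples rather than set by hand. Link prediction then amounts to ranking unseen tails (or heads) by this score.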


Controllable RANSAC-based Anomaly Detection via Hypothesis Testing

arXiv.org Machine Learning

Detecting the presence of anomalies in regression models is a crucial task in machine learning, as anomalies can significantly impact the accuracy and reliability of predictions. Random Sample Consensus (RANSAC) is one of the most popular robust regression methods for addressing this challenge. However, this method lacks the capability to guarantee the reliability of the anomaly detection (AD) results. In this paper, we propose a novel statistical method for testing the AD results obtained by RANSAC, named CTRL-RANSAC (controllable RANSAC). The key strength of the proposed method lies in its ability to control the probability of misidentifying anomalies below a pre-specified level $\alpha$ (e.g., $\alpha = 0.05$). By examining the selection strategy of RANSAC and leveraging the Selective Inference (SI) framework, we prove that achieving controllable RANSAC is indeed feasible. Furthermore, we introduce a more strategic and computationally efficient approach to enhance the true detection rate and overall performance of the CTRL-RANSAC. Experiments conducted on synthetic and real-world datasets robustly support our theoretical results, showcasing the superior performance of the proposed method.
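For reference, here is a sketch of a plain RANSAC detector for a one-dimensional linear model. It works as follows:

1. Fit a line through two randomly chosen points, and repeat many times.
2. Keep the line with the largest consensus set.
3. Flag the remaining points as anomalies.

This covers only the detection step whose reliability the paper sets out to test. It does not compute the selective-inference p-values that CTRL-RANSAC adds. The threshold, iteration count and toy data are assumptions.

```python
import random


def fit_line(p, q):
    """Slope and intercept of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Plain RANSAC: return (slope, intercept, inlier indices) of the best-consensus line.

    Only the detection step; no selective-inference p-values are computed here.
    """
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # a vertical pair cannot define y = slope * x + intercept
        slope, intercept = fit_line(p, q)
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    return best


# Toy data: y = 2x + 1 with three gross outliers at indices 3, 10 and 15.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
for i in (3, 10, 15):
    points[i] = (points[i][0], points[i][1] + 25.0)

slope, intercept, inliers = ransac_line(points)
anomalies = sorted(set(range(len(points))) - set(inliers))
print(anomalies)  # [3, 10, 15]
```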


Towards Optimal Environmental Policies: Policy Learning under Arbitrary Bipartite Network Interference

arXiv.org Artificial Intelligence

The substantial effect of air pollution on cardiovascular disease and mortality burdens is well-established. Emissions-reducing interventions on coal-fired power plants -- a major source of hazardous air pollution -- have proven to be an effective, but costly, strategy for reducing pollution-related health burdens. Targeting the power plants that achieve maximum health benefits while satisfying realistic cost constraints is challenging. The primary difficulty lies in quantifying the health benefits of intervening at particular plants. This is further complicated because interventions are applied on power plants, while health impacts occur in potentially distant communities, a setting known as bipartite network interference (BNI). In this paper, we introduce novel policy learning methods based on Q- and A-Learning to determine the optimal policy under arbitrary BNI. We derive asymptotic properties and demonstrate finite sample efficacy in simulations. We apply our novel methods to a comprehensive dataset of Medicare claims, power plant data, and pollution transport networks. Our goal is to determine the optimal strategy for installing power plant scrubbers to minimize ischemic heart disease (IHD) hospitalizations under various cost constraints. We find that, under different cost constraints, optimal policies could reduce annual IHD hospitalization rates by 20.66 to 44.51 per 10,000 person-years.
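The bipartite structure can be illustrated with a toy model. Treatments act on plants, benefits accrue in communities, and a transport matrix links the two sides.

The sketch below makes three assumptions about the effect of a scrubber: it is known, linear and additive across plants. It then finds the best affordable plant subset by exhaustive search. All weights, populations, costs and the `effect_per_unit` constant are hypothetical.

The paper's Q- and A-Learning methods instead estimate such effects from data under arbitrary interference. This sketch does not attempt that.

```python
from itertools import combinations

# Hypothetical transport weights: W[c][p] = share of plant p's emissions reaching community c.
W = [
    [0.6, 0.1, 0.0],
    [0.3, 0.5, 0.1],
    [0.0, 0.2, 0.9],
]
population = [1000, 3000, 2000]  # hypothetical community sizes
cost = [4.0, 3.0, 5.0]           # hypothetical scrubber cost per plant
effect_per_unit = 0.01           # hypothetical cases avoided per person per unit exposure reduction


def benefit(treated):
    """Cases avoided when plants in `treated` get scrubbers (assumed linear, additive model)."""
    total = 0.0
    for c, row in enumerate(W):
        # A community's exposure reduction pools contributions from every treated plant.
        reduction = sum(row[p] for p in treated)
        total += effect_per_unit * population[c] * reduction
    return total


def best_policy(budget):
    """Exhaustive search over plant subsets whose total cost fits within the budget."""
    plants = range(len(cost))
    feasible = (
        set(subset)
        for k in range(len(cost) + 1)
        for subset in combinations(plants, k)
        if sum(cost[p] for p in subset) <= budget
    )
    return max(feasible, key=benefit)


print(best_policy(5.0), best_policy(8.0))  # {2} {1, 2}
```

Exhaustive search is exponential in the number of plants. It only serves to make the budget-constrained objective explicit on a three-plant example.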


Isolated Causal Effects of Natural Language

arXiv.org Artificial Intelligence

As language technologies become widespread, it is important to understand how variations in language affect reader perceptions -- formalized as the isolated causal effect of some focal language-encoded intervention on an external outcome. A core challenge of estimating isolated effects is the need to approximate all non-focal language outside of the intervention. In this paper, we introduce a formal estimation framework for isolated causal effects and explore how different approximations of non-focal language impact effect estimates. Drawing on the principle of omitted variable bias, we present metrics for evaluating the quality of isolated effect estimation and non-focal language approximation along the axes of fidelity and overlap. In experiments on semi-synthetic and real-world data, we validate the ability of our framework to recover ground truth isolated effects, and we demonstrate the utility of our proposed metrics as measures of quality for both isolated effect estimates and non-focal language approximations.
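The omitted-variable-bias intuition can be seen in a simulated linear setting. Suppose a non-focal feature `z` influences both the focal intervention `t` and the outcome `y`. Then regressing `y` on `t` alone mixes the two effects.

In the sketch below, a single scalar `z` stands in for non-focal language, and the coefficients are arbitrary. It illustrates the general bias, not the paper's estimation framework or its fidelity and overlap metrics.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals of x after regressing it on z (with an intercept)."""
    b = slope(z, x)
    mx, mz = mean(x), mean(z)
    return [a - mx - b * (c - mz) for a, c in zip(x, z)]


rng = random.Random(0)
n, tau, beta = 20000, 1.0, 2.0
z = [rng.gauss(0, 1) for _ in range(n)]                        # stand-in for non-focal language
t = [0.8 * zi + rng.gauss(0, 1) for zi in z]                   # focal intervention, correlated with z
y = [tau * ti + beta * zi + rng.gauss(0, 1) for ti, zi in zip(t, z)]

naive = slope(t, y)                      # omits z, so it absorbs part of z's effect
adjusted = slope(residualize(t, z), y)   # Frisch-Waugh-Lovell: equals the coefficient on t given z
print(naive, adjusted)
```

By the omitted-variable-bias formula, the naive slope converges to `tau + beta * Cov(t, z) / Var(t)`. With these settings that is about 1.98, while the adjusted slope recovers `tau = 1`.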


Enhancing Cryptocurrency Market Forecasting: Advanced Machine Learning Techniques and Industrial Engineering Contributions

arXiv.org Artificial Intelligence

Cryptocurrencies, as decentralized digital assets, have experienced rapid growth and adoption, with over 23,000 cryptocurrencies and a market capitalization nearing \$1.1 trillion (about \$3,400 per person in the US) as of 2023. This dynamic market presents significant opportunities and risks, highlighting the need for accurate price prediction models to manage volatility. This chapter comprehensively reviews machine learning (ML) techniques applied to cryptocurrency price prediction from 2014 to 2024. We explore various ML algorithms, including linear models, tree-based approaches, and advanced deep learning architectures such as transformers and large language models. Additionally, we examine the role of sentiment analysis in capturing market sentiment from textual data like social media posts and news articles to anticipate price fluctuations. With expertise in optimizing complex systems and processes, industrial engineers are pivotal in enhancing these models. They contribute by applying principles of process optimization, efficiency, and risk mitigation to improve computational performance and data management. This chapter highlights the evolving landscape of cryptocurrency price prediction, the integration of emerging technologies, and the significant role of industrial engineers in refining predictive models. By addressing current limitations and exploring future research directions, this chapter aims to advance the development of more accurate and robust prediction systems, supporting better-informed investment decisions and more stable market behavior.


Provable In-context Learning for Mixture of Linear Regressions using Transformers

arXiv.org Machine Learning

We theoretically investigate the in-context learning capabilities of transformers in the context of learning mixtures of linear regression models. For the case of two-component mixtures, we demonstrate the existence of transformers that can achieve an accuracy, relative to the oracle predictor, of order $\tilde{\mathcal{O}}((d/n)^{1/4})$ in the low signal-to-noise ratio (SNR) regime and $\tilde{\mathcal{O}}(\sqrt{d/n})$ in the high SNR regime, where $n$ is the length of the prompt, and $d$ is the dimension of the problem. Additionally, we derive in-context excess risk bounds of order $\mathcal{O}(L/\sqrt{B})$, where $B$ denotes the number of (training) prompts, and $L$ represents the number of attention layers. The order of $L$ depends on whether the SNR is low or high. In the high SNR regime, we extend the results to $K$-component mixture models for finite $K$. Extensive simulations also highlight the advantages of transformers for this task, which outperform other baselines such as the Expectation-Maximization algorithm.
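The Expectation-Maximization baseline mentioned above can be sketched for the simplest case. The sketch makes several assumptions for brevity:

- a scalar covariate,
- two components with equal mixing weights,
- no intercept,
- a known noise level.

The paper's setting has $d$-dimensional covariates. The sketch says nothing about the transformer construction itself.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def em_mixture_linreg(xs, ys, w_init, sigma=0.5, n_iter=100):
    """EM for y = w_k * x + N(0, sigma^2) with two equally weighted components.

    Assumes scalar x, no intercept, known sigma, and mixing weights fixed at 1/2.
    """
    w = list(w_init)
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to component 0,
        # computed from the log-likelihood difference between the two components.
        resp = [sigmoid(((y - w[1] * x) ** 2 - (y - w[0] * x) ** 2) / (2 * sigma ** 2))
                for x, y in zip(xs, ys)]
        # M-step: responsibility-weighted least-squares slope for each component.
        for k, weights in enumerate((resp, [1.0 - r for r in resp])):
            num = sum(r * x * y for r, x, y in zip(weights, xs, ys))
            den = sum(r * x * x for r, x in zip(weights, xs))
            w[k] = num / den
    return w


rng = random.Random(0)
true_w = (2.0, -1.0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [true_w[rng.randrange(2)] * x + rng.gauss(0, 0.5) for x in xs]
w_hat = em_mixture_linreg(xs, ys, w_init=(1.0, 0.0))
print(w_hat)
```

EM is sensitive to initialization. Here `w_init=(1.0, 0.0)` happens to separate the two true slopes (2 and -1) on the first E-step.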


Diffusion-based Semi-supervised Spectral Algorithm for Regression on Manifolds

arXiv.org Machine Learning

We introduce a novel diffusion-based spectral algorithm to tackle regression analysis on high-dimensional data, particularly data embedded within lower-dimensional manifolds. Traditional spectral algorithms often fall short in such contexts, primarily due to their reliance on predetermined kernel functions, which inadequately address the complex structures inherent in manifold-based data. By employing a graph Laplacian approximation, our method uses the local estimation property of the heat kernel, offering an adaptive, data-driven approach to overcome this obstacle. Another distinct advantage of our algorithm lies in its semi-supervised learning framework, which enables it to make full use of additional unlabeled data. This ability enhances performance by allowing the algorithm to exploit the spectrum and curvature of the data manifold, providing a more comprehensive understanding of the dataset. Moreover, our algorithm operates in an entirely data-driven manner, directly within the intrinsic manifold structure of the data, without requiring any predefined manifold information. We provide a convergence analysis of our algorithm. Our findings reveal that the algorithm achieves a convergence rate that depends solely on the intrinsic dimension of the underlying manifold, thereby avoiding the curse of dimensionality associated with the higher ambient dimension.
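A generic version of the semi-supervised idea can be sketched with NumPy. It builds a graph Laplacian from labeled and unlabeled points together, then regresses on the Laplacian's smoothest eigenvectors.

This is Laplacian-eigenmap regression, and it is not the paper's diffusion-based estimator. The following choices are all assumptions:

- Gaussian edge weights,
- an unnormalized Laplacian,
- a fixed number of eigenvectors.

For simplicity, the toy manifold is a circle of evenly spaced points, embedded isometrically in a 10-dimensional ambient space. Only 20 of the 300 points carry labels.

```python
import numpy as np


def laplacian_eigen_regression(X, y_labeled, labeled_idx, n_eig=5, bandwidth=0.1):
    """Fit labeled targets on the smoothest graph-Laplacian eigenvectors of all points.

    X holds labeled and unlabeled inputs (n x D); y_labeled are targets for rows labeled_idx.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2 * bandwidth ** 2))   # Gaussian edge weights
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    basis = vecs[:, :n_eig]                       # smoothest functions on the data graph
    coef, *_ = np.linalg.lstsq(basis[labeled_idx], y_labeled, rcond=None)
    return basis @ coef                           # predictions at every point


rng = np.random.default_rng(0)
n, D = 300, 10
theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)     # evenly spaced points on a circle
Q, _ = np.linalg.qr(rng.normal(size=(D, 2)))               # orthonormal columns: isometry R^2 -> R^D
X = np.column_stack([np.cos(theta), np.sin(theta)]) @ Q.T  # 1-D manifold in a 10-D ambient space
f = np.sin(theta)                                          # target function on the manifold
labeled_idx = rng.choice(n, size=20, replace=False)
pred = laplacian_eigen_regression(X, f[labeled_idx], labeled_idx)
unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
rmse = np.sqrt(np.mean((pred[unlabeled] - f[unlabeled]) ** 2))
print(rmse)
```

The unlabeled points never contribute targets. They shape the eigenbasis, which is how they can inform predictions in a semi-supervised setup like this one.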


Identifying High Consideration E-Commerce Search Queries

arXiv.org Artificial Intelligence

In e-commerce, high consideration search missions typically require careful and elaborate decision making and involve a substantial research investment from customers. We consider the task of identifying High Consideration (HC) queries. Identifying such queries enables e-commerce sites to better serve user needs using targeted experiences such as curated QA widgets that help users reach purchase decisions. We explore the task by proposing an Engagement-based Query Ranking (EQR) approach, focusing on query ranking to indicate potential engagement levels with query-related shopping knowledge content during product search. Unlike previous studies on predicting trends, EQR prioritizes query-level features related to customer behavior, finance, and catalog information rather than popularity signals. We introduce an accurate and scalable method for EQR and present experimental results demonstrating its effectiveness. Offline experiments show strong ranking performance. Human evaluation shows a precision of 96% for HC queries identified by our model. The model was commercially deployed and was shown to outperform human-selected queries in terms of downstream customer impact, as measured through engagement.


Diffusing States and Matching Scores: A New Framework for Imitation Learning

arXiv.org Artificial Intelligence

Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function, and can therefore be thought of as the sequential generalization of a Generative Adversarial Network (GAN). However, in recent years, diffusion models have emerged as a non-adversarial alternative to GANs that merely require training a score function via regression, yet produce higher-quality generations. In response, we investigate how to lift insights from diffusion modeling to the sequential setting. We propose diffusing states and performing score-matching along diffused states to measure the discrepancy between the expert's and learner's states. Thus, our approach only requires training score functions to predict noises via standard regression, making it significantly easier and more stable to train than adversarial methods. Theoretically, we prove first- and second-order instance-dependent bounds with linear scaling in the horizon, proving that our approach avoids the compounding errors that stymie offline approaches to imitation learning. Empirically, we show that our approach outperforms both GAN-style imitation learning baselines and discriminator-free imitation learning baselines across various continuous control problems, including complex tasks like controlling humanoids to walk, sit, crawl, and navigate through obstacles.

Fundamentally, in imitation learning (IL, Osa et al. (2018)), we want to match the sequential behavior of an expert demonstrator. Different notions of what matching should mean for IL have been proposed in the literature, from f-divergences (Ho & Ermon, 2016; Ke et al., 2021) to Integral Probability Metrics (IPMs, Müller (1997); Sun et al. (2019); Kidambi et al. (2021); Swamy et al. (2021); Chang et al. (2021); Song et al. (2024)). To compute the chosen notion of divergence from the expert demonstrations so that the learner can then optimize it, it is common to train a discriminator (i.e., a classifier) between expert and learner data. This discriminator is then used as a reward function for a policy update, an approach known as inverse reinforcement learning (IRL, Abbeel & Ng (2004); Ziebart et al. (2008)).
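The claim that score functions are trained "via standard regression" can be made concrete in one dimension. The sketch below diffuses states with a DDPM-style forward process, $x_t = \sqrt{\bar\alpha}\,x + \sqrt{1-\bar\alpha}\,\varepsilon$. It then fits a linear noise predictor by least squares, once on expert states and once on learner states.

Gaussian states, the linear predictor and the single noise level are all assumptions. For Gaussian data the linear fit is exact, so the fitted slope can be checked against $\sqrt{1-\bar\alpha}/(\bar\alpha s^2 + 1 - \bar\alpha)$ for state variance $s^2$. How the paper turns such score estimates into a cost for policy optimization is not shown.

```python
import math
import random


def fit_noise_predictor(states, alpha_bar, rng, n_draws=20):
    """Least-squares fit of eps_hat(x_t) = c * x_t on diffused states.

    Forward process (DDPM-style): x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps.
    """
    num = den = 0.0
    for x in states:
        for _ in range(n_draws):
            eps = rng.gauss(0, 1)
            xt = math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * eps
            num += xt * eps
            den += xt * xt
    return num / den  # closed-form regression slope (no intercept)


rng = random.Random(0)
alpha_bar = 0.5
expert = [rng.gauss(0, 1.0) for _ in range(2000)]   # hypothetical expert states, variance 1
learner = [rng.gauss(0, 2.0) for _ in range(2000)]  # hypothetical learner states, variance 4

c_expert = fit_noise_predictor(expert, alpha_bar, rng)
c_learner = fit_noise_predictor(learner, alpha_bar, rng)
print(c_expert, c_learner)
```

With these settings the two slopes come out near 0.71 and 0.28. A gap of this kind between expert and learner score estimates is the kind of mismatch that a score-matching discrepancy between state distributions can register.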


Generalization for Least Squares Regression With Simple Spiked Covariances

arXiv.org Machine Learning

Random matrix theory has proven to be a valuable tool in analyzing the generalization of linear models. However, the generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. To understand the generalization performance of such networks, it is crucial to characterize the spectrum of the feature matrix at the hidden layer. Recent work has made progress in this direction by describing the spectrum after a single gradient step, revealing a spiked covariance structure. Yet, the generalization error for linear models with spiked covariances has not been previously determined. We derive their generalization error in the asymptotic proportional regime. Our analysis demonstrates that the eigenvector and eigenvalue corresponding to the spike significantly influence the generalization error.

Significant theoretical work has been dedicated to understanding generalization in linear regression models (Dobriban & Wager, 2018; Advani et al., 2020; Mel & Ganguli, 2021; Derezinski et al., 2020; Hastie et al., 2022; Kausik et al., 2024; Wang et al., 2024a). For the random features approximation, the first layer of the neural network is considered fixed, and only the outer layer is trained. It has been shown that to understand the generalization, we need to analyze the distribution of singular values of $F$. Works such as Pennington & Worah (2017); Adlam et al. (2019); Benigni & Péché (2021); Fan & Wang (2020); Wang & Zhu (2024); Péché (2019); Piccolo & Schröder (2021) have studied the spectrum of $F$ in the asymptotic limit, enabling us to understand the generalization. However, random feature models do not leverage the feature learning capabilities of neural networks. To gain further insights into the performance of two-layer neural networks and their feature learning capabilities, we need to train the inner layer. Recent studies such as Ba et al. (2022); Moniri et al. (2023) have examined the effects on $F$ of taking one gradient step for the inner layer. Specifically, Ba et al. (2022) showed that with a sufficiently large step size $\eta$, two-layer models can already outperform random feature models after just one step. Moniri et al. (2023) extended this work to study many different scales for the step size. The bulk corresponds to $F_0$, while the spikes represent the effect of $P$.
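A small Monte Carlo experiment illustrates the abstract's qualitative claim that the spike's eigenvector matters. The sketch proceeds in three steps:

1. Draw Gaussian features with a rank-one spiked covariance $\Sigma = I + \theta u u^\top$.
2. Fit minimum-norm least squares in the overparameterized regime ($d > n$).
3. Evaluate the excess risk $(\hat\beta - \beta)^\top \Sigma (\hat\beta - \beta)$, once for a signal aligned with the spike and once for a signal orthogonal to it.

The sizes, spike strength and noise level are arbitrary choices. The features stand in for a generic spiked feature matrix rather than one produced by a gradient step. The experiment does not reproduce the paper's asymptotic formula.

```python
import numpy as np


def min_norm_excess_risk(beta, n, d, spike_strength, spike_dir, noise_std, rng):
    """Monte Carlo excess risk (b - beta)^T Sigma (b - beta) of min-norm least squares.

    Covariance assumed rank-one spiked: Sigma = I + spike_strength * u u^T with ||u|| = 1.
    """
    sigma = np.eye(d) + spike_strength * np.outer(spike_dir, spike_dir)
    X = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    y = X @ beta + noise_std * rng.normal(size=n)
    b = np.linalg.pinv(X) @ y  # minimum-norm least-squares solution (ridgeless limit)
    diff = b - beta
    return float(diff @ sigma @ diff)


rng = np.random.default_rng(0)
n, d, theta = 200, 400, 20.0
u = np.zeros(d)
u[0] = 1.0                      # spike direction
beta_aligned = u.copy()         # signal along the spike
beta_orth = np.zeros(d)
beta_orth[1] = 1.0              # signal orthogonal to the spike
err_aligned = min_norm_excess_risk(beta_aligned, n, d, theta, u, 0.1, rng)
err_orth = min_norm_excess_risk(beta_orth, n, d, theta, u, 0.1, rng)
print(err_aligned, err_orth)
```

The rows of `X` have a large component along `u`, so the row space of `X` captures an aligned signal almost entirely. By contrast, roughly a fraction $1 - n/d$ of an orthogonal signal is lost. As a result, `err_aligned` comes out well below `err_orth`.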