Goto

Collaborating Authors

 relative risk



With the Rise of AI, Cisco Sounds an Urgent Alarm About the Risks of Aging Tech

WIRED

Generative AI is making it even easier for attackers to exploit old and often forgotten network equipment. Replacing it takes investment, but Cisco is making the case that it's worth it. Aging digital infrastructure equipment like routers, network switches, and network-attached storage--has long posed a silent risk to organizations. In the short term, it's cheaper and easier to just leave those boxes running in a forgotten closet. But this infrastructure may have old, insecure configurations, and legacy tech is often no longer supported by vendors for software patches and other protections.



Differentially private ratio statistics

arXiv.org Machine Learning

Ratio statistics--such as relative risk and odds ratios--play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private ratio statistics have largely been neglected by the literature and have only recently received an initial treatment by Lin et al. [1]. This paper attempts to fill this lacuna, giving results that can guide practice in evaluating ratios when the results must be protected by differential privacy. In particular, we show that even a simple algorithm can provide excellent properties concerning privacy, sample accuracy, and bias, not just asymptotically but also at quite small sample sizes. Additionally, we analyze a differentially private estimator for relative risk, prove its consistency, and develop a method for constructing valid confidence intervals. Our approach bridges a gap in the differential privacy literature and provides a practical solution for ratio estimation in private machine learning pipelines.


From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms

arXiv.org Artificial Intelligence

When an individual reports a negative interaction with some system, how can their personal experience be contextualized within broader patterns of system behavior? We study the incident database problem, where individual reports of adverse events arrive sequentially, and are aggregated over time. In this work, our goal is to identify whether there are subgroups--defined by any combination of relevant features--that are disproportionately likely to experience harmful interactions with the system. We formalize this problem as a sequential hypothesis test, and identify conditions on reporting behavior that are sufficient for making inferences about disparities in true rates of harm across subgroups. We show that algorithms for sequential hypothesis tests can be applied to this problem with a standard multiple testing correction. We then demonstrate our method on real-world datasets, including mortgage decisions and vaccine side effects; on each, our method (re-)identifies subgroups known to experience disproportionate harm using only a fraction of the data that was initially used to discover them.


TRG-planner: Traversal Risk Graph-Based Path Planning in Unstructured Environments for Safe and Efficient Navigation

arXiv.org Artificial Intelligence

Unstructured environments such as mountains, caves, construction sites, or disaster areas are challenging for autonomous navigation because of terrain irregularities. In particular, it is crucial to plan a path to avoid risky terrain and reach the goal quickly and safely. In this paper, we propose a method for safe and distance-efficient path planning, leveraging Traversal Risk Graph (TRG), a novel graph representation that takes into account geometric traversability of the terrain. TRG nodes represent stability and reachability of the terrain, while edges represent relative traversal risk-weighted path candidates. Additionally, TRG is constructed in a wavefront propagation manner and managed hierarchically, enabling real-time planning even in large-scale environments. Lastly, we formulate a graph optimization problem on TRG that leads the robot to navigate by prioritizing both safe and short paths. Our approach demonstrated superior safety, distance efficiency, and fast processing time compared to the conventional methods. It was also validated in several real-world experiments using a quadrupedal robot. Notably, TRG-planner contributed as the global path planner of an autonomous navigation framework for the DreamSTEP team, which won the Quadruped Robot Challenge at ICRA 2023. The project page is available at https://trg-planner.github.io .


Estimation of multiple mean vectors in high dimension

arXiv.org Machine Learning

We endeavour to estimate numerous multi-dimensional means of various probability distributions on a common space based on independent samples. Our approach involves forming estimators through convex combinations of empirical means derived from these samples. We introduce two strategies to find appropriate data-dependent convex combination weights: a first one employing a testing procedure to identify neighbouring means with low variance, which results in a closed-form plug-in formula for the weights, and a second one determining weights via minimization of an upper confidence bound on the quadratic risk.Through theoretical analysis, we evaluate the improvement in quadratic risk offered by our methods compared to the empirical means. Our analysis focuses on a dimensional asymptotics perspective, showing that our methods asymptotically approach an oracle (minimax) improvement as the effective dimension of the data increases.We demonstrate the efficacy of our methods in estimating multiple kernel mean embeddings through experiments on both simulated and real-world datasets.


Interpretable Sequence Clustering

arXiv.org Artificial Intelligence

Categorical sequence clustering plays a crucial role in various fields, but the lack of interpretability in cluster assignments poses significant challenges. Sequences inherently lack explicit features, and existing sequence clustering algorithms heavily rely on complex representations, making it difficult to explain their results. To address this issue, we propose a method called Interpretable Sequence Clustering Tree (ISCT), which combines sequential patterns with a concise and interpretable tree structure. ISCT leverages k-1 patterns to generate k leaf nodes, corresponding to k clusters, which provides an intuitive explanation on how each cluster is formed. More precisely, ISCT first projects sequences into random subspaces and then utilizes the k-means algorithm to obtain high-quality initial cluster assignments. Subsequently, it constructs a pattern-based decision tree using a boosting-based construction strategy in which sequences are re-projected and re-clustered at each node before mining the top-1 discriminative splitting pattern. Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure while delivering fast and accurate cluster assignments.


Pre-screening breast cancer with machine learning and deep learning

arXiv.org Artificial Intelligence

We suggest that deep learning can be used for pre-screening cancer by analyzing demographic and anthropometric information of patients, as well as biological markers obtained from routine blood samples and relative risks obtained from meta-analysis and international databases. We applied feature selection algorithms to a database of 116 women, including 52 healthy women and 64 women diagnosed with breast cancer, to identify the best pre-screening predictors of cancer. We utilized the best predictors to perform k-fold Monte Carlo cross-validation experiments that compare deep learning against traditional machine learning algorithms. Our results indicate that a deep learning model with an input-layer architecture that is fine-tuned using feature selection can effectively distinguish between patients with and without cancer. Additionally, compared to machine learning, deep learning has the lowest uncertainty in its predictions. These findings suggest that deep learning algorithms applied to cancer pre-screening offer a radiation-free, non-invasive, and affordable complement to screening methods based on imagery. The implementation of deep learning algorithms in cancer pre-screening offer opportunities to identify individuals who may require imaging-based screening, can encourage self-examination, and decrease the psychological externalities associated with false positives in cancer screening. The integration of deep learning algorithms for both screening and pre-screening will ultimately lead to earlier detection of malignancy, reducing the healthcare and societal burden associated to cancer treatment.


Causal Inference in Case-Control Studies

arXiv.org Machine Learning

We investigate partial identification of causal relative and attributable risk---the ratio of two counterfactual proportions and the difference between them---in case-control and case-population studies. The odds ratio is shown to be a sharp upper bound on causal relative risk under the monotone treatment response and monotone treatment selection assumptions, without resorting to strong ignorability, nor to the rare-disease assumption. Sharp bounds on causal attributable risk are also obtained under the same assumptions. Paying special attention to the (conditional) odds ratio, we propose a semiparametrically efficient estimator of the aggregated (log) odds ratio. Further, we develop easy-to-implement causal inference procedures for relative and attributable risk. Finally, we showcase our methodology by applying it to two unique datasets in the literature. We find that attending private school may have little effect on entering a very selective university in Pakistan and that dropping out of school could substantially increase relative and attributable risk of joining a criminal gang in Brazil.