Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

Miller, Elizabeth W., Blume, Jeffrey D.

arXiv.org Machine Learning

In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability using two complementary diagnostics: empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and the GUSTO-I clinical dataset. Across the settings we examined, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions than logistic regression models. Risk-estimate instability near clinically relevant decision thresholds can alter treatment recommendations. These findings suggest that stability diagnostics should be incorporated into routine model validation when assessing clinical reliability.
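The two diagnostics named in the abstract can be sketched directly from their descriptions: ePIW summarizes the spread of a patient's risk estimates across seed-varied refits, and eDFR counts how often the thresholded decision disagrees with the majority decision. This is a minimal sketch assuming we already have a matrix of predictions from K refits; the toy numbers and the 0.5 threshold are illustrative, not from the paper.

```python
# Hedged sketch of the ePIW / eDFR diagnostics described above, assuming a
# (K, n) matrix of risk estimates from K seed-varied refits of the same
# model on the same data. Numbers below are toy values for illustration.
import numpy as np

def epiw(pred_matrix, alpha=0.05):
    """Empirical prediction interval width per patient: width of the
    central (1 - alpha) interval of the K risk estimates."""
    lo = np.quantile(pred_matrix, alpha / 2, axis=0)
    hi = np.quantile(pred_matrix, 1 - alpha / 2, axis=0)
    return hi - lo

def edfr(pred_matrix, threshold=0.5):
    """Empirical decision flip rate per patient: fraction of refits whose
    thresholded decision disagrees with the majority decision."""
    decisions = pred_matrix >= threshold        # (K, n) boolean decisions
    majority = decisions.mean(axis=0) >= 0.5    # majority vote per patient
    return (decisions != majority).mean(axis=0)

# Toy example: 3 refits, 2 patients. Patient 0 straddles the threshold,
# patient 1 is stably high-risk.
preds = np.array([[0.48, 0.90],
                  [0.55, 0.91],
                  [0.40, 0.89]])
print(epiw(preds))
print(edfr(preds, threshold=0.5))
```

Note how patient 0, whose estimates straddle 0.5, has a nonzero flip rate even though their average risk is well defined — exactly the near-threshold instability the abstract flags.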


AI hit: India hungry to harness US tech giants' technology at Delhi summit

The Guardian

From left: India's prime minister, Narendra Modi, with the chief executives of OpenAI, Sam Altman, and Anthropic, Dario Amodei, at the AI Impact summit in Delhi. Narendra Modi's thirst to supercharge economic growth is matched by the US desire to inject AI into the world's biggest democracy. India celebrates 80 years of independence from the UK in August 2027. At about that same moment, "early versions of true super intelligence" could emerge, Sam Altman, the co-founder of OpenAI, said this week. It's a looming coincidence that raised a charged question at the AI Impact summit in Delhi, hosted by India's prime minister, Narendra Modi: can India avoid returning to the status of a vassal state when it imports AI to raise the prospects of its 1.4 billion people? Modi's hunger to harness AI's capability is great.





The tech bros might show more humility in Delhi – but will they make AI any safer?

BBC News

Those who shout the loudest about artificial intelligence tend to be in the West, notably the US and Europe. So it's significant that a gathering of powerful leaders is being held in the Global South, a region of the world that runs the risk of being left behind in the AI race. Tech bosses, politicians, scientists, academics and campaigners are meeting at the AI Impact Summit in India this week for top-level discussions about what the world should be doing to try to marshal the AI revolution in the right direction. At last year's AI Action Summit, as it was then known, an ugly power struggle broke out between some Western countries over who should be in charge.




No-regret Algorithms for Fair Resource Allocation

Neural Information Processing Systems

Suppose a revenue-maximizing recommendation algorithm concludes from past data that more revenue is generated by showing the ad to Group A than to Group B. In that case, the ad-serving algorithm will eventually end up showing that ad exclusively to Group A.
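The feedback loop described above can be demonstrated with a small simulation: a purely greedy allocator that always serves the ad to whichever group has the higher empirical click rate will quickly stop exploring the other group. The two groups, their click rates, and the greedy policy here are hypothetical illustrations, not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm) of a greedy revenue-maximizer
# locking onto one group. Click rates for Groups A and B are assumed.
import random

random.seed(0)
ctr = {"A": 0.12, "B": 0.10}   # true click-through rates (hypothetical)
clicks = {"A": 1, "B": 1}      # optimistic initial counts
shows = {"A": 1, "B": 1}

history = []
for _ in range(10_000):
    # Greedy: serve the ad to the group with the higher empirical rate.
    group = max(ctr, key=lambda g: clicks[g] / shows[g])
    shows[group] += 1
    clicks[group] += random.random() < ctr[group]
    history.append(group)

# After an initial exploration phase, one group dominates the allocation;
# the other group's estimate stays frozen, so it is never revisited.
print(shows)
```

Whichever group happens to look better early on absorbs essentially all impressions, which is the "exclusive" allocation the abstract warns about — and the motivation for no-regret, fairness-constrained alternatives.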


SubjECTive-QA

Neural Information Processing Systems

Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions, as the answers given by company representatives are often open to subjective interpretations and scrutiny. The dataset includes 49,446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant. These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. Our findings are that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2.
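The evaluation implied by the abstract treats each of the six features as its own classification task and summarizes each with a class-support-weighted F1 score. As a hedged sketch of what "weighted F1" means here (the labels and predictions below are toy data, not from SubjECTive-QA):

```python
# Sketch of a class-support-weighted F1, the metric named in the abstract.
# Each feature (e.g. "Clear") would be scored separately with this function.
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 averaged with weights proportional to
    each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_cls - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_cls / total) * f1
    return score

# Toy labels for one hypothetical feature, with classes 0/1/2.
y_true = [0, 1, 2, 1, 0, 2, 1, 1]
y_pred = [0, 1, 2, 0, 0, 2, 1, 2]
print(round(weighted_f1(y_true, y_pred), 3))  # prints 0.733
```

A "mean difference" between two models, as in the abstract's RoBERTa-base vs. Llama-3-70b-Chat comparison, would then be the average gap between their per-feature weighted F1 scores.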