South America
Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing
Patrone, Paul N., Kearsley, Anthony J.
Motivated by canonical problems in medical diagnostics, we propose and study properties of an objective function that uniformly bounds uncertainties in quantities of interest extracted from classifiers and related data analysis tools. We begin by adopting a set-theoretic perspective to show how two main tasks in diagnostics -- classification and prevalence estimation -- can be recast in terms of a variation on the confusion (or error) matrix ${\boldsymbol {\rm P}}$ typically considered in supervised learning. We then combine arguments from conditional probability with the Gershgorin circle theorem to demonstrate that the largest Gershgorin radius $\boldsymbol \rho_m$ of the matrix $\mathbb I-\boldsymbol {\rm P}$ (where $\mathbb I$ is the identity) yields uniform error bounds for both classification and prevalence estimation. In a two-class setting, $\boldsymbol \rho_m$ is minimized via a measure-theoretic ``water-leveling'' argument that optimizes an appropriately defined partition $U$ generating the matrix ${\boldsymbol {\rm P}}$. We also consider an example that illustrates the difficulty of generalizing the binary solution to a multi-class setting and deduce relevant properties of the confusion matrix.
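As a rough illustration of the quantity named in this abstract, the sketch below computes the largest Gershgorin radius of I - P for a small matrix. It assumes P is column-stochastic, with P[i][j] the probability that a sample of true class j is assigned to class i, and it takes the radii column-wise; both conventions are assumptions made here, since the abstract does not state which orientation the authors use. The function name and the example matrix are hypothetical.

```python
def largest_gershgorin_radius(P):
    """Largest column-wise Gershgorin radius of I - P for a square matrix P."""
    n = len(P)
    radii = []
    for j in range(n):
        # Off-diagonal entries of I - P in column j are -P[i][j] for i != j,
        # so the radius is the sum of their absolute values.
        radii.append(sum(abs(P[i][j]) for i in range(n) if i != j))
    return max(radii)


# Hypothetical two-class confusion matrix (assumed column-stochastic):
# column j holds the probabilities that true class j is assigned to class 0 or 1.
P = [[0.95, 0.10],
     [0.05, 0.90]]
rho_m = largest_gershgorin_radius(P)
```

Under the column-stochastic assumption, each column radius equals one minus the corresponding diagonal entry, so the largest radius is simply the worst per-class misclassification rate in this toy matrix.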
Multi-Community Spectral Clustering for Geometric Graphs
Allem, Luiz Emilio, Avrachenkov, Konstantin, Hoppen, Carlos, Manjunath, Hariprasad, Sibemberg, Lucas Siviero
In this paper, we consider the soft geometric block model (SGBM) with a fixed number $k \geq 2$ of homogeneous communities in the dense regime, and we introduce a spectral clustering algorithm for community recovery on graphs generated by this model. Given such a graph, the algorithm produces an embedding into $\mathbb{R}^{k-1}$ using the eigenvectors associated with the $k-1$ eigenvalues of the adjacency matrix of the graph that are closest to a value determined by the parameters of the model. It then applies $k$-means clustering to the embedding. We prove weak consistency and show that a simple local refinement step ensures strong consistency. A key ingredient is an application of a non-standard version of the Davis-Kahan theorem to control eigenspace perturbations when eigenvalues are not simple. We also analyze the limiting spectrum of the adjacency matrix, using a combination of combinatorial and matrix techniques.
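The two steps described in this abstract (embed with the k-1 eigenvectors whose eigenvalues lie closest to a target value, then run k-means) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name, the farthest-point initialisation, and the toy matrix are assumptions, the target value is passed in by the caller rather than derived from SGBM parameters, and the local refinement step is omitted.

```python
import numpy as np


def spectral_communities(A, k, target, n_iter=50):
    """Cluster nodes using the k-1 eigenvectors with eigenvalues closest to target."""
    eigvals, eigvecs = np.linalg.eigh(A)            # A is assumed symmetric
    idx = np.argsort(np.abs(eigvals - target))[:k - 1]
    X = eigvecs[:, idx]                             # embedding into R^(k-1)

    # Plain Lloyd k-means with a deterministic farthest-point initialisation.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


# Toy check on a deterministic two-block matrix (not sampled from the SGBM).
m, p_in, p_out = 4, 0.8, 0.2
A = np.full((2 * m, 2 * m), p_out)
A[:m, :m] = p_in
A[m:, m:] = p_in
np.fill_diagonal(A, 0.0)
labels = spectral_communities(A, k=2, target=m * (p_in - p_out) - p_in)
```

In the toy matrix the eigenvalue m(p_in - p_out) - p_in belongs to the eigenvector that is constant on each block with opposite signs, which is why it is used as the target here; the paper's target value for geometric graphs is a different, model-specific quantity.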
Automatic Identification of Machine Learning-Specific Code Smells
Hamfelt, Peter, Britto, Ricardo, Rocha, Lincoln, Almendra, Camilo
Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, the research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.
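To give a feel for what a static check for an ML-specific code smell might look like, here is a minimal sketch built on Python's standard `ast` module. The smell chosen (a `train_test_split` call without `random_state`, which makes a split hard to reproduce) and the function name `find_unseeded_splits` are illustrative assumptions; the abstract does not list the smells MLpylint detects or describe how its checkers are implemented.

```python
import ast


def find_unseeded_splits(source):
    """Return line numbers of train_test_split calls that lack random_state."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both train_test_split(...) and module.train_test_split(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name == "train_test_split":
            keywords = {kw.arg for kw in node.keywords}
            if "random_state" not in keywords:
                findings.append(node.lineno)
    return sorted(findings)


sample = (
    "from sklearn.model_selection import train_test_split\n"
    "a, b = train_test_split(data)\n"
    "c, d = train_test_split(data, random_state=0)\n"
)
lines = find_unseeded_splits(sample)
```

The checker only parses the source and never imports or runs it, which is what makes it static. A call that forwards `**kwargs` would also be flagged, since the keyword cannot be resolved without executing the code.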
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
Feng, Tiantian, Huang, Kevin, Xu, Anfeng, Shi, Xuan, Lertpetchpun, Thanathai, Lee, Jihwan, Lee, Yoonjeong, Byrd, Dani, Narayanan, Shrikanth
Specifically, we report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study used over 2 million training utterances from 30 publicly available speech corpora that are provided with dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We assess the robustness of the dialectal models under noisy conditions and present an error analysis that highlights modeling results aligned with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by Voxlect. Specifically, we show that Voxlect can be applied to augment existing speech recognition datasets with dialect information, enabling a more detailed analysis of ASR performance across dialectal variations. Voxlect is also used as a tool to evaluate the performance of speech generation systems.
Algorithmic Detection of Rank Reversals, Transitivity Violations, and Decomposition Inconsistencies in Multi-Criteria Decision Analysis
Borda, Agustín, Cabral, Juan Bautista, Giarda, Gonzalo, Irusta, Diego Nicolás Gimenez, Pacheco, Paula, Schachner, Alvaro Roy
Our work focuses on providing a mechanism capable of measuring the performance of an MCDM on a given set of alternatives, with the collateral goal of building a global ranking of the effectiveness of different MCDMs. We have implemented these tests within the open-source Scikit-Criteria library, leveraging its RankResult and RanksComparator data structures as fundamental building blocks for comparative ranking analysis. RRT1 systematically evaluates the stability of the optimal alternative when suboptimal alternatives are degraded, employing a controlled mutation strategy and providing comprehensive documentation of the experimental context. This approach provides decision analysts with the following:
1. Quantitative stability assessment: Precise measures of how often methods exhibit rank reversal
2. Sensitivity mapping: Identification of which alternatives and criteria are most prone to instability
3. Method comparison: Objective basis for comparing the robustness of different MCDA approaches
4. Confidence intervals: Statistical bounds on decision reliability through repeated experimentation
The algorithm addresses the complications that arise from preprocessing pipelines that can eliminate alternatives, ensuring "graceful degradation" by assigning appropriate worst ranks to maintain completeness.
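The idea behind the RRT1 test described above (degrade suboptimal alternatives and check that the best alternative stays the same) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Scikit-Criteria implementation: it does not use RankResult or RanksComparator, the ranking method is a hypothetical sum-normalised weighted sum, all criteria are assumed to be maximised, and the mutation is a fixed multiplicative worsening rather than the authors' controlled mutation strategy.

```python
def weighted_sum_rank(matrix, weights):
    """Alternative indices, best first, under sum normalisation and a weighted sum."""
    col_sums = [sum(col) for col in zip(*matrix)]
    scores = [
        sum(w * v / s for w, v, s in zip(weights, row, col_sums))
        for row in matrix
    ]
    return sorted(range(len(matrix)), key=lambda i: -scores[i])


def best_is_stable(rank_fn, matrix, weights, factor=0.9):
    """Degrade each suboptimal alternative in turn; False if the best one changes."""
    best = rank_fn(matrix, weights)[0]
    for i in range(len(matrix)):
        if i == best:
            continue
        mutated = [list(row) for row in matrix]
        # Worsen alternative i on every criterion (all assumed "maximise").
        mutated[i] = [v * factor for v in mutated[i]]
        if rank_fn(mutated, weights)[0] != best:
            return False
    return True


# Hypothetical decision matrix: rows are alternatives, columns are criteria.
matrix = [[9, 8], [5, 6], [4, 3]]
weights = [0.5, 0.5]
stable = best_is_stable(weighted_sum_rank, matrix, weights)
```

Passing the ranking function as an argument keeps the check independent of any one method, so the same loop could be pointed at other rankers to compare how often each one fails.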
Hollywood turns to AI tools to rewire movie magic
Fox News anchor and executive editor Bret Baier has the latest on fears over the 'darker side' of artificial intelligence on 'Special Report.' Generative Artificial Intelligence can create lifelike imaging and audio, which is likely why an increasing number of film studios are incorporating A.I. into special effects. It comes just two years after Hollywood's largest union went on strike, in part over the impact A.I. would bring. "Popular culture movies like The Terminator have created a very dark dystopian version of what this could look like," White House A.I. and Crypto Czar David Sacks said. "The version of the future of A.I. that I think is probably most accurate if you want to pop cultural references is Star Trek Enterprise. Think about the ship computer in that. It can perform tasks for you. But it doesn't have a will of its own, it doesn't have a mind of its own. It's there to help the crew, and it needs to be supervised by humans."
Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires
Are AI systems truly representing human values, or merely averaging across them? Our study suggests a concerning reality: Large Language Models (LLMs) fail to represent diverse cultural moral frameworks despite their linguistic capabilities. We expose significant gaps between AI-generated and human moral intuitions by applying the Moral Foundations Questionnaire across 19 cultural contexts. Comparing multiple state-of-the-art LLMs' origins against human baseline data, we find these models systematically homogenize moral diversity. Surprisingly, increased model size doesn't consistently improve cultural representation fidelity. Our findings challenge the growing use of LLMs as synthetic populations in social science research and highlight a fundamental limitation in current AI alignment approaches. Without data-driven alignment beyond prompting, these systems cannot capture the nuanced, culturally-specific moral intuitions. Our results call for more grounded alignment objectives and evaluation metrics to ensure AI systems represent diverse human values rather than flattening the moral landscape.
A record-breaking lightning bolt just 'shocked' meteorologists
In October 2017, a single flash of lightning during a thunderstorm streaked across the Great Plains for 515 miles. The flash traveled from eastern Texas all the way to Kansas City--and now into the record books. The World Meteorological Organization (WMO) certified that this megaflash is now the longest single lightning flash in the United States. The massive lightning bolt is detailed in a study published July 31 in the Bulletin of the American Meteorological Society.
Will AI Take My Job? Evolving Perceptions of Automation and Labor Risk in Latin America
Cremaschi, Andrea, Lee, Dae-Jin, Leonelli, Manuele
As artificial intelligence and robotics increasingly reshape the global labor market, understanding public perceptions of these technologies becomes critical. We examine how these perceptions have evolved across Latin America, using survey data from the 2017, 2018, 2020, and 2023 waves of the Latinobarómetro. Drawing on responses from over 48,000 individuals across 16 countries, we analyze fear of job loss due to artificial intelligence and robotics. Using statistical modeling and latent class analysis, we identify key structural and ideological predictors of concern, with education level and political orientation emerging as the most consistent drivers. Our findings reveal substantial temporal and cross-country variation, with a notable peak in fear during 2018 and distinct attitudinal profiles emerging from latent segmentation. These results offer new insights into the social and structural dimensions of AI anxiety in emerging economies and contribute to a broader understanding of public attitudes toward automation beyond the Global North.
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Ahn, Jihyun Janice, Yin, Wenpeng
While the inconsistency of LLMs is not a novel topic, prior research has predominantly addressed two types of generative inconsistencies: i) Randomness Inconsistency: running the same LLM multiple trials, yielding varying responses; ii) Paraphrase Inconsistency: paraphrased prompts result in different responses from the same LLM. Randomness Inconsistency arises from the inherent randomness due to stochastic sampling in generative models, while Paraphrase Inconsistency is a consequence of the language modeling objectives, where paraphrased prompts alter the distribution of vocabulary logits. This research discovers Prompt-Reverse Inconsistency (PRIN), a new form of LLM self-inconsistency: given a question and a couple of LLM-generated answer candidates, the LLM often has conflicting responses when prompted "Which are correct answers?" and "Which are incorrect answers?". PRIN poses a big concern as it undermines the credibility of LLM-as-a-judge, and suggests a challenge for LLMs to adhere to basic logical rules. We conduct a series of experiments to investigate PRIN, examining the extent of PRIN across different LLMs, methods to mitigate it, potential applications, and its relationship with Randomness Inconsistency and Paraphrase Inconsistency. As the first study to explore PRIN, our findings offer valuable insights into the inner workings of LLMs and contribute to advancing trustworthy AI.
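The logical rule at stake in PRIN can be made concrete with a small check: for a fixed candidate set, the answers to "Which are correct answers?" and "Which are incorrect answers?" should partition the candidates. The sketch below tests exactly that. It is an illustration of the complement rule only; the function name is hypothetical, and the abstract does not say how the authors score inconsistency, so this should not be read as their metric.

```python
def is_prompt_reverse_consistent(candidates, judged_correct, judged_incorrect):
    """True when the two judgments split the candidate set with no overlap or gap."""
    candidates = set(candidates)
    correct = set(judged_correct)
    incorrect = set(judged_incorrect)
    no_overlap = not (correct & incorrect)           # nothing is both correct and incorrect
    covers_all = (correct | incorrect) == candidates  # every candidate gets a verdict
    return no_overlap and covers_all


# Hypothetical responses from one model to the two reversed prompts.
candidates = ["A", "B", "C"]
consistent = is_prompt_reverse_consistent(candidates, ["A"], ["B", "C"])
inconsistent = is_prompt_reverse_consistent(candidates, ["A", "B"], ["B", "C"])
```

In the second call candidate "B" is labelled both correct and incorrect, which is the kind of conflict between the two prompts that the abstract describes.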