Overview
A Tutorial On Intersectionality in Fair Rankings
Criscuolo, Chiara, Martinenghi, Davide, Piccirillo, Giuseppe
We address the critical issue of biased algorithms and unfair rankings, which have permeated various sectors, including search engines, recommendation systems, and workforce management. These biases can lead to discriminatory outcomes in a data-driven world, especially against marginalized and underrepresented groups. Efforts towards responsible data science and responsible artificial intelligence aim to mitigate these biases and promote fairness, diversity, and transparency. However, most fairness-aware ranking methods singularly focus on protected attributes such as race, gender, or socio-economic status, neglecting the intersectionality of these attributes, i.e., the interplay between multiple social identities. Understanding intersectionality is crucial to ensure that existing inequalities are not preserved by fair rankings. We offer a description of the main ways to incorporate intersectionality in fair ranking systems through practical examples and provide a comparative overview of existing literature and a synoptic table summarizing the various methodologies. Our analysis highlights the need for intersectionality to attain fairness, while also emphasizing that fairness, alone, does not necessarily imply intersectionality.
Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews
Okpala, Izunna, Golgoon, Ashkan, Kannan, Arjun Ravi
The advent of large language models has ushered in a new era of agentic systems, where artificial intelligence programs exhibit remarkable autonomous decision-making capabilities across diverse domains. This paper explores agentic system workflows in the financial services industry. In particular, we build agentic crews that can effectively collaborate to perform complex modeling and model risk management (MRM) tasks. The modeling crew consists of a manager and multiple agents who perform specific tasks such as exploratory data analysis, feature engineering, model selection, hyperparameter tuning, model training, model evaluation, and writing documentation. The MRM crew consists of a manager along with specialized agents who perform tasks such as checking compliance of modeling documentation, model replication, conceptual soundness, analysis of outcomes, and writing documentation. We demonstrate the effectiveness and robustness of modeling and MRM crews by presenting a series of numerical examples applied to credit card fraud detection, credit card approval, and portfolio credit risk modeling datasets.
A Comprehensive Review on Noise Control of Diffusion Model
Guo, Zhehao, Lang, Jiedong, Huang, Shuyu, Gao, Yunfei, Ding, Xintong
Diffusion models have recently emerged as powerful generative frameworks for producing high-quality images. A pivotal component of these models is the noise schedule, which governs the rate of noise injection during the diffusion process. Since the noise schedule substantially influences sampling quality and training quality, understanding its design and implications is crucial. In this discussion, various noise schedules are examined, and their distinguishing features and performance characteristics are highlighted.
The "negative end" of change in grammar: terminology, concepts and causes
The topic of "negative end" of change is, contrary to the fields of innovation and emergence, largely under-researched. Yet, it has lately started to gain an increasing attention from language scholars worldwide. The main focus of this article is threefold, namely to discuss the i) terminology; ii) concepts and iii) causes associated with the "negative end" of change in grammar. The article starts with an overview of research conducted on the topic. It then moves to situating phenomena referred to as loss, decline or obsolescence among processes of language change, before elaborating on the terminology and concepts behind it. The last part looks at possible causes for constructions to display a (gradual or rapid, but very consistent) decrease in the frequency of use over time, which continues until the construction disappears or there are only residual or fossilised forms left.
Principles and Components of Federated Learning Architectures
Saif, Sarwar, Nasim, MD Abdullah Al, Biswas, Parag, Rashid, Abdur, Haque, MD Mahim Anjum, Jahangir, Md. Zihad Bin
Federated learning, also known as FL, is a machine learning framework in which a significant amount of clients (such as mobile devices or whole enterprises) collaborate to collaboratively train a model while keeping decentralized training data, all overseen by a central server (such as a service provider). There are advantages in terms of privacy, security, regulations, and economy with this decentralized approach to model training. FL is not impervious to the flaws that plague conventional machine learning models, despite its seeming promise. This study offers a thorough analysis of the fundamental ideas and elements of federated learning architectures, emphasizing five important areas: communication architectures, machine learning models, data partitioning, privacy methods, and system heterogeneity. We additionally address the difficulties and potential paths for future study in the area. Furthermore, based on a comprehensive review of the literature, we present a collection of architectural patterns for federated learning systems. This analysis will help to understand the basic of Federated learning, the primary components of FL, and also about several architectural details.
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
Eger, Steffen, Cao, Yong, D'Souza, Jennifer, Geiger, Andreas, Greisinger, Christian, Gross, Stephanie, Hou, Yufang, Krenn, Brigitte, Lauscher, Anne, Li, Yizhi, Lin, Chenghua, Moosavi, Nafise Sadat, Zhao, Wei, Miller, Tristan
With the advent of large multimodal language models, science is now at a threshold of an AI-based technological transformation. Recently, a plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. This includes all aspects of the research cycle, especially (1) searching for relevant literature; (2) generating research ideas and conducting experimentation; generating (3) text-based and (4) multimodal content (e.g., scientific figures and diagrams); and (5) AI-based automatic peer review. In this survey, we provide an in-depth overview over these exciting recent developments, which promise to fundamentally alter the scientific research process for good. Our survey covers the five aspects outlined above, indicating relevant datasets, methods and results (including evaluation) as well as limitations and scope for future research. Ethical concerns regarding shortcomings of these tools and potential for misuse (fake science, plagiarism, harms to research integrity) take a particularly prominent place in our discussion. We hope that our survey will not only become a reference guide for newcomers to the field but also a catalyst for new AI-based initiatives in the area of "AI4Science".
Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies
Panat, Tanay, Chandra, Rohitash
The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is important to understand the long-term nature of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework for addressing the problem of missing data for some of the economic indicators for specific countries. We then curate and update the data and use a dimensionality reduction approach (principal component analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts. This transparency and accessibility make our work a valuable resource for ongoing research and policy development in quality-of-life assessment.
Conformal Prediction for Electricity Price Forecasting in the Day-Ahead and Real-Time Balancing Market
O'Connor, Ciaran, Bahloul, Mohamed, Rossi, Roberto, Prestwich, Steven, Visentin, Andrea
The integration of renewable energy into electricity markets poses significant challenges to price stability and increases the complexity of market operations. Accurate and reliable electricity price forecasting is crucial for effective market participation, where price dynamics can be significantly more challenging to predict. Probabilistic forecasting, through prediction intervals, efficiently quantifies the inherent uncertainties in electricity prices, supporting better decision-making for market participants. This study explores the enhancement of probabilistic price prediction using Conformal Prediction (CP) techniques, specifically Ensemble Batch Prediction Intervals and Sequential Predictive Conformal Inference. These methods provide precise and reliable prediction intervals, outperforming traditional models in validity metrics. We propose an ensemble approach that combines the efficiency of quantile regression models with the robust coverage properties of time series adapted CP techniques. This ensemble delivers both narrow prediction intervals and high coverage, leading to more reliable and accurate forecasts. We further evaluate the practical implications of CP techniques through a simulated trading algorithm applied to a battery storage system. The ensemble approach demonstrates improved financial returns in energy trading in both the Day-Ahead and Balancing Markets, highlighting its practical benefits for market participants.
Enhancing Phishing Email Identification with Large Language Models
Phishing has long been a common tactic used by cybercriminals and continues to pose a significant threat in today's digital world. When phishing attacks become more advanced and sophisticated, there is an increasing need for effective methods to detect and prevent them. To address the challenging problem of detecting phishing emails, researchers have developed numerous solutions, in particular those based on machine learning (ML) algorithms. In this work, we take steps to study the efficacy of large language models (LLMs) in detecting phishing emails. The experiments show that the LLM achieves a high accuracy rate at high precision; importantly, it also provides interpretable evidence for the decisions.
Indigenous Languages Spoken in Argentina: A Survey of NLP and Speech Resources
Ticona, Belu, Carranza, Fernando, Cotik, Viviana
Argentina has a large yet little-known Indigenous linguistic diversity, encompassing at least 40 different languages. The majority of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, unified information on speakers and computational tools is lacking for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, classifying them into seven language families: Mapuche, Tup\'i-Guaran\'i, Guaycur\'u, Quechua, Mataco-Mataguaya, Aymara, and Chon. For each one, we present an estimation of the national Indigenous population size, based on the most recent Argentinian census. We discuss potential reasons why the census questionnaire design may underestimate the actual number of speakers. We also provide a concise survey of computational resources available for these languages, whether or not they were specifically developed for Argentinian varieties.