

AI and analytics in sports: Leveraging BERTopic to map the past and chart the future

Mishra, Manit

arXiv.org Artificial Intelligence

Purpose: The purpose of this study is to map the body of scholarly literature at the intersection of artificial intelligence (AI), analytics, and sports, and thereafter to leverage the insights generated to chart guideposts for future research. Design/methodology/approach: The study carries out a systematic literature review (SLR). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol is leveraged to identify 204 journal articles pertaining to the utilization of AI and analytics in sports published from 2002 to 2024. We follow this up by extracting the latent topics from the sampled articles using the topic modelling technique BERTopic. Findings: The study identifies the following as the predominant areas of extant research on the usage of AI and analytics in sports: performance modelling, physical and mental health, social media sentiment analysis, and tactical tracking. Each extracted topic is further examined in terms of its relative prominence, representative studies, and key term associations. Drawing on these insights, the study delineates promising avenues for future inquiry. Research limitations/implications: The study offers insights to academicians and sports administrators on the transformational impact of AI and analytics in sports. Originality/value: The study introduces BERTopic as a novel approach for extracting latent structures in sports research, thereby advancing both scholarly understanding and the methodological toolkit of the field.
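The scoring step at the heart of BERTopic's topic extraction is class-based TF-IDF (c-TF-IDF), which ranks terms per document cluster. The sketch below illustrates only that scoring step on a toy corpus with pre-assigned clusters; the real pipeline first derives the clusters via sentence embeddings, UMAP, and HDBSCAN, and all documents and cluster names here are invented for illustration.

```python
from collections import Counter
import math

# Toy corpus with hand-assigned clusters standing in for BERTopic's
# embedding + clustering stages (all names hypothetical).
clusters = {
    "performance": ["player speed sprint fatigue", "sprint load fatigue recovery"],
    "sentiment": ["fans tweet sentiment positive", "tweet sentiment negative fans"],
}

def ctfidf(clusters):
    # Term frequencies per class (all documents in a class concatenated).
    class_tf = {c: Counter(" ".join(docs).split()) for c, docs in clusters.items()}
    # Frequency of each term across all classes.
    f = Counter()
    for tf in class_tf.values():
        f.update(tf)
    # Average number of words per class.
    avg_words = sum(sum(tf.values()) for tf in class_tf.values()) / len(class_tf)
    scores = {}
    for c, tf in class_tf.items():
        total = sum(tf.values())
        # c-TF-IDF: within-class term frequency, down-weighted by how
        # common the term is across all classes.
        scores[c] = {t: (n / total) * math.log(1 + avg_words / f[t])
                     for t, n in tf.items()}
    return scores

scores = ctfidf(clusters)
top = {c: max(s, key=s.get) for c, s in scores.items()}
```

Terms that concentrate in one cluster (e.g. "sprint" for the performance cluster) score highest, which is how BERTopic produces the key-term associations the study reports per topic.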


Fighter pilots take directions from AI in Pentagon's groundbreaking test

FOX News

The Pentagon conducted its first successful tests of Air Force and Navy fighter jets tactically controlled by AI through Raft's 'Starsage' this month. FIRST ON FOX: For the first time, U.S. fighter pilots took direction from an AI "air battle manager" in a Pentagon test that could change how wars are fought in the skies. The Air Force and Navy ran the August test using Raft AI's Starsage tactical control system on F-16s, F/A-18s and F-35s during a joint military exercise designed to evaluate new weapons systems, advanced communications and battle management platforms, Fox News Digital has learned. In a typical combat mission, fighter pilots communicate with human air battle managers on the ground. These managers monitor radar, sensor feeds and intelligence to direct pilots on where to fly and how to position their aircraft.


A Start To End Machine Learning Approach To Maximize Scientific Throughput From The LCLS-II-HE

Mishra, Aashwin, Seaberg, Matt, Roussel, Ryan, Poitevin, Fred, Thayer, Jana, Ratner, Daniel, Edelen, Auralee, Mehta, Apurva

arXiv.org Artificial Intelligence

With the increasing brightness of light sources, including the diffraction-limited brightness upgrade of the APS and the high-repetition-rate upgrade of the LCLS, the experiments proposed therein are becoming increasingly complex. For instance, experiments at LCLS-II-HE will require the X-ray beam to be within a fraction of a micron in diameter, with pointing stability of a few nanoradians, at the end of a kilometer-long electron accelerator, a hundred-meter-long undulator section, and tens of meters of X-ray optics. This enhancement of brightness will increase the data production rate to rival the largest data generators in the world. Without real-time active feedback control and an optimized pipeline to transform measurements into scientific information and insights, researchers will drown in a deluge of mostly useless data and fail to extract the highly sophisticated insights that the recent brightness upgrades promise. In this article, we outline the strategy we are developing at SLAC to implement Machine Learning-driven optimization, automation, and real-time knowledge extraction from the electron injector at the start of the electron accelerator, through the multidimensional X-ray optical systems, to the experimental endstations and the high-readout-rate, multi-megapixel detectors at LCLS, in order to deliver the design performance to the users. This is illustrated via examples from accelerator, optics, and end-user applications.
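The real-time feedback idea in the strategy above can be caricatured as a loop that repeatedly measures a diagnostic and nudges a machine setting toward lower error. The sketch below is a deliberately minimal stand-in, not SLAC's actual control stack: the "diagnostic", the optimal setting of 0.37, and the step schedule are all invented for illustration.

```python
# Toy feedback-control loop in the spirit of the strategy described
# above -- NOT the actual LCLS control system.
def measure_error(setting, optimum=0.37):
    # Stand-in beam diagnostic: pointing error grows with distance from
    # the optimal setting (which the controller does not know).
    return abs(setting - optimum)

def feedback_tune(setting=0.0, step=0.1, iters=60):
    # Derivative-free feedback: probe both directions, move toward the
    # smaller measured error, and shrink the step size to settle.
    for _ in range(iters):
        if measure_error(setting + step) < measure_error(setting - step):
            setting += step
        else:
            setting -= step
        step *= 0.9  # geometric step decay for convergence
    return setting

tuned = feedback_tune()  # converges close to 0.37
```

Real implementations replace the toy diagnostic with noisy multi-dimensional measurements and the greedy probe with sample-efficient optimizers (e.g. Bayesian optimization), but the measure-act-shrink structure is the same.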


Perceived Fairness of the Machine Learning Development Process: Concept Scale Development

Mishra, Anoop, Khazanchi, Deepak

arXiv.org Artificial Intelligence

In machine learning (ML) applications, unfairness is triggered due to bias in the data, the data curation process, erroneous assumptions, and implicit bias rendered during the development process. It is also well-accepted by researchers that fairness in ML application development is highly subjective, with a lack of clarity about what it means from an ML development and implementation perspective. Thus, in this research, we investigate and formalize the notion of the perceived fairness of ML development from a sociotechnical lens. Our goal in this research is to understand the characteristics of perceived fairness in ML applications. We address this research goal using a three-pronged strategy: 1) conducting virtual focus groups with ML developers, 2) reviewing existing literature on fairness in ML, and 3) incorporating aspects of justice theory relating to procedural and distributive justice. Based on our theoretical exposition, we propose the operational attributes of perceived fairness to be transparency, accountability, and representativeness. These are described in terms of multiple concepts that comprise each dimension of perceived fairness. We use this operationalization to empirically validate the notion of perceived fairness of machine learning (ML) applications from both ML practitioners' and users' perspectives. The multidimensional framework for perceived fairness offers a comprehensive understanding of perceived fairness, which can guide the creation of fair ML systems with positive implications for society and businesses.


NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark

Sainz, Oscar, Campos, Jon Ander, García-Ferrero, Iker, Etxaniz, Julen, de Lacalle, Oier Lopez, Agirre, Eneko

arXiv.org Artificial Intelligence

In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark and then evaluated on the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of a contaminated model on a target benchmark and its associated task relative to non-contaminated counterparts. The consequences can be very harmful, with wrong scientific conclusions being published while other correct ones are discarded. This position paper defines different levels of data contamination and argues for a community effort, including the development of automatic and semi-automatic measures to detect when data from a benchmark was exposed to a model, and suggestions for flagging papers with conclusions that are compromised by data contamination.
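One crude but illustrative example of the semi-automatic measures the authors call for is checking n-gram overlap between benchmark examples and a training corpus. The sketch below is a minimal version of that idea with invented example strings; production audits use longer n-grams, text normalization, and stronger probes such as membership inference.

```python
def ngrams(text, n=3):
    # Set of word n-grams of a string (lowercased, whitespace-tokenized).
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_score(benchmark_examples, training_corpus, n=3):
    # Fraction of benchmark examples sharing at least one n-gram with the
    # training corpus -- a rough proxy for exposure, not a proof of it.
    corpus_ngrams = set()
    for doc in training_corpus:
        corpus_ngrams |= ngrams(doc, n)
    flagged = sum(1 for ex in benchmark_examples
                  if ngrams(ex, n) & corpus_ngrams)
    return flagged / len(benchmark_examples)

benchmark = [
    "the quick brown fox jumps over the lazy dog",
    "entirely novel words appearing nowhere else",
]
corpus = ["pretraining text containing the quick brown fox verbatim"]
score = contamination_score(benchmark, corpus)  # 0.5: one of two flagged
```

A high score does not prove the model saw the test split, but it is cheap to compute and flags benchmarks that warrant the deeper scrutiny the paper argues for.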


Beurling-Selberg Extremization for Dual-Blind Deconvolution Recovery in Joint Radar-Communications

Monsalve, Jonathan, Vargas, Edwin, Mishra, Kumar Vijay, Sadler, Brian M., Arguello, Henry

arXiv.org Machine Learning

Recent interest in integrated sensing and communications has led to the design of novel signal processing techniques to recover information from an overlaid radar-communications signal. Here, we focus on a spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown to the common receiver. In this dual-blind deconvolution (DBD) problem, the receiver admits a multi-carrier wireless communications signal that is overlaid with the radar signal reflected off multiple targets. The communications and radar channels are represented by continuous-valued range-times or delays corresponding to multiple transmission paths and targets, respectively. Prior works addressed recovery of unknown channels and signals in this ill-posed DBD problem through atomic norm minimization but contingent on individual minimum separation conditions for radar and communications channels. In this paper, we provide an optimal joint separation condition using extremal functions from the Beurling-Selberg interpolation theory. Thereafter, we formulate DBD as a low-rank modified Hankel matrix retrieval and solve it via nuclear norm minimization. We estimate the unknown target and communications parameters from the recovered low-rank matrix using the multiple signal classification (MUSIC) method. We show that the joint separation condition also guarantees that the underlying Vandermonde matrix for MUSIC is well-conditioned. Numerical experiments validate our theoretical findings.
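The MUSIC step in the pipeline above can be illustrated on a toy line-spectrum problem: recover a few continuous-valued frequencies (standing in for delays) from samples of a sum of complex exponentials, the Vandermonde structure the abstract refers to. All sizes and frequency values below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy signal: two complex exponentials plus small noise.
rng = np.random.default_rng(0)
true_f = [0.12, 0.31]          # normalized "delay" frequencies
M, K, L = 64, 2, 32            # samples, sources, window length
n = np.arange(M)
x = sum(np.exp(2j * np.pi * f * n) for f in true_f)
x += 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

# Covariance estimate from sliding windows (spatial smoothing).
snaps = np.array([x[i:i + L] for i in range(M - L + 1)]).T
R = snaps @ snaps.conj().T / snaps.shape[1]

# Noise subspace: eigenvectors outside the K largest eigenvalues
# (np.linalg.eigh returns eigenvalues in ascending order).
_, vecs = np.linalg.eigh(R)
En = vecs[:, :-K]

# MUSIC pseudospectrum over a frequency grid; steering vectors form a
# Vandermonde matrix, and nulls of ||En^H a(f)|| mark the parameters.
grid = np.linspace(0.0, 0.5, 2001)
A = np.exp(2j * np.pi * np.outer(np.arange(L), grid))
P = 1.0 / (np.linalg.norm(En.conj().T @ A, axis=0) ** 2)

# Keep the K strongest local maxima as frequency estimates.
peaks = [i for i in range(1, len(grid) - 1) if P[i - 1] < P[i] > P[i + 1]]
est = sorted(grid[i] for i in sorted(peaks, key=lambda i: -P[i])[:K])
```

The well-conditioning result in the paper matters precisely here: if the true parameters were too closely spaced relative to the aperture, the steering (Vandermonde) matrix would be ill-conditioned and the pseudospectrum peaks would merge.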


nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources

Nawrot, Piotr

arXiv.org Artificial Intelligence

State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility of language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.


Physics-constrained Random Forests for Turbulence Model Uncertainty Estimation

Matha, Marcel, Morsbach, Christian

arXiv.org Artificial Intelligence

To achieve virtual certification for industrial design, quantifying the uncertainties in simulation-driven processes is crucial. We discuss a physics-constrained approach to account for the epistemic uncertainty of turbulence models. In order to eliminate user input, we incorporate a data-driven machine learning strategy. Data-driven approaches, aided by high-fidelity simulations and Machine Learning (ML), are gaining popularity in RANS turbulence modeling (Heyse et al., 2021; Matha & Kucharczyk, 2022). The present study, which was the foundation of the CFD application results in our previous paper (Matha et al., 2023), focuses on the use of data-driven methods in order to identify flow regions with potential turbulence model prediction inaccuracies.
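Setting the paper's random-forest machinery aside, the underlying idea of flagging flow regions where a data-driven model must extrapolate can be sketched with a simple nearest-neighbor distance check in feature space. This is a stand-in for the paper's approach, not its method: the toy features, the seeded random training set, and the quantile threshold rule are all assumptions made for illustration.

```python
import numpy as np

def extrapolation_flags(train_feats, query_feats, quantile=0.99):
    # Nearest-neighbor distance from each query point to the training set.
    d = np.linalg.norm(
        query_feats[:, None, :] - train_feats[None, :, :], axis=-1
    ).min(axis=1)
    # Threshold from the training set's own nearest-neighbor distances,
    # so "far" is judged relative to the density of the seen data.
    dt = np.linalg.norm(
        train_feats[:, None, :] - train_feats[None, :, :], axis=-1
    )
    np.fill_diagonal(dt, np.inf)  # exclude each point's distance to itself
    thresh = np.quantile(dt.min(axis=1), quantile)
    return d > thresh

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(200, 2))   # "seen" flow-feature samples
query = np.array([[0.1, -0.2],                # in-distribution point
                  [8.0, 8.0]])                # far outside training data
flags = extrapolation_flags(train, query)     # only the far point flagged
```

In the turbulence-modeling setting, the features would be local flow quantities, and flagged cells would mark regions where the data-driven correction (or the baseline model) deserves inflated uncertainty.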


Artificial intelligence and HR: How companies use AI products for hiring? - The Hindu BusinessLine

#artificialintelligence

Artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) technologies have seeped into the operations of several top companies, and have enabled them to launch advanced products and services that drive their growth. Talent acquisition (TA) and HR companies, too, have built their identities around these technologies, catering to companies and individuals through upskilling, hiring, exam proctoring and bias-free interviews. Gartner's Hype Cycle for Human Capital Management 2022 research note found that "AI drives automation of the recruitment process and provides decision-making support to TA professionals, hiring managers and candidates, during talent sourcing, screening, marketing, interview scheduling and onboarding." Other solutions such as chatbots and virtual assistants are also available, but according to Gartner's inquiry, AI-enabled sourcing and screening currently represents the most concentrated demand. InstaHyre, an AI-based hiring platform, is an example of how AI is used to screen and match candidates to suitable companies.


Multi-Antenna Dual-Blind Deconvolution for Joint Radar-Communications via SoMAN Minimization

Jacome, Roman, Vargas, Edwin, Mishra, Kumar Vijay, Sadler, Brian M., Arguello, Henry

arXiv.org Machine Learning

Joint radar-communications (JRC) has emerged as a promising technology for efficiently using the limited electromagnetic spectrum. In JRC applications such as secure military receivers, the radar and communications signals are often overlaid in the received signal. In these passive listening outposts, the signals and channels of both radar and communications are unknown to the receiver. The ill-posed problem of recovering all signal and channel parameters from the overlaid signal is termed dual-blind deconvolution (DBD). In this work, we investigate a more challenging version of DBD with a multi-antenna receiver. We model the radar and communications channels with a few (sparse) continuous-valued parameters such as time delays, Doppler velocities, and directions-of-arrival (DoAs). To solve this highly ill-posed DBD, we propose to minimize the sum of multivariate atomic norms (SoMAN) that depends on the unknown parameters. To this end, we devise an exact semidefinite program using theories of positive hyperoctant trigonometric polynomials (PhTP). Our theoretical analyses show that the minimum number of samples and antennas required for perfect recovery is logarithmically dependent on the maximum of the number of radar targets and communications paths rather than their sum. We show that our approach is easily generalized to include several practical issues such as gain/phase errors and additive noise. Numerical experiments show exact parameter recovery for different JRC scenarios.