
Collaborating Authors

Chattopadhyay


Identifying Heart Attack Risk in Vulnerable Population: A Machine Learning Approach

Chattopadhyay, Subhagata, Chattopadhyay, Amit K

arXiv.org Artificial Intelligence

The COVID-19 pandemic has significantly increased the incidence of post-infection cardiovascular events, particularly myocardial infarction, in individuals over 40. While the underlying mechanisms remain elusive, this study employs a hybrid machine learning approach to analyze epidemiological data and assess 13 key heart attack risk factors and the susceptibility they confer. Based on a unique dataset that combines demographic, biochemical, ECG, and thallium stress-test data, the study categorizes distinct subpopulations by risk profile and then divides the population into 'at-risk' (AR) and 'not-at-risk' (NAR) groups using clustering algorithms. The analysis reveals a strong association between the likelihood of experiencing a heart attack and the 13 risk factors studied. The elevated risk for postmenopausal patients points to individual risk factors compromised by estrogen depletion, which may be further aggravated by extraneous stress impacts such as anxiety and fear, aspects that have traditionally eluded data-driven prediction.
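The abstract describes splitting a population into 'at-risk' and 'not-at-risk' groups via clustering on risk-factor data. As a minimal sketch of that step, the toy example below runs a two-cluster k-means on synthetic, well-separated risk-factor vectors; the data, the initialization, and the choice of plain k-means are all illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# Synthetic stand-in data: two populations described by 13 standardized
# risk-factor scores (the paper's real features and dataset differ).
rng = np.random.default_rng(0)
low_risk = rng.normal(loc=-1.0, scale=0.5, size=(50, 13))
high_risk = rng.normal(loc=1.0, scale=0.5, size=(50, 13))
X = np.vstack([low_risk, high_risk])

def kmeans(X, init_idx, iters=20):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute centroids, for a fixed number of iterations."""
    centroids = X[list(init_idx)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels

# Deterministic initialization from one point of each synthetic group.
labels = kmeans(X, init_idx=(0, len(X) - 1))
```

With cleanly separated synthetic groups, the two clusters recover the simulated 'AR' and 'NAR' populations; on real epidemiological data, choosing features, scaling, and the clustering algorithm is the substantive modeling work.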


Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets

Rehman, Tohida, Ghosh, Soumabha, Das, Kuntal, Bhattacharjee, Souvik, Sanyal, Debarshi Kumar, Chattopadhyay, Samiran

arXiv.org Artificial Intelligence

Text summarization plays a crucial role in natural language processing by condensing large volumes of text into concise, coherent summaries. As digital content continues to grow rapidly and the demand for effective information retrieval increases, text summarization has become a focal point of research in recent years. This study offers a thorough evaluation of four leading pre-trained and open-source large language models (BART, FLAN-T5, LLaMA-3-8B, and Gemma-7B) across five diverse datasets: CNN/DM, Gigaword, News Summary, XSum, and BBC News. The evaluation employs widely recognized automatic metrics, including ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, and METEOR, to assess the models' capabilities in generating coherent and informative summaries. The results reveal the comparative strengths and limitations of these models in processing various text types.
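To make the metrics concrete, here is a toy ROUGE-1 F1 computation: unigram overlap between a candidate summary and a reference. This is a simplified sketch for intuition only; real evaluations such as this study's use official implementations with tokenization, stemming, and multi-reference handling.

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Toy ROUGE-1 F1: harmonic mean of unigram precision and recall.
    Overlap counts are clipped by the reference counts (Counter &)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the cat sat on the mat", "the cat lay on the mat")
# 5 of 6 unigrams overlap, so precision = recall = 5/6
```

ROUGE-2 and ROUGE-L follow the same pattern over bigrams and longest common subsequences, while BERTScore replaces exact token matching with contextual-embedding similarity.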


Quality check of a sample partition using multinomial distribution

Modak, Soumita

arXiv.org Machine Learning

In this paper, we propose a novel measure for checking the quality of a cluster partition that divides a sample into several distinct classes, and thereby for determining the true but unknown number of clusters in the data. Our approach applies the multinomial distribution to the distances between data members clustered in a group and their respective cluster representatives. This procedure is carried out independently for each cluster, and the resulting statistics are combined to form the proposed measure. Each cluster separately possesses category-wise probabilities corresponding to the different positions of its members relative to a typical member, the cluster representative, taken as the centroid, medoid, or mode. The method is robust in the sense of being distribution-free: it is devised irrespective of the parent distribution of the underlying sample. It also has a quality rare among existing cluster accuracy measures: the ability to investigate whether the sample possesses any inherent clusters at all, beyond a single group containing all members. Its simple concept, easy algorithm, fast runtime, good performance, and wide applicability, demonstrated through extensive simulations and diverse case studies, make the measure appealing.
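One possible reading of the core idea, sketched below purely for illustration: bin each member's distance to its cluster representative into categories, and score the cluster by the multinomial log-likelihood of the observed category counts. The number of bins, the quantile-based binning, and the log-likelihood combination are all assumptions made for this sketch; the paper's actual statistic is more involved.

```python
import numpy as np

def cluster_multinomial_loglik(points, representative, n_bins=3):
    """Illustrative multinomial scoring of one cluster: categorize
    member-to-representative distances into quantile bins, estimate
    category probabilities from the counts, and return the multinomial
    log-likelihood of those counts (closer to 0 = more even spread)."""
    d = np.linalg.norm(points - representative, axis=1)
    edges = np.quantile(d, np.linspace(0, 1, n_bins + 1))
    counts = np.histogram(d, bins=edges)[0]
    probs = counts / counts.sum()
    nonzero = counts > 0
    return float((counts[nonzero] * np.log(probs[nonzero])).sum())

rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 2))
score = cluster_multinomial_loglik(pts, pts.mean(axis=0))
```

Per the abstract, such a statistic would be computed independently per cluster (with the centroid, medoid, or mode as representative) and combined across clusters into a single quality measure; being built only on distances, it makes no assumption about the sample's parent distribution.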


A posteriori learning for quasi-geostrophic turbulence parametrization

Frezat, Hugo, Sommer, Julien Le, Fablet, Ronan, Balarac, Guillaume, Lguensat, Redouane

arXiv.org Artificial Intelligence

The use of machine learning to build subgrid parametrizations for climate models is receiving growing attention. State-of-the-art strategies address the problem as a supervised learning task, optimizing algorithms that predict subgrid fluxes from coarse-resolution model information. In practice, training data are generated from higher-resolution numerical simulations transformed to mimic coarse-resolution simulations. In essence, these strategies optimize subgrid parametrizations to meet so-called $\textit{a priori}$ criteria. But the actual purpose of a subgrid parametrization is good performance on $\textit{a posteriori}$ metrics, which involve computing entire model trajectories. In this paper, we focus on the representation of energy backscatter in two-dimensional quasi-geostrophic turbulence and compare parametrizations obtained with different learning strategies at fixed computational complexity. We show that strategies based on $\textit{a priori}$ criteria yield parametrizations that tend to be unstable in direct simulations, and we describe how subgrid parametrizations can instead be trained end-to-end to meet $\textit{a posteriori}$ criteria. End-to-end learning strategies yield parametrizations that outperform known empirical and data-driven schemes in performance, stability, and the ability to generalize to different flow configurations. These results support the relevance of differentiable programming paradigms for future climate models.
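The a priori / a posteriori distinction can be illustrated with a toy system (this is not the paper's quasi-geostrophic solver; the dynamics, the "true" subgrid term, and the learned stand-in below are invented for demonstration): an a priori loss compares the subgrid term itself at individual states, while an a posteriori loss compares entire simulated trajectories, where errors can accumulate or destabilize the run.

```python
import numpy as np

def step(x, subgrid, dt=0.1):
    """One explicit-Euler step of a toy damped system plus a subgrid term."""
    return x + dt * (-x + subgrid(x))

def rollout(x0, subgrid, n_steps):
    """Integrate a full trajectory with the given subgrid closure."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(step(traj[-1], subgrid))
    return np.array(traj)

true_subgrid = lambda x: 0.5 * np.sin(x)   # stand-in for high-res "truth"
learned = lambda x: 0.5 * x                # hypothetical learned closure

x0 = np.array([1.0])
# a priori: pointwise error of the predicted subgrid term at one state
a_priori = float(np.abs(true_subgrid(x0) - learned(x0)).mean())
# a posteriori: error accumulated over a whole model trajectory
ref = rollout(x0, true_subgrid, 50)
sim = rollout(x0, learned, 50)
a_posteriori = float(np.abs(ref - sim).mean())
```

Training end-to-end on the a posteriori loss requires differentiating through `rollout`, i.e. through the solver itself, which is exactly where the differentiable-programming paradigm mentioned above comes in.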


A new nonparametric interpoint distance-based measure for assessment of clustering

Modak, Soumita

arXiv.org Artificial Intelligence

A new interpoint distance-based measure is proposed to identify the optimal number of clusters present in a data set. Designed within a nonparametric framework, it is independent of the distribution of the given data. Because it relies on interpoint distances between data members, our cluster validity index applies to univariate and multivariate data measured on arbitrary scales, including observations in spaces where the number of study variables exceeds the sample size. The proposed criterion is compatible with any clustering algorithm and can be used to determine the unknown number of clusters or to assess the quality of the resulting clusters for a data set. Demonstrations on synthetic and real-life data establish its superiority over well-known clustering accuracy measures in the literature.
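The general idea behind interpoint-distance validity indices can be sketched as follows: a good partition keeps within-cluster pairwise distances small relative to between-cluster ones. The generic ratio index below is for illustration only; it is not the measure proposed in the paper.

```python
import numpy as np

def validity(X, labels):
    """Generic interpoint-distance index: mean between-cluster distance
    divided by mean within-cluster distance (larger = better partition).
    Needs only pairwise distances, so it works in any dimension."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(X), k=1)          # each unordered pair once
    within = D[iu][same[iu]]
    between = D[iu][~same[iu]]
    return between.mean() / within.mean()

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
good = np.repeat([0, 1], 20)   # labels matching the true groups
bad = np.tile([0, 1], 20)      # arbitrary alternating labels
```

Scanning such an index over candidate partitions with k = 1, 2, 3, ... clusters is one way to pick the number of clusters, which is the use case the abstract describes for its own, more refined measure.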


Researchers use AI to predict crime, biased policing in major U.S. cities like L.A.

Los Angeles Times

For once, algorithms that predict crime might be used to uncover bias in policing, instead of reinforcing it. A group of social and data scientists developed a machine learning tool it hoped would better predict crime. The scientists say they succeeded, but their work also revealed inferior police protection in poorer neighborhoods in eight major U.S. cities, including Los Angeles. Instead of justifying more aggressive policing in those areas, however, the hope is the technology will lead to "changes in policy that result in more equitable, need-based resource allocation," including sending officials other than law enforcement to certain kinds of calls, according to a report published Thursday in the journal Nature Human Behaviour. The tool, developed by a team led by University of Chicago professor Ishanu Chattopadhyay, forecasts crime by spotting patterns amid vast amounts of public data on property crimes and crimes of violence, learning from the data as it goes.


Researchers are using AI to predict crime, again

#artificialintelligence

Scientists are looking for a way to predict crime using, you guessed it, artificial intelligence. There are loads of studies showing that using AI to predict crime produces consistently racist outcomes. For instance, one AI crime prediction model that the Chicago Police Department tried out in 2016 was meant to shed racist biases but had the opposite effect. It used a model to predict who might be most at risk of being involved in a shooting, yet 56% of Black men in the city aged 20 to 29 appeared on the list. Despite it all, scientists are still trying to use such tools to find out when, and where, crime might occur.


AI Algorithm Predicts Future Crimes One Week in Advance With 90% Accuracy

#artificialintelligence

Our model enables discovery of these connections." The new model isolates crime by looking at the time and spatial coordinates of discrete events and detecting patterns to predict future events. It divides the city into spatial tiles roughly 1,000 feet across and predicts crime within these areas instead of relying on traditional neighborhood or political boundaries, which are also subject to bias. The model performed just as well with data from seven other U.S. cities: Atlanta, Austin, Detroit, Los Angeles, Philadelphia, Portland, and San Francisco. "We demonstrate the importance of discovering city-specific patterns for the prediction of reported crime, which generates a fresh view on neighborhoods in the city, allows us to ask novel questions, and lets us evaluate police action in new ways," Evans said. Chattopadhyay is careful to note that the tool's accuracy does not mean that it should be used to direct law enforcement, with police departments using it to swarm neighborhoods proactively to prevent crime. Instead, it should be added to a toolbox of urban policies and policing strategies to address crime. "We created a digital twin of urban environments.
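The tiling step described above can be sketched in a few lines: events with spatial coordinates are binned into square tiles roughly 1,000 feet across, so counts can be modeled per tile rather than per neighborhood or political boundary. The event coordinates below are made up for illustration and bear no relation to the study's data.

```python
from collections import Counter

TILE_FT = 1000  # tile edge length in feet, per the description above

def tile_of(x_ft, y_ft):
    """Map an event's planar coordinates (in feet) to its (col, row) tile."""
    return (int(x_ft // TILE_FT), int(y_ft // TILE_FT))

# Hypothetical event locations; the first two fall in the same tile.
events = [(120, 80), (950, 40), (1100, 60), (1900, 2050)]
counts = Counter(tile_of(x, y) for x, y in events)
```

Per-tile event counts over time form the sequences of discrete events from which the model then learns its predictive patterns.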


AI predicts crime a week in advance with 90 per cent accuracy

New Scientist

An artificial intelligence can now predict the location and rate of crime across a city a week in advance with up to 90 per cent accuracy. Similar systems have been shown to perpetuate racist bias in policing, and the same could be true in this case, but the researchers who created this AI claim that it can also be used to expose those biases. Ishanu Chattopadhyay at the University of Chicago and his colleagues created an AI model that analysed historical crime data from Chicago, Illinois, from 2014 to the end of 2016, then predicted crime levels for the weeks that followed this training period. The model predicted the likelihood of certain crimes occurring across the city, which was divided into squares about 300 metres across, a week in advance with up to 90 per cent accuracy. It was also trained and tested on data for seven other major US cities, with a similar level of performance.


Chattopadhyay

AAAI Conferences

As AI continues to advance, human-AI teams are inevitable. However, progress in AI is routinely measured in isolation, without a human in the loop. It is crucial to benchmark progress in AI, not just in isolation, but also in terms of how it translates to helping humans perform certain tasks, i.e., the performance of human-AI teams. In this work, we design a cooperative game -- GuessWhich -- to measure human-AI team performance in the specific context of the AI being a visual conversational agent. GuessWhich involves live interaction between the human and the AI.