Ghosh, Subhankar
Building Machine Learning Challenges for Anomaly Detection in Science
Campolongo, Elizabeth G., Chou, Yuan-Tang, Govorkova, Ekaterina, Bhimji, Wahid, Chao, Wei-Lun, Harris, Chris, Hsu, Shih-Chieh, Lapp, Hilmar, Neubauer, Mark S., Namayanja, Josephine, Subramanian, Aneesh, Harris, Philip, Anand, Advaith, Carlyn, David E., Ghosh, Subhankar, Lawrence, Christopher, Moreno, Eric, Raikman, Ryan, Wu, Jiaman, Zhang, Ziheng, Adhi, Bayu, Gharehtoragh, Mohammad Ahmadi, Monsalve, Saúl Alonso, Babicz, Marta, Baig, Furqan, Banerji, Namrata, Bardon, William, Barna, Tyler, Berger-Wolf, Tanya, Dieng, Adji Bousso, Brachman, Micah, Buat, Quentin, Hui, David C. Y., Cao, Phuong, Cerino, Franco, Chang, Yi-Chun, Chaulagain, Shivaji, Chen, An-Kai, Chen, Deming, Chen, Eric, Chou, Chia-Jui, Ciou, Zih-Chen, Cochran-Branson, Miles, Choi, Artur Cordeiro Oudot, Coughlin, Michael, Cremonesi, Matteo, Dadarlat, Maria, Darch, Peter, Desai, Malina, Diaz, Daniel, Dillmann, Steven, Duarte, Javier, Duporge, Isla, Ekka, Urbas, Heravi, Saba Entezari, Fang, Hao, Flynn, Rian, Fox, Geoffrey, Freed, Emily, Gao, Hang, Gao, Jing, Gonski, Julia, Graham, Matthew, Hashemi, Abolfazl, Hauck, Scott, Hazelden, James, Peterson, Joshua Henry, Hoang, Duc, Hu, Wei, Huennefeld, Mirco, Hyde, David, Janeja, Vandana, Jaroenchai, Nattapon, Jia, Haoyi, Kang, Yunfan, Kholiavchenko, Maksim, Khoda, Elham E., Kim, Sangin, Kumar, Aditya, Lai, Bo-Cheng, Le, Trung, Lee, Chi-Wei, Lee, JangHyeon, Lee, Shaocheng, van der Lee, Suzan, Lewis, Charles, Li, Haitong, Li, Haoyang, Liao, Henry, Liu, Mia, Liu, Xiaolin, Liu, Xiulong, Loncar, Vladimir, Lyu, Fangzheng, Makarov, Ilya, Mao, Abhishikth Mallampalli Chen-Yu, Michels, Alexander, Migala, Alexander, Mokhtar, Farouk, Morlighem, Mathieu, Namgung, Min, Novak, Andrzej, Novick, Andrew, Orsborn, Amy, Padmanabhan, Anand, Pan, Jia-Cheng, Pandya, Sneh, Pei, Zhiyuan, Peixoto, Ana, Percivall, George, Leung, Alex Po, Purushotham, Sanjay, Que, Zhiqiang, Quinnan, Melissa, Ranjan, Arghya, Rankin, Dylan, Reissel, Christina, Riedel, Benedikt, Rubenstein, Dan, Sasli, Argyro, Shlizerman, Eli, Singh, Arushi, Singh, Kim, Sokol, Eric R., Sorensen, Arturo, Su, Yu, Taheri, Mitra, Thakkar, Vaibhav, Thomas, Ann Mariam, Toberer, Eric, Tsai, Chenghan, Vandewalle, Rebecca, Verma, Arjun, Venterea, Ricco C., Wang, He, Wang, Jianwu, Wang, Sam, Wang, Shaowen, Watts, Gordon, Weitz, Jason, Wildridge, Andrew, Williams, Rebecca, Wolf, Scott, Xu, Yue, Yan, Jianqi, Yu, Jai, Zhang, Yulei, Zhao, Haoran, Zhao, Ying, Zhong, Yibo
Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understand scientific data perfectly but also recognize when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling larger, more compute-intensive challenges that can ultimately lead to scientific discovery.
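To make the task concrete, below is a minimal sketch of the kind of unsupervised baseline such challenges are typically benchmarked against: an isolation forest fit on "known" events and used to score candidates. The arrays `X_background` and `X_candidates` are hypothetical placeholders, not part of the released datasets.

```python
# Minimal anomaly-scoring sketch; the data here is synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_background = rng.normal(size=(10_000, 8))       # events assumed to follow known behavior
X_candidates = rng.normal(size=(100, 8)) + 4.0    # a few events far from the bulk

model = IsolationForest(n_estimators=200, random_state=0).fit(X_background)

# Lower scores indicate events that deviate more from the trained "known" behavior.
scores = model.score_samples(X_candidates)
flagged = np.argsort(scores)[:10]                 # most anomalous candidates
print(flagged, scores[flagged])
```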
Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning
Sharma, Arun, Farhadloo, Majid, Yang, Mingzhou, Zeng, Ruolei, Ghosh, Subhankar, Shekhar, Shashi
Given inputs of diverse soil characteristics and climate data gathered from various regions, we aimed to build a model to accurately predict land emissions. The problem is important since accurate quantification of the carbon cycle in agroecosystems is crucial for mitigating climate change and ensuring sustainable food production. Predicting land emissions accurately is challenging because calibrating the heterogeneous nature of soil properties, moisture, and environmental conditions at decision-relevant scales is hard. Traditional approaches do not adequately estimate land emissions because their location-independent parameters fail to leverage spatial heterogeneity, and they also require large datasets. To overcome these limitations, we proposed Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), which leverages location-dependent parameters that account for the significant spatial heterogeneity in soil moisture across multiple sites within the same region. Experimental results demonstrate that SDSA-KGML models achieve higher local accuracy for the specified states in the Midwest Region.
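A minimal sketch of the contrast the abstract draws, assuming a simple linear response and synthetic per-region data: one globally shared parameter vector (location-independent) versus parameters calibrated per region (location-dependent). This is not the SDSA-KGML model itself.

```python
# Location-independent vs location-dependent parameters on synthetic regional data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
regions = ["IA", "IL", "MN"]                      # hypothetical Midwest sites
data = {}
for i, r in enumerate(regions):
    X = rng.uniform(size=(200, 3))                # e.g., soil moisture, temperature, carbon input
    w = np.array([2.0 + i, -1.0, 0.5 * i])        # region-specific response (spatial heterogeneity)
    y = X @ w + rng.normal(scale=0.1, size=200)   # land-emission proxy
    data[r] = (X, y)

# Location-independent: a single parameter vector shared across all regions.
X_all = np.vstack([d[0] for d in data.values()])
y_all = np.concatenate([d[1] for d in data.values()])
global_model = LinearRegression().fit(X_all, y_all)

# Location-dependent: parameters calibrated per region.
local_models = {r: LinearRegression().fit(X, y) for r, (X, y) in data.items()}

for r, (X, y) in data.items():
    print(r, round(global_model.score(X, y), 3), round(local_models[r].score(X, y), 3))
```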
d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining
Roy, Prasun, Bhattacharya, Saumik, Ghosh, Subhankar, Pal, Umapada, Blumenstein, Michael
Structural guidance in an image-to-image translation allows intricate control over the shapes of synthesized images. Generating high-quality realistic images from user-specified rough hand-drawn sketches is one such task that aims to impose a structural constraint on the conditional generation process. While the premise is intriguing for numerous use cases of content creation and academic research, the problem becomes fundamentally challenging due to substantial ambiguities in freehand sketches. Furthermore, balancing the trade-off between shape consistency and realistic generation contributes to additional complexity in the process. Existing approaches based on Generative Adversarial Networks (GANs) generally utilize conditional GANs or GAN inversions, often requiring application-specific data and optimization objectives. The recent introduction of Denoising Diffusion Probabilistic Models (DDPMs) achieves a generational leap for low-level visual attributes in general image synthesis. However, directly retraining a large-scale diffusion model on a domain-specific subtask is often extremely difficult due to demanding computation costs and insufficient data. In this paper, we introduce a technique for sketch-to-image translation by exploiting the feature generalization capabilities of a large-scale diffusion model without retraining. In particular, we use a learnable lightweight mapping network to achieve latent feature translation from source to target domain. Experimental results demonstrate that the proposed method outperforms the existing techniques in qualitative and quantitative benchmarks, allowing high-resolution realistic image synthesis from rough hand-drawn sketches.
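As a rough illustration of the learnable lightweight mapping network idea, the sketch below trains a small MLP to translate assumed sketch features into the latent space of a frozen, pretrained diffusion model; the dimensions, data, and MSE objective are placeholders rather than the paper's configuration.

```python
# Illustrative latent mapping network; only the mapper is trained, the diffusion model stays frozen.
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    def __init__(self, src_dim: int = 512, tgt_dim: int = 4 * 64 * 64, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, sketch_feat: torch.Tensor) -> torch.Tensor:
        return self.net(sketch_feat)

mapper = LatentMapper()
sketch_feat = torch.randn(8, 512)                  # features extracted from rough sketches (assumed)
target_latent = torch.randn(8, 4 * 64 * 64)        # latents of paired real images (assumed)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)

loss = nn.functional.mse_loss(mapper(sketch_feat), target_latent)
loss.backward()
opt.step()
```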
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Hussain, Shehzeen, Neekhara, Paarth, Yang, Xuesong, Casanova, Edresson, Ghosh, Subhankar, Desta, Mikyas T., Fejgin, Roy, Valle, Rafael, Li, Jason
While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs. We introduce Koel-TTS, a suite of enhanced encoder-decoder Transformer TTS models that address these challenges by incorporating preference alignment techniques guided by automatic speech recognition and speaker verification models. Additionally, we incorporate classifier-free guidance to further improve synthesis adherence to the transcript and reference speaker audio. Our experiments demonstrate that these optimizations significantly enhance target speaker similarity, intelligibility, and naturalness of synthesized speech. Notably, Koel-TTS directly maps text and context audio to acoustic tokens, and on the aforementioned metrics, outperforms state-of-the-art TTS models, despite being trained on a significantly smaller dataset. Audio samples and demos are available on our website.
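The classifier-free guidance step can be illustrated with the standard logit combination below; the guidance scale and tensor shapes are assumptions for illustration, not Koel-TTS's actual settings.

```python
# Classifier-free guidance on autoregressive token logits (generic form).
import torch

def cfg_logits(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, scale: float = 2.0) -> torch.Tensor:
    """scale = 1.0 recovers the conditional model; scale > 1.0 pushes generation to
    adhere more strongly to the conditioning (transcript and reference speaker audio)."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

B, V = 4, 1024                                    # decoding-step batch, codec vocabulary size (assumed)
cond = torch.randn(B, V)                          # logits with text + speaker conditioning
uncond = torch.randn(B, V)                        # logits with conditioning dropped
next_token = torch.distributions.Categorical(logits=cfg_logits(cond, uncond)).sample()
print(next_token.shape)
```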
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Bataev, Vladimir, Ghosh, Subhankar, Lavrukhin, Vitaly, Li, Jason
This work introduces TTS-Transducer - a novel architecture for text-to-speech, leveraging the strengths of audio codec models and neural transducers. Transducers, renowned for their superior quality and robustness in speech recognition, are employed to learn monotonic alignments, eliminating the need for explicit duration predictors. Neural audio codecs efficiently compress audio into discrete codes, revealing the possibility of applying text modeling approaches to speech generation. However, the complexity of predicting multiple tokens per frame from several codebooks, as necessitated by audio codec models with residual quantizers, poses a significant challenge. The proposed system first uses a transducer architecture to learn monotonic alignments between tokenized text and speech codec tokens for the first codebook. Next, a non-autoregressive Transformer predicts the remaining codes using the alignment extracted from the transducer loss. The proposed system is trained end-to-end. We show that TTS-Transducer is a competitive and robust alternative to contemporary TTS systems.
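A minimal sketch of the first stage, assuming torchaudio's RNN-T loss and illustrative shapes: the transducer loss is applied to the first-codebook codec tokens so that alignment is learned implicitly, without a duration predictor. This is not the paper's code.

```python
# Transducer loss over first-codebook codec tokens; shapes and vocabulary size are assumed.
import torch
import torchaudio.functional as AF

B, T, U, V = 2, 40, 120, 1025                     # batch, text encoder frames, codec tokens, codebook size + blank
blank = V - 1

joint_logits = torch.randn(B, T, U + 1, V, requires_grad=True)    # joint network output
targets = torch.randint(0, blank, (B, U), dtype=torch.int32)      # first-codebook codec tokens
logit_lengths = torch.full((B,), T, dtype=torch.int32)
target_lengths = torch.full((B,), U, dtype=torch.int32)

loss = AF.rnnt_loss(joint_logits, targets, logit_lengths, target_lengths, blank=blank)
loss.backward()                                   # monotonic alignment is learned through this loss
print(loss.item())
```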
Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling
Ghosh, Subhankar, Sharma, Arun, Gupta, Jayant, Subramanian, Aneesh, Shekhar, Shashi
Given coarser-resolution projections from global climate models or satellite data, the downscaling problem aims to estimate finer-resolution regional climate data, capturing fine-scale spatial patterns and variability. Downscaling is any method to derive high-resolution data from low-resolution variables, often to provide more detailed and local predictions and analyses. This problem is societally crucial for effective adaptation, mitigation, and resilience against significant risks from climate change. The challenge arises from spatial heterogeneity and the need to recover finer-scale features while ensuring model generalization. Most downscaling methods \cite{Li2020} fail to capture the spatial dependencies at finer scales and underperform on real-world climate datasets, such as sea-level rise. We propose a novel Kriging-informed Conditional Diffusion Probabilistic Model (Ki-CDPM) to capture spatial variability while preserving fine-scale features. Experimental results on climate data show that our proposed method is more accurate than state-of-the-art downscaling techniques.
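Since ordinary Kriging is equivalent to Gaussian-process regression, the sketch below uses scikit-learn's GP to interpolate synthetic coarse sea-level values onto a finer grid, the kind of spatially coherent field that could condition a diffusion model. It is a generic Kriging-style interpolation, not the Ki-CDPM implementation, and the data is synthetic.

```python
# Kriging-style interpolation (GP regression) from coarse points to a fine regional grid.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
coarse_xy = rng.uniform(0, 1, size=(50, 2))                           # coarse-grid cell centers
coarse_sl = np.sin(3 * coarse_xy[:, 0]) + 0.1 * rng.normal(size=50)   # coarse sea-level anomaly

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2) + WhiteKernel(1e-3), normalize_y=True)
gp.fit(coarse_xy, coarse_sl)

# Interpolate onto a finer grid; the mean and variance can both serve as conditioning inputs.
fx, fy = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
fine_xy = np.column_stack([fx.ravel(), fy.ravel()])
fine_mean, fine_std = gp.predict(fine_xy, return_std=True)
print(fine_mean.reshape(64, 64).shape, fine_std.mean())
```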
Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
Imanpour, Nasrin, Bajpai, Shashwat, Ghosh, Subhankar, Sankepally, Sainath Reddy, Borah, Abhilekh, Abdullah, Hasnat Md, Kosaraju, Nishoak, Dixit, Shreyas, Aziz, Ashhar, Biswas, Shwetangshu, Jain, Vinija, Chadha, Aman, Sheth, Amit, Das, Amitava
The proliferation of AI techniques for image generation, coupled with their increasing accessibility, has raised significant concerns about the potential misuse of these images to spread misinformation. Recent AI-generated image detection (AGID) methods include CNNDetection, NPR, DM Image Detection, Fake Image Detection, DIRE, LASTED, GAN Image Detection, AIDE, SSP, DRCT, RINE, OCC-CLIP, De-Fake, and Deep Fake Detection. However, we argue that the current state-of-the-art AGID techniques are inadequate for effectively detecting contemporary AI-generated images and advocate for a comprehensive reevaluation of these methods. We introduce the Visual Counter Turing Test (VCT^2), a benchmark comprising ~130K images generated by contemporary text-to-image models (Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and Midjourney 6). VCT^2 includes two sets of prompts sourced from tweets by the New York Times Twitter account and captions from the MS COCO dataset. We also evaluate the performance of the aforementioned AGID techniques on the VCT^2 benchmark, highlighting their ineffectiveness in detecting AI-generated images. As image-generative AI models continue to evolve, the need for a quantifiable framework to evaluate these models becomes increasingly critical. To meet this need, we propose the Visual AI Index (V_AI), which assesses generated images from various visual perspectives, including texture complexity and object coherence, setting a new standard for evaluating image-generative AI models. To foster research in this domain, we make our datasets publicly available at https://huggingface.co/datasets/anonymous1233/COCO_AI and https://huggingface.co/datasets/anonymous1233/twitter_AI.
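A minimal usage sketch, assuming only the `datasets` library and the two repository names given above:

```python
# Pull the released benchmark splits from the Hugging Face Hub for inspection.
from datasets import load_dataset

coco_ai = load_dataset("anonymous1233/COCO_AI")
twitter_ai = load_dataset("anonymous1233/twitter_AI")
print(coco_ai)       # inspect available splits and columns before running any AGID detector
```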
Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results
Ghosh, Subhankar, Gupta, Jayant, Sharma, Arun, An, Shuai, Shekhar, Shashi
Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.
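The Bonferroni step that MultComp-RCM relies on can be illustrated directly; the p-values below are hypothetical stand-ins for the per-(region, pattern) significance tests.

```python
# Bonferroni correction across simultaneous regional-colocation hypothesis tests.
p_values = {
    ("r1", "{park, cafe}"): 0.004,
    ("r1", "{park, theft}"): 0.030,
    ("r2", "{cafe, theft}"): 0.0005,
    ("r2", "{park, cafe, theft}"): 0.012,
}

alpha = 0.05
m = len(p_values)                          # number of simultaneous hypotheses
corrected_alpha = alpha / m                # Bonferroni-adjusted per-test threshold

significant = {k: p for k, p in p_values.items() if p <= corrected_alpha}
print(f"threshold = {corrected_alpha:.4f}")
print(significant)                         # only patterns surviving the correction are reported
```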
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Neekhara, Paarth, Hussain, Shehzeen, Ghosh, Subhankar, Li, Jason, Valle, Rafael, Badlani, Rohan, Ginsburg, Boris
Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust, as the generated output can contain repeating words, missing words, and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. These failures are analogous to the hallucinations [15, 16] exhibited by LLMs in the text domain, and they become more prominent when the input text is challenging and contains repeating words. For certain inputs, the probabilistic autoregressive inference of LLM-based TTS models can result in looping or infinite silences [17], making LLM-based TTS models unreliable for real-world applications. We examine these challenges in an encoder-decoder transformer model.
Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration
Shi, Yuanjie, Ghosh, Subhankar, Belkhouja, Taha, Doppa, Janardhan Rao, Yan, Yan
Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerbated for the setting of class-conditional coverage on imbalanced classification tasks. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the prediction set sizes to achieve class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method that uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that agnostic to the classifier and data distribution, RC3P achieves class-wise coverage. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and 26.25% reduction in prediction set sizes on average.
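A minimal sketch of the class-conditional CP baseline that RC3P augments, with synthetic softmax outputs: per-class thresholds are calibrated so each class is covered with probability at least 1 - alpha. The rank-calibration filter that RC3P adds on top (skipping classes whose top-k error is large) is omitted here.

```python
# Class-conditional conformal prediction (CCP) baseline on synthetic probabilities.
import numpy as np

rng = np.random.default_rng(3)
K, n_cal, alpha = 5, 500, 0.1
probs_cal = rng.dirichlet(np.ones(K), size=n_cal)       # classifier softmax outputs (calibration set)
y_cal = rng.integers(0, K, size=n_cal)

scores_cal = 1.0 - probs_cal[np.arange(n_cal), y_cal]   # conformity score: 1 - p(true class)

q_hat = np.zeros(K)
for y in range(K):
    s = np.sort(scores_cal[y_cal == y])
    n_y = len(s)
    rank = int(np.ceil((n_y + 1) * (1 - alpha))) - 1     # conformal quantile index (0-based)
    q_hat[y] = s[min(rank, n_y - 1)]                     # simplified clamp for this sketch

def prediction_set(probs_test: np.ndarray) -> list:
    """Include class y whenever its score falls under that class's own threshold."""
    return [y for y in range(K) if 1.0 - probs_test[y] <= q_hat[y]]

print(prediction_set(rng.dirichlet(np.ones(K))))
```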