Feng, Tiantian
Convex Hull-based Algebraic Constraint for Visual Quadric SLAM
Yu, Xiaolong, Zhao, Junqiao, Song, Shuangfu, Zhu, Zhongyang, Yuan, Zihan, Ye, Chen, Feng, Tiantian
Using quadrics as the object representation has the benefits of both generality and a closed-form projection between image and world spaces. Although numerous constraints have been proposed for dual quadric reconstruction, we found that many of them are imprecise and provide minimal improvement to localization. After scrutinizing the existing constraints, we introduce a concise yet more precise convex hull-based algebraic constraint for object landmarks, which is applied to object reconstruction, frontend pose estimation, and backend bundle adjustment. This constraint is designed to fully leverage precise semantic segmentation, effectively mitigating mismatches between complex-shaped object contours and dual quadrics. Experiments on public datasets demonstrate that our approach is applicable to both monocular and RGB-D SLAM and achieves better object mapping and localization than existing quadric SLAM methods.

In recent years, with the rapid development of object detection and semantic segmentation, many object-based SLAM systems have been proposed [1], [2], [3], [4], [5]. By mapping and localizing high-level object landmarks, the robustness of the SLAM system is improved, because image feature points are susceptible to environmental degradation or variations such as textureless regions and illumination changes.
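As a rough illustration of the geometry involved, the Python sketch below projects a dual quadric into the image via the standard relation C* = P Q* P^T and evaluates an algebraic residual at the convex-hull vertices of a segmentation mask; the residual choice and normalization are assumptions for illustration, and the paper's actual convex hull-based constraint may be formulated differently.

import numpy as np
from scipy.spatial import ConvexHull

def project_dual_quadric(Q_star, K, R, t):
    # Standard closed-form projection of a 4x4 dual quadric to a 3x3 dual conic.
    P = K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 camera projection matrix
    C_star = P @ Q_star @ P.T
    return C_star / np.linalg.norm(C_star)            # normalize the homogeneous scale

def hull_conic_residuals(mask_pixels, C_star):
    # Illustrative residual: x^T C x at each convex-hull vertex of the object mask,
    # with C taken as the adjugate of the projected dual conic.
    hull = ConvexHull(mask_pixels)                    # mask_pixels: (N, 2) pixel coordinates
    verts = mask_pixels[hull.vertices]                # only hull vertices are constrained
    C = np.linalg.det(C_star) * np.linalg.inv(C_star)
    xh = np.hstack([verts, np.ones((len(verts), 1))]) # homogeneous image points
    return np.einsum('ij,jk,ik->i', xh, C, xh)        # one algebraic residual per vertex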
Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction
Lee, Jihwan, Feng, Tiantian, Kommineni, Aditya, Kadiri, Sudarsana Reddy, Narayanan, Shrikanth
Brain-computer interfaces (BCIs) offer numerous human-centered application possibilities, particularly for people with neurological disorders. Text or speech decoding from brain activity is a relevant domain that could augment the quality of life for people with impaired speech perception. We propose a novel approach to enhance listened speech decoding from electroencephalography (EEG) signals by utilizing an auxiliary phoneme predictor that simultaneously decodes textual phoneme sequences. The proposed model architecture consists of three main parts: an EEG module, a speech module, and a phoneme predictor. The EEG module learns to represent EEG signals as EEG embeddings. The speech module generates speech waveforms from the EEG embeddings. The phoneme predictor outputs the decoded phoneme sequences in the text modality. Our approach allows users to obtain decoded listened speech from EEG signals in both modalities (speech waveforms and textual phoneme sequences) simultaneously, eliminating the need for a concatenated sequential pipeline for each modality. The proposed approach also outperforms previous methods in both modalities. The source code and speech samples are publicly available.
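A minimal PyTorch sketch of the three-part layout described above; the layer types, sizes, and phoneme inventory are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class ListenedSpeechDecoder(nn.Module):
    # Illustrative three-module layout: EEG module -> speech module + phoneme predictor.
    def __init__(self, n_channels=64, d_model=256, n_phonemes=40):
        super().__init__()
        # EEG module: raw EEG (batch, channels, time) -> embeddings (batch, d_model, time')
        self.eeg_module = nn.Sequential(
            nn.Conv1d(n_channels, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=7, stride=2, padding=3),
        )
        # Speech module: upsample embeddings back to a waveform (batch, samples)
        self.speech_module = nn.Sequential(
            nn.ConvTranspose1d(d_model, d_model // 2, kernel_size=8, stride=4, padding=2),
            nn.GELU(),
            nn.ConvTranspose1d(d_model // 2, 1, kernel_size=8, stride=4, padding=2),
        )
        # Auxiliary phoneme predictor: per-frame phoneme logits (e.g. for a CTC loss)
        self.phoneme_predictor = nn.Linear(d_model, n_phonemes)

    def forward(self, eeg):
        z = self.eeg_module(eeg)                              # (batch, d_model, time')
        waveform = self.speech_module(z).squeeze(1)           # (batch, samples)
        phonemes = self.phoneme_predictor(z.transpose(1, 2))  # (batch, time', n_phonemes)
        return waveform, phonemes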
Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation?
Feng, Tiantian, Xu, Anfeng, Lahiri, Rimita, Tager-Flusberg, Helen, Kim, So Hyun, Bishop, Somer, Lord, Catherine, Narayanan, Shrikanth
Large Language Models (LLMs) have shown significant potential in understanding human communication and interaction. However, their performance in the domain of child-inclusive interactions, including in clinical settings, remains less explored. In this work, we evaluate generic LLMs' ability to analyze child-adult dyadic interactions in a clinically relevant context involving children with autism spectrum disorder (ASD). Specifically, we explore LLMs in performing four tasks: classifying child-adult utterances, predicting engaged activities, recognizing language skills, and understanding clinically relevant traits. Our evaluation shows that generic LLMs are highly capable of analyzing long and complex conversations in clinical observation sessions, often surpassing the performance of non-expert human evaluators. The results show their potential to segment interactions of interest, assist in language skills evaluation, identify engaged activities, and offer clinically relevant context for assessments.
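For concreteness, a prompt-construction sketch for one of the four tasks (child vs. adult utterance classification) is shown below; the wording is purely illustrative, not the prompt used in the paper, and the resulting string would be passed to whichever generic LLM is being evaluated.

def build_utterance_classification_prompt(transcript_lines):
    # Hypothetical zero-shot prompt for the child vs. adult utterance task.
    header = ("Below is a transcript from a clinical observation session between a "
              "child with ASD and an adult examiner. For each numbered utterance, "
              "answer CHILD or ADULT to indicate the speaker.\n\n")
    body = "\n".join(f"{i + 1}. {u}" for i, u in enumerate(transcript_lines))
    return header + body

# Example usage:
# prompt = build_utterance_classification_prompt(["Can you show me the ball?", "Ball!"])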
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder
Kommineni, Aditya, Bose, Digbalay, Feng, Tiantian, Kim, So Hyun, Tager-Flusberg, Helen, Bishop, Somer, Lord, Catherine, Kadiri, Sudarsana, Narayanan, Shrikanth
Clinical videos in the context of Autism Spectrum Disorder are often long-form interactions between children and caregivers or clinical professionals, encompassing complex verbal and non-verbal behaviors. Objective analyses of these videos could provide clinicians and researchers with nuanced insights into the behavior of children with Autism Spectrum Disorder. Manually coding these videos is a time-consuming task and requires a high level of domain expertise. Hence, the ability to capture these interactions computationally can augment the manual effort and support the diagnostic procedure. In this work, we investigate the use of foundation models across three modalities: speech, video, and text, to analyze child-focused interaction sessions. We propose a unified methodology to combine multiple modalities by using large language models as reasoning agents. We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection. We find that the proposed multimodal pipeline provides robustness to modality-specific limitations and improves performance on clinical video analysis compared to unimodal settings.
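The sketch below illustrates the "LLM as reasoning agent" idea at a high level: per-modality outputs are serialized into a single prompt from which the LLM produces a session-level decision. The field names and task wording are assumptions for illustration, not the paper's pipeline.

def build_reasoning_prompt(asr_transcript, video_activity_tags, task="activity recognition"):
    # Combine unimodal foundation-model outputs into one prompt for the reasoning LLM.
    return (
        f"You are analyzing a child-focused clinical interaction session.\n"
        f"Task: {task}.\n\n"
        f"Speech transcript (ASR output):\n{asr_transcript}\n\n"
        f"Visual activity tags (video model output): {', '.join(video_activity_tags)}\n\n"
        "Using all modalities, state the most likely activity and briefly justify it."
    )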
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Feng, Tiantian, Dimitriadis, Dimitrios, Narayanan, Shrikanth
Recent advances in foundation models have enabled audio-generative models that produce high-fidelity sounds associated with music, events, and human actions. Despite the success achieved by modern audio-generative models, the conventional approach to assessing the quality of audio generation relies heavily on distance metrics like Fréchet Audio Distance. In contrast, we aim to evaluate the quality of audio generation by examining the effectiveness of using the generated audio as training data. Specifically, we conduct studies to explore the use of synthetic audio for audio recognition. Moreover, we investigate whether synthetic audio can serve as a resource for data augmentation in speech-related modeling. Our comprehensive experiments demonstrate the potential of using synthetic audio for audio recognition and speech-related modeling. Our code is available at https://github.com/usc-sail/SynthAudio.
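As one concrete (assumed) way to produce such training data, the sketch below generates class-labeled clips with the open-source AudioLDM pipeline from the diffusers library; the generators, prompts, and target classes used in the paper are not reproduced here.

import soundfile as sf
from diffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")

labels = ["dog barking", "door knocking", "baby crying"]     # example target classes
for label in labels:
    for i in range(10):                                       # a few synthetic clips per class
        audio = pipe(prompt=label, num_inference_steps=50,
                     audio_length_in_s=5.0).audios[0]         # mono waveform at 16 kHz
        sf.write(f"synth_{label.replace(' ', '_')}_{i}.wav", audio, 16000)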
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Xu, Anfeng, Huang, Kevin, Feng, Tiantian, Shen, Lue, Tager-Flusberg, Helen, Narayanan, Shrikanth
Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models for child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark and evaluate the speaker diarization results of the speech foundation models while varying the input audio window size, speaker demographics, and training data ratio. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
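A minimal sketch of one way to use a speech foundation model for this task: frozen wav2vec 2.0 features with a light frame-level classifier. The backbone, head, and label set (e.g. child / adult / silence) are assumptions; the paper benchmarks several foundation models.

import torch.nn as nn
from transformers import Wav2Vec2Model

class DiarizationHead(nn.Module):
    # Frame-level child/adult classification on top of a frozen speech foundation model.
    def __init__(self, backbone="facebook/wav2vec2-base", n_classes=3):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(backbone)
        self.encoder.requires_grad_(False)                 # keep the foundation model frozen
        self.classifier = nn.Linear(self.encoder.config.hidden_size, n_classes)

    def forward(self, waveform):                            # waveform: (batch, samples) at 16 kHz
        feats = self.encoder(waveform).last_hidden_state    # (batch, frames, hidden)
        return self.classifier(feats)                       # per-frame class logits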
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
Lee, Jihwan, Kommineni, Aditya, Feng, Tiantian, Avramidis, Kleanthis, Shi, Xuan, Kadiri, Sudarsana, Narayanan, Shrikanth
Speech decoding from EEG signals is a challenging task, in which brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, with no intermediate acoustic feature processing step required. The proposed method consists of an EEG module and a speech module along with a connector. The EEG module learns to better represent EEG signals, while the speech module generates speech waveforms from model representations. The connector learns to bridge the distributions of the latent spaces of EEG and speech. The proposed framework is both simple and efficient, allowing single-step inference, and outperforms prior works on objective metrics. A fine-grained phoneme analysis is conducted to unveil model characteristics of speech decoding. The source code is available here: github.com/lee-jhwn/fesde.
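The distinctive piece relative to earlier pipelines is the connector; a rough sketch of that idea follows, with the layer sizes and the MSE alignment loss being assumptions for illustration rather than FESDE's actual design.

import torch.nn as nn

class Connector(nn.Module):
    # Bridge EEG-module embeddings into the speech module's latent space.
    def __init__(self, d_eeg=256, d_speech=192):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(d_eeg, d_speech), nn.GELU(),
                                  nn.Linear(d_speech, d_speech))

    def forward(self, eeg_latent):                  # (batch, frames, d_eeg)
        return self.proj(eeg_latent)                # (batch, frames, d_speech)

def alignment_loss(connector, eeg_latent, speech_latent):
    # Penalize the mismatch between bridged EEG latents and speech-module latents.
    return nn.functional.mse_loss(connector(eeg_latent), speech_latent)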
POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Huang, Chang, Zhao, Junqiao, Zhu, Shatong, Zhou, Hongtu, Ye, Chen, Feng, Tiantian, Jiang, Changjun
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal joint actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
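A stripped-down PyTorch sketch of the weighting idea: TD errors for joint actions flagged as potentially optimal receive a larger weight. How that flag is produced, which in POWQMIX involves a learned recognition mechanism, is abstracted away here, and the weight values are placeholders.

import torch

def weighted_td_loss(q_joint, td_target, is_potentially_optimal, w_high=1.0, w_low=0.1):
    # Per-sample weights: w_high for potentially optimal joint actions, w_low otherwise.
    weights = w_low + (w_high - w_low) * is_potentially_optimal.float()
    td_error = q_joint - td_target.detach()          # standard TD error
    return (weights * td_error.pow(2)).mean()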
LOG-LIO2: A LiDAR-Inertial Odometry with Efficient Uncertainty Analysis
Huang, Kai, Zhao, Junqiao, Lin, Jiaye, Zhu, Zhongyang, Song, Shuangfu, Ye, Chen, Feng, Tiantian
Uncertainty in LiDAR measurements, stemming from factors such as range sensing, is crucial for LIO (LiDAR-Inertial Odometry) systems as it affects accurate weighting in the loss function. While recent LIO systems address uncertainty related to range sensing, the impact of the incident angle on uncertainty is often overlooked by the community. Moreover, existing uncertainty propagation methods suffer from computational inefficiency. This paper proposes a comprehensive point uncertainty model that accounts for both the uncertainties from LiDAR measurements and surface characteristics, along with an efficient local uncertainty analytical method for the LiDAR-based state estimation problem. We employ a projection operator that separates the uncertainty into the ray direction and its orthogonal plane. Then, we derive incremental Jacobian matrices of eigenvalues and eigenvectors w.r.t. points, which enables a fast approximation of uncertainty propagation. This approach eliminates the requirement for redundant traversal of points, significantly reducing the time complexity of uncertainty propagation from $\mathcal{O}(n)$ to $\mathcal{O}(1)$ when a new point is added. Simulations and experiments on public datasets are conducted to validate the accuracy and efficiency of our formulations. The proposed methods have been integrated into a LIO system, which is available at https://github.com/tiev-tongji/LOG-LIO2.
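A small sketch of the projection-operator idea: split a LiDAR point's covariance into a component along the ray direction (range noise) and a component in the plane orthogonal to the ray. The specific variance terms used in LOG-LIO2 (incident angle, surface characteristics) are not reproduced here, and the default noise values are placeholders.

import numpy as np

def point_covariance(point, sigma_range=0.02, sigma_tangential=0.01):
    # point: 3D coordinates of the LiDAR return in the sensor frame.
    d = point / np.linalg.norm(point)              # unit ray direction from the sensor
    P_ray = np.outer(d, d)                         # projector onto the ray direction
    P_perp = np.eye(3) - P_ray                     # projector onto the orthogonal plane
    return sigma_range**2 * P_ray + sigma_tangential**2 * P_perp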
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Feng, Tiantian, Shi, Xuan, Gupta, Rahul, Narayanan, Shrikanth S.
Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced understanding of the intent, emotion, sentiment, and content conveyed in speech and its language (text) content. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech data for training ASU due to concerns such as privacy. To approach this setting of enabling ASU when the speech (audio) modality is missing, we propose TI-ASU, which uses a pre-trained text-to-speech model to impute the missing speech. We report extensive experiments evaluating TI-ASU across various scales of missing speech, in both multi- and single-modality settings, and with the use of LLMs. Our findings show that TI-ASU yields substantial benefits for ASU in scenarios where even up to 95% of the training speech is missing. Moreover, we show that TI-ASU is adaptive to dropout training, improving model robustness in addressing missing speech during inference.
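A sketch of the imputation step, using SpeechT5 from the transformers library as a stand-in text-to-speech model; the TTS system, speaker embeddings, and data handling in TI-ASU itself are not reproduced here, and the zero speaker embedding is a placeholder.

import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
tts = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
speaker_embedding = torch.zeros((1, 512))      # placeholder; normally a real x-vector

def impute_missing_speech(sample):
    # Fill in the waveform for a (text, audio) training pair whose audio is missing.
    if sample["audio"] is None:
        ids = processor(text=sample["text"], return_tensors="pt")["input_ids"]
        sample["audio"] = tts.generate_speech(ids, speaker_embedding, vocoder=vocoder)
    return sample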