AITopics | Tran, Minh

Tran, Minh

Negative to Positive Co-learning with Aggressive Modality Dropout

Magal, Nicholas, Tran, Minh, Arakawa, Riku, Nie, Suzanne

arXiv.org Artificial Intelligence, Jan-1-2025

We find that by using aggressive modality dropout we are able to reverse negative co-learning (NCL) to positive co-learning (PCL). Aggressive modality dropout can be used to 'prep' a multimodal model for unimodal deployment, and dramatically increases model performance during negative co-learning, where during some experiments we saw a 20% gain in accuracy. We show that in situations where there is NCL, by applying aggressive modality dropout we are able to reverse NCL to PCL. While there is prior work documenting the effectiveness of modality dropout during co-learning and multimodal machine learning, we are the first to show that modality dropout can reverse NCL to PCL.
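To make the idea of modality dropout concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the dropout rate, which modality is dropped, or how a dropped modality is represented, so the function name `modality_dropout`, the zero-filling strategy, and the `p_drop` parameter are all assumptions chosen for illustration.

```python
import random


def modality_dropout(batch, drop_modality, p_drop, rng=None):
    """Return a copy of `batch` with one modality zeroed for some samples.

    batch: list of dicts mapping a modality name to a list of floats.
    drop_modality: name of the modality to drop (hypothetical, e.g. "audio").
    p_drop: per-sample drop probability; an "aggressive" setting would be
        a high value (the exact rate is an assumption, not from the abstract).
    """
    rng = rng or random.Random()
    out = []
    for sample in batch:
        # Copy the features so the caller's batch is left untouched.
        new_sample = {name: list(feats) for name, feats in sample.items()}
        if rng.random() < p_drop:
            # Assumption: a dropped modality is replaced by zeros.
            new_sample[drop_modality] = [0.0] * len(sample[drop_modality])
        out.append(new_sample)
    return out
```

With `p_drop` close to 1.0 the model sees mostly unimodal inputs during training, which is one plausible reading of how dropout could prepare a multimodal model for unimodal deployment.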

artificial intelligence, machine learning, modality dropout, (17 more...)

arXiv.org Artificial Intelligence

2501.00865

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Adversarial Representation Learning for Robust Privacy Preservation in Audio

Gharib, Shayan, Tran, Minh, Luong, Diep, Drossos, Konstantinos, Virtanen, Tuomas

arXiv.org Artificial Intelligence, Jan-3-2024

Sound event detection systems are widely used in various applications such as surveillance and environmental monitoring where data is automatically collected, processed, and sent to a cloud for sound recognition. However, this process may inadvertently reveal sensitive information about users or their surroundings, hence raising privacy concerns. In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings. The proposed method trains a model to generate invariant latent representations of speech-containing audio recordings that cannot be distinguished from non-speech recordings by a speech classifier. The novelty of our work is in the optimization algorithm, where the speech classifier's weights are regularly replaced with the weights of classifiers trained in a supervised manner. This increases the discrimination power of the speech classifier constantly during the adversarial training, motivating the model to generate latent representations in which speech is not distinguishable, even using new speech classifiers trained outside the adversarial training loop. The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method, demonstrating a significant reduction in privacy violations compared to the baseline approach. Additionally, we show that the prior adversarial method is practically ineffective for this purpose.
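The control flow of the weight-replacement idea can be sketched as follows. This is only a schematic under stated assumptions: the abstract says the speech classifier's weights are "regularly replaced", but gives no interval, so `replace_every` is hypothetical, as are the three callables, which stand in for the real training code.

```python
def adversarial_training_schedule(num_steps, replace_every, adversarial_step,
                                  train_supervised_classifier,
                                  load_classifier_weights):
    """Run adversarial steps and periodically swap in fresh classifier weights.

    All three callables are hypothetical placeholders:
      adversarial_step(step): one encoder-vs-classifier update.
      train_supervised_classifier(step): trains a new speech classifier on the
          current latent features and returns its weights.
      load_classifier_weights(weights): overwrites the adversary's weights.
    Returns the list of steps at which a replacement happened.
    """
    if replace_every <= 0:
        raise ValueError("replace_every must be positive")
    replaced_at = []
    for step in range(1, num_steps + 1):
        adversarial_step(step)
        if step % replace_every == 0:
            # Replace the adversary with a classifier trained in a
            # supervised manner, so the encoder faces a stronger opponent.
            weights = train_supervised_classifier(step)
            load_classifier_weights(weights)
            replaced_at.append(step)
    return replaced_at
```

Keeping the replacement logic separate from the update steps makes the interval easy to vary when comparing against a plain adversarial baseline.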

data mining, information, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/OJSP.2023.3349113

2305.00011

Country: Europe > Finland (0.29)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

3FM: Multi-modal Meta-learning for Federated Tasks

Tran, Minh, Shah, Roochi, Gong, Zejun

arXiv.org Artificial Intelligence, Dec-15-2023

We present a novel approach in the domain of federated learning (FL), particularly focusing on addressing the challenges posed by modality heterogeneity, variability in modality availability across clients, and the prevalent issue of missing data. We introduce a meta-learning framework specifically designed for multimodal federated tasks. Our approach is motivated by the need to enable federated models to robustly adapt when exposed to new modalities, a common scenario in FL where clients often differ in the number of available modalities. The effectiveness of our proposed framework is demonstrated through extensive experimentation on an augmented MNIST dataset, enriched with audio and sign language data. We demonstrate that the proposed algorithm achieves better performance than the baseline on a subset of missing modality scenarios with careful tuning of the meta-learning rates. This is a shortened report, and our work will be extended and updated soon. Code and the full report can be found on our GitHub.

artificial intelligence, machine learning, modality, (18 more...)

arXiv.org Artificial Intelligence

2312.10179

Genre: Research Report (0.70)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Yamazaki, Kashu, Hanyu, Taisei, Vo, Khoa, Pham, Thang, Tran, Minh, Doretto, Gianfranco, Nguyen, Anh, Le, Ngan

arXiv.org Artificial Intelligence, Oct-5-2023

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with 3D knowledge from TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without necessitating additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the strengths of region-based VLFM and TSDF, facilitating real-time 3D scene comprehension that includes object concepts and open-world semantics. We encourage the readers to view the demos on our project page: https://uark-aicv.github.io/OpenFusion
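For readers unfamiliar with TSDF fusion, the following sketch shows the standard weighted running-average update for a single voxel. It illustrates the general technique only: the abstract does not describe Open-Fusion's own update rule, and the function name, the `max_weight` cap, and the default values are assumptions.

```python
def tsdf_update(tsdf, weight, sdf_obs, trunc, obs_weight=1.0, max_weight=64.0):
    """Fuse one signed-distance observation into a voxel.

    tsdf, weight: the voxel's current truncated distance and accumulated weight.
    sdf_obs: newly observed signed distance to the surface (same unit as trunc).
    trunc: truncation distance; observations are normalized to [-1, 1].
    obs_weight, max_weight: hypothetical defaults chosen for illustration.
    Returns the updated (tsdf, weight) pair.
    """
    if trunc <= 0:
        raise ValueError("trunc must be positive")
    # Truncate and normalize the observed distance.
    d = max(-1.0, min(1.0, sdf_obs / trunc))
    new_weight = weight + obs_weight
    # Weighted running average of all observations seen so far.
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    # Capping the weight keeps the voxel responsive to later observations.
    return new_tsdf, min(new_weight, max_weight)
```

Because each update touches only one voxel and needs no stored history, this kind of fusion suits real-time reconstruction from a stream of RGB-D frames.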

artificial intelligence, mapping and queryable scene representation, open-fusion

arXiv.org Artificial Intelligence

2310.03923

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence (0.87)
Information Technology > Architecture > Real Time Systems (0.80)

Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning

Luong, Diep, Tran, Minh, Gharib, Shayan, Drossos, Konstantinos, Virtanen, Tuomas

arXiv.org Artificial Intelligence, Aug-9-2023

Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment. In this study, we propose the integration of two commonly used approaches in privacy preservation: source separation and adversarial representation learning. The proposed system learns the latent representation of audio recordings such that it prevents differentiating between speech and non-speech recordings. Initially, the source separation network filters out some of the privacy-sensitive data, and during the adversarial learning process, the system learns a privacy-preserving representation of the filtered signal. We demonstrate the effectiveness of our proposed method by comparing it against systems without source separation, without adversarial learning, and with neither. Overall, our results suggest that the proposed system can significantly improve speech privacy preservation compared to using source separation or adversarial learning alone, while maintaining good performance in the acoustic monitoring task.

artificial intelligence, machine learning, speech recognition, (15 more...)

arXiv.org Artificial Intelligence

2308.0496

Country: Europe > Finland (0.29)

Genre: Research Report > New Finding (0.89)

Industry:

Information Technology > Security & Privacy (0.47)
Media > Music (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies

Ngo, Dat, Pham, Lam, Phan, Huy, Tran, Minh, Jarchi, Delaram, Kolozali, Sefki

arXiv.org Artificial Intelligence, Jun-19-2023

This paper presents a deep learning system for detecting anomalies in respiratory sound recordings. Our system begins with audio feature extraction using Gammatone and Continuous Wavelet transformation. This step aims to transform the respiratory sound input into a two-dimensional spectrogram where both spectral and temporal features are presented. Then, our proposed system integrates Inception-residual-based backbone models combined with multi-head attention and multi-objective loss to classify respiratory anomalies. Instead of applying a simple concatenation approach by combining results from various spectrograms, we propose a linear combination, which can equally regulate the contribution of each individual spectrogram throughout the training process. To evaluate the performance, we conducted experiments over the benchmark dataset of SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) proposed by the IEEE BioCAS 2022 challenge. In terms of the Score, computed as the mean of the average score and the harmonic score, our proposed system gained significant improvements of 9.7%, 15.8%, 17.8%, and 16.1% in Task 1-1, Task 1-2, Task 2-1, and Task 2-2, respectively, compared to the challenge baseline system. Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2 with the highest Score of 74.5% and 53.9%, respectively.
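The Score described in the abstract is an average of an average score and a harmonic score. The sketch below shows one way to compute it, assuming (this is not stated in the abstract) that the two component scores are the arithmetic and harmonic means of sensitivity and specificity, as is common in respiratory sound benchmarks. The function name `challenge_score` is hypothetical.

```python
def challenge_score(sensitivity, specificity):
    """Mean of the average score and the harmonic score.

    Assumption: both component scores are derived from sensitivity and
    specificity, each given as a fraction in [0, 1].
    """
    average_score = (sensitivity + specificity) / 2
    total = sensitivity + specificity
    # Guard against division by zero when both inputs are 0.
    harmonic_score = 2 * sensitivity * specificity / total if total > 0 else 0.0
    return (average_score + harmonic_score) / 2
```

Including the harmonic term penalizes systems whose sensitivity and specificity are badly unbalanced, which a plain average would hide.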

artificial intelligence, machine learning, spectrogram, (19 more...)

arXiv.org Artificial Intelligence

2303.04104

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Meta Learning for Few-Shot Medical Text Classification

Sharma, Pankaj, Qureshi, Imran, Tran, Minh

arXiv.org Artificial Intelligence, Dec-3-2022

Medical professionals frequently work in a data-constrained setting to provide insights across a unique demographic. A few medical observations, for instance, inform the diagnosis and treatment of a patient. This suggests a unique setting for meta-learning, a method to learn models quickly on new tasks, to provide insights unattainable by other methods. We investigate the use of meta-learning and robustness techniques on a broad corpus of benchmark text and medical data. To do this, we developed new data pipelines, combined language models with meta-learning approaches, and extended existing meta-learning algorithms to minimize worst-case loss. We find that meta-learning on text is a suitable framework for text-based data, providing better data efficiency and comparable performance to few-shot language models, and can be successfully applied to medical note data. Furthermore, meta-learning models coupled with distributionally robust optimization (DRO) can improve worst-case loss across disease codes.
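A worst-case loss across groups can be illustrated with a small sketch. It assumes a group-style formulation in which each example carries a group label (for instance a disease code) and the quantity of interest is the highest per-group mean loss. The abstract does not specify the exact robust objective used, so this formulation and the function name `worst_group_loss` are assumptions.

```python
def worst_group_loss(losses, groups):
    """Return (group, mean_loss) for the group with the highest mean loss.

    losses: per-example loss values.
    groups: per-example group labels (hypothetically, disease codes).
    """
    if len(losses) != len(groups) or not losses:
        raise ValueError("losses and groups must be non-empty and equal length")
    totals, counts = {}, {}
    for loss, group in zip(losses, groups):
        totals[group] = totals.get(group, 0.0) + loss
        counts[group] = counts.get(group, 0) + 1
    # Mean loss per group, then pick the worst-performing group.
    means = {g: totals[g] / counts[g] for g in totals}
    worst = max(means, key=means.get)
    return worst, means[worst]
```

Minimizing this quantity instead of the overall mean pushes training toward the group the model currently handles worst, at some possible cost to average performance.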

machine learning, natural language, text classification, (19 more...)

arXiv.org Artificial Intelligence

2212.01552

Genre:

Research Report (1.00)
Instructional Material > Online (0.40)
Instructional Material > Course Syllabus & Notes (0.40)

Industry:

Health & Medicine > Health Care Providers & Services (0.70)
Health & Medicine > Health Care Technology > Medical Record (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
