Nguyen, Tuan Dung
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy
Pan, Rui, Nguyen, Tuan Dung, Arora, Hardik, Accomazzi, Alberto, Ghosal, Tirthankar, Ting, Yuan-Sen
Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the absence of astronomy-focused benchmarks has until recently hindered objective evaluation of these specialized models. Leveraging a recent initiative to curate high-quality astronomical multiple-choice questions (MCQs), this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms the base model. We demonstrate that this performance degradation can be partially mitigated by using high-quality data for continual pretraining, such as summarized text from arXiv. Despite the catastrophic forgetting observed in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised fine-tuning dataset still constrains the performance of instruct models. In conjunction with this study, we introduce a new set of models, AstroLLaMA-3-8B and AstroLLaMA-2-70B, building upon the previous AstroLLaMA series.
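As an illustration of how MCQ benchmarks of this kind are typically scored, below is a minimal sketch that picks the answer option to which a causal language model assigns the highest log-likelihood. The HuggingFace model ID and the likelihood-based scoring rule are assumptions for the example, not the AstroMLab evaluation harness.

```python
# Minimal sketch: score MCQ options by summed token log-likelihood.
# Model name and prompting scheme are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def option_loglik(question: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option tokens."""
    prompt_ids = tok(question, return_tensors="pt").input_ids
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shifted next-token log-probs; score only the option's tokens.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    target = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1
    return logprobs[start:].gather(1, target[start:, None]).sum().item()

def answer(question: str, options: list[str]) -> int:
    """Index of the option with the highest model log-likelihood."""
    return max(range(len(options)), key=lambda i: option_loglik(question, options[i]))
```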
AstroMLab 1: Who Wins Astronomy Jeopardy!?
Ting, Yuan-Sen, Nguyen, Tuan Dung, Ghosal, Tirthankar, Pan, Rui, Arora, Hardik, Sun, Zechang, de Haan, Tijmen, Ramachandra, Nesar, Wells, Azton, Madireddy, Sandeep, Accomazzi, Alberto
We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performance across various astronomical subfields and assesses response calibration, which is crucial for potential deployment in research environments. Claude-3.5-Sonnet outperforms competitors by up to 4.6 percentage points, achieving 85.0% accuracy. For proprietary models, we observed a universal reduction, every 3 to 12 months, in the cost required to achieve a similar score on this benchmark. Open-weights models have rapidly improved, with LLaMA-3-70B (80.6%) and Qwen-2-72B (77.7%) now competing with some of the best proprietary models. We identify performance variations across topics, with non-English-focused models generally struggling more on questions about exoplanets, stellar astrophysics, and instrumentation. These challenges likely stem from less abundant training data, limited historical context, and rapid recent developments in these areas. This pattern appears across both open-weights and proprietary models, with evident regional dependencies, highlighting the impact of training-data diversity on model performance in specialized scientific domains. Top-performing models demonstrate well-calibrated confidence, with correlations above 0.9 between confidence and correctness, though they tend to be slightly underconfident. The development of fast, low-cost inference for open-weights models presents new opportunities for affordable deployment in astronomy. The rapid progress observed suggests that LLM-driven astronomy research may become feasible in the near future.
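To make the calibration analysis concrete, here is a minimal sketch that bins answers by the model's self-reported confidence and correlates bin-level confidence with bin-level accuracy. The arrays are synthetic placeholders, not benchmark data.

```python
# Minimal calibration sketch: correlate stated confidence with accuracy.
# Synthetic data stands in for real model responses.
import numpy as np

rng = np.random.default_rng(0)
confidence = rng.uniform(0.25, 1.0, size=4425)   # model's self-reported confidence
correct = rng.random(4425) < confidence           # toy: accuracy tracks confidence

bins = np.linspace(0.25, 1.0, 11)
idx = np.digitize(confidence, bins) - 1
bin_conf, bin_acc = [], []
for b in range(len(bins) - 1):
    mask = idx == b
    if mask.sum() >= 30:                          # skip sparsely populated bins
        bin_conf.append(confidence[mask].mean())
        bin_acc.append(correct[mask].mean())

r = np.corrcoef(bin_conf, bin_acc)[0, 1]
print(f"confidence-accuracy correlation: {r:.3f}")
# Underconfidence would show up as bin_acc consistently above bin_conf.
```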
Federated PCA on Grassmann Manifold for IoT Anomaly Detection
Nguyen, Tung-Anh, Le, Long Tan, Nguyen, Tuan Dung, Bao, Wei, Seneviratne, Suranga, Hong, Choong Seon, Tran, Nguyen H.
With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they have limitations such as the requirement for labeled data and difficulty with high dimensionality. Recent unsupervised ML-IDS approaches, such as AutoEncoders and Generative Adversarial Networks (GANs), offer alternatives but are challenging to deploy on resource-constrained IoT devices and to interpret. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, which leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms: FEDPE in Euclidean space and FEDPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms come with theoretical convergence rates, even under a subsampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our methods offer anomaly-detection performance comparable to nonlinear baselines while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.
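A rough sketch of the federated-subspace idea follows: each client fits a local PCA basis, and the server aggregates by averaging projection matrices, a simple Grassmann-mean surrogate. This is illustrative only; it is not the paper's FEDPE/FEDPG updates or their ADMM formulation.

```python
# Minimal federated-PCA-style sketch for anomaly detection.
# Aggregation via averaged projectors is a surrogate, not FEDPE/FEDPG.
import numpy as np

def local_basis(X: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions of one client's (centered) data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                                # d x k orthonormal basis

def aggregate(bases: list[np.ndarray], k: int) -> np.ndarray:
    """Average projectors U U^T and take the top-k eigenvectors."""
    P = sum(U @ U.T for U in bases) / len(bases)
    w, V = np.linalg.eigh(P)
    return V[:, np.argsort(w)[::-1][:k]]           # global d x k basis

def anomaly_score(x: np.ndarray, U: np.ndarray) -> float:
    """Reconstruction error of x against the shared subspace."""
    return float(np.linalg.norm(x - U @ (U.T @ x)))

# Toy run: 3 clients, 20-dim data, 5-dim shared subspace.
rng = np.random.default_rng(1)
clients = [rng.normal(size=(200, 20)) @ rng.normal(size=(20, 20)) for _ in range(3)]
U = aggregate([local_basis(X, 5) for X in clients], 5)
print(anomaly_score(rng.normal(size=20), U))       # large error -> anomalous
```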
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Perkowski, Ernest, Pan, Rui, Nguyen, Tuan Dung, Ting, Yuan-Sen, Kruk, Sandor, Zhang, Tong, O'Neill, Charlie, Jablonska, Maja, Sun, Zechang, Smith, Michael J., Liu, Huiling, Schawinski, Kevin, Iyer, Kartheik, Ciucă, Ioana, for UniverseTBD
We introduce AstroLLaMA-Chat, an advanced version of AstroLLaMA. This new iteration broadens the training scope to include the introductions and conclusions of papers alongside abstracts, as these sections are often rich in information pivotal for question-answering tasks. We began by downloading all papers up to July 2023, including all files that accompany a submission to arXiv. The data was further refined for operability, retaining only files with the ".tex" suffix. The targeted sections were then extracted through a multi-stage pipeline built on comprehensive regex matching. Given the diversity of LaTeX formatting conventions, approximately 90% of the samples survived this processing. Subsequently, we removed specific formatting patterns, comments, and superfluous symbols such as newlines to ensure the readability of the training data. Further, we fine-tuned AstroLLaMA-Chat on a domain-specific dialogue dataset. To generate question-answer pairs, we engaged GPT-4 (OpenAI 2023) to formulate pertinent questions from paragraphs within 300,000 arXiv papers, with GPT-4 also tasked with answering these questions by retrieving context-relevant information.
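The section-extraction step lends itself to a short sketch: pull the abstract, introduction, and conclusion from a .tex source with regexes and strip comments and excess whitespace. Real arXiv sources vary widely, so these patterns are illustrative rather than the authors' exact pipeline.

```python
# Minimal sketch of regex-based section extraction from LaTeX source.
# Patterns are illustrative; production pipelines need more robustness.
import re

SECTION = re.compile(
    r"\\section\*?\{(?P<title>[^}]*)\}(?P<body>.*?)(?=\\section\*?\{|\\end\{document\}|\Z)",
    re.DOTALL,
)

def clean(tex: str) -> str:
    tex = re.sub(r"(?<!\\)%.*", "", tex)   # drop LaTeX comments
    tex = re.sub(r"\s+", " ", tex)         # collapse newlines and whitespace
    return tex.strip()

def extract_sections(tex: str) -> dict[str, str]:
    """Return abstract, introduction, and conclusion text if present."""
    out = {}
    m = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", tex, re.DOTALL)
    if m:
        out["abstract"] = clean(m.group(1))
    for m in SECTION.finditer(tex):
        title = m.group("title").lower()
        if "introduction" in title or "conclusion" in title:
            out[title] = clean(m.group("body"))
    return out
```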
On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods
Nguyen, Anh Duc, Nguyen, Tuan Dung, Nguyen, Quang Minh, Nguyen, Hoang H., Nguyen, Lam M., Toh, Kim-Chuan
This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer and domain adaptation. Fast approximations of POT are hence needed as problem sizes grow in emerging applications. We first investigate, theoretically and experimentally, the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which degrades its qualitative performance in real-world applications such as point-cloud registration. To this end, we propose a novel rounding algorithm for POT and then provide a feasible Sinkhorn procedure with a revised computational complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than the revised Sinkhorn. The second, Dual Extrapolation, achieves a computational complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared with standard OT, as well as the practicality of our algorithms, on real applications where the two marginal distributions are unbalanced.
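For intuition on the POT setup, the sketch below uses the common dummy-point reduction of POT to balanced OT followed by plain entropic Sinkhorn. This illustrates the problem being approximated, not the paper's revised rounding procedure or its complexity guarantees.

```python
# Minimal entropic partial-OT sketch via the dummy-point reduction.
# Not the paper's algorithm; the entropic constraint is only approximate.
import numpy as np

def partial_sinkhorn(C, a, b, s, eps=0.05, iters=2000):
    """Transport s <= min(sum a, sum b) mass between unbalanced a, b."""
    n, m = C.shape
    A = 2.0 * np.max(C)                  # large dummy-dummy cost
    Ce = np.block([[C, np.zeros((n, 1))],
                   [np.zeros((1, m)), A * np.ones((1, 1))]])
    ae = np.append(a, b.sum() - s)       # extended marginals are balanced
    be = np.append(b, a.sum() - s)
    K = np.exp(-Ce / eps)
    u = np.ones(n + 1)
    for _ in range(iters):               # standard Sinkhorn scaling
        v = be / (K.T @ u)
        u = ae / (K @ v)
    P = (u[:, None] * K) * v[None, :]
    return P[:n, :m]                     # transported part of the plan

a = np.array([0.5, 0.5])
b = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = partial_sinkhorn(C, a, b, s=0.6)
print(P, P.sum())                        # total transported mass ~ 0.6
```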
Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency
Le, Long Tan, Nguyen, Tuan Dung, Nguyen, Tung-Anh, Hong, Choong Seon, Tran, Nguyen H.
Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhances privacy preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models up to 4 times smaller in communication size and with a 1.5 times smaller memory footprint during training.
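A minimal sketch of the equilibrium-layer structure described above: a fixed point z* of z = tanh(Wz + Ux) serves as the shared representation, and a small personalized head maps it to outputs. The dimensions, update map, and naive fixed-point iteration are assumptions for illustration; FeDEQ's actual architecture and ADMM training loop are not reproduced here.

```python
# Minimal deep-equilibrium-layer sketch: shared fixed-point features,
# personalized linear head. Illustrative only.
import numpy as np

def deq_layer(x, W, U, iters=50, tol=1e-6):
    """Find z* with z = tanh(W z + U x) by naive fixed-point iteration."""
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        z_new = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 3
W = rng.normal(scale=0.1, size=(d_hid, d_hid))   # small norm -> contraction
U = rng.normal(size=(d_hid, d_in))               # shared representation params
head = rng.normal(size=(d_out, d_hid))           # per-client personalized layer

x = rng.normal(size=d_in)
z_star = deq_layer(x, W, U)      # equilibrium feature shared across clients
logits = head @ z_star           # personalized prediction at one edge node
print(logits)
```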
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Nguyen, Tuan Dung, Ting, Yuan-Sen, Ciucă, Ioana, O'Neill, Charlie, Sun, Ze-Chang, Jabłońska, Maja, Kruk, Sandor, Perkowski, Ernest, Miller, Jack, Li, Jason, Peek, Josh, Iyer, Kartheik, Różański, Tomasz, Khetarpal, Pranav, Zaman, Sharaf, Brodrick, David, Méndez, Sergio J. Rodríguez, Bui, Thang, Goodman, Alyssa, Accomazzi, Alberto, Naiman, Jill, Cranney, Jesse, Schawinski, Kevin, UniverseTBD
Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves 30% lower perplexity than LLaMA-2, showing marked domain adaptation. Our model generates more insightful and scientifically relevant text completions and embeddings than state-of-the-art foundation models, despite having significantly fewer parameters. AstroLLaMA serves as a robust, domain-specific model with broad fine-tuning potential. Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.
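Perplexity comparisons of this kind can be reproduced in a few lines; the sketch below scores a held-out passage under a causal LM and reports exp(mean NLL). The model repository IDs and sample text are placeholders, not the paper's evaluation corpus.

```python
# Minimal perplexity sketch for comparing a base and a domain-adapted LM.
# Repository IDs are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean token negative log-likelihood
    return float(torch.exp(loss))

abstract = "We measure the rotation curves of nearby disk galaxies..."
for name in ["meta-llama/Llama-2-7b-hf", "universeTBD/astrollama"]:
    print(name, perplexity(name, abstract))
```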
Personalized Federated Learning with Moreau Envelopes
Dinh, Canh T., Tran, Nguyen H., Nguyen, Tuan Dung
Federated learning (FL) is a decentralized and privacy-preserving machine learning technique in which a group of clients collaborate with a server to learn a global model without sharing clients' data. One challenge associated with FL is statistical diversity among clients, which restricts the global model from delivering good performance on each client's task. To address this, we propose an algorithm for personalized FL (pFedMe) using Moreau envelopes as clients' regularized loss functions, which help decouple personalized-model optimization from global-model learning in a bi-level problem formulated for personalized FL. Theoretically, we show that pFedMe's convergence rate is state-of-the-art: it achieves quadratic speedup for strongly convex objectives and sublinear speedup of order 2/3 for smooth nonconvex objectives. Experimentally, we verify that pFedMe outperforms the vanilla FedAvg and Per-FedAvg, a meta-learning-based personalized FL algorithm.
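The core personalization step admits a compact sketch: each client approximately solves the Moreau-envelope subproblem argmin_theta f_i(theta) + (lambda/2)||theta - w||^2 around the global model w, and the server moves w toward the average personalized model. The quadratic losses and update schedule below are toy choices, not pFedMe's full algorithm or its theoretical setting.

```python
# Minimal Moreau-envelope personalization sketch in the spirit of pFedMe.
# Quadratic client losses and simplified server update are toy assumptions.
import numpy as np

def personalize(grad_f, w, lam, lr=0.05, steps=100):
    """Inner proximal step: approx. argmin f(theta) + lam/2 ||theta - w||^2."""
    theta = w.copy()
    for _ in range(steps):
        theta -= lr * (grad_f(theta) + lam * (theta - w))
    return theta

rng = np.random.default_rng(0)
d = 5
targets = [rng.normal(size=d) for _ in range(4)]    # clients' optima differ
grads = [lambda th, t=t: th - t for t in targets]   # f_i = 0.5||theta - t||^2

w = np.zeros(d)
lam, eta = 1.0, 0.5
for _ in range(50):                                  # global rounds
    thetas = [personalize(g, w, lam) for g in grads]
    # Server update: move w toward the average personalized model.
    w += eta * (np.mean(thetas, axis=0) - w)
print(w, np.mean(targets, axis=0))  # w approaches the clients' mean optimum
```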