Ding, Yanna
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
Ma, Mingyu Derek, Ding, Yanna, Huang, Zijie, Gao, Jianxi, Sun, Yizhou, Wang, Wei
Generative language models rely on autoregressive decoding to produce the output sequence token by token. Many tasks, such as preference optimization, require the model to produce a task-level output consisting of multiple tokens directly, by selecting a candidate from a pool as the prediction. Determining a task-level prediction from candidates with the ordinary token-level decoding mechanism is hampered by time-consuming decoding and by the gradient interruption caused by discrete token selection. Existing works therefore use decoding-free candidate selection methods that estimate candidate probabilities from the initial output logits over the vocabulary. Although these estimation methods are widely used, they have not been systematically evaluated, especially on end tasks. We introduce an evaluation of a comprehensive collection of decoding-free candidate selection approaches on a diverse set of tasks, including five multiple-choice QA tasks with small candidate pools and four clinical decision tasks with massive candidate pools, some exceeding 10k options. We pair the estimation methods with a wide spectrum of foundation LMs covering different architectures, sizes, and training paradigms. The results and insights from our analysis inform future model design.
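A minimal sketch of the general idea (not any specific estimator from the paper): given a HuggingFace-style causal LM, one decoding-free approach scores each multi-token candidate by aggregating the first-step log-probabilities of its tokens from a single forward pass. The model name, prompt, candidates, and the sum aggregation below are all illustrative assumptions.

    # Sketch: score multi-token candidates from one forward pass's logits
    # over the vocabulary, skipping autoregressive decoding entirely.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works here
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Patient presents with fever and cough. Diagnosis:"
    candidates = ["pneumonia", "heart failure", "sepsis"]

    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
        # Log-probabilities over the vocabulary at the first output position.
        first_step = out.logits[0, -1].log_softmax(-1)

    def score(candidate: str) -> float:
        # One simple estimator: sum the first-step log-probs of the
        # candidate's tokens (order-insensitive; a deliberate simplification).
        ids = tok(candidate, add_special_tokens=False).input_ids
        return first_step[ids].sum().item()

    print(max(candidates, key=score))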
Predicting Time Series of Networked Dynamical Systems without Knowing Topology
Ding, Yanna, Huang, Zijie, Magdon-Ismail, Malik, Gao, Jianxi
Many real-world complex systems, such as epidemic spreading networks and ecosystems, can be modeled as networked dynamical systems that produce multivariate time series. Learning the intrinsic dynamics from observational data is pivotal for forecasting system behavior and making informed decisions. However, existing methods for modeling networked time series often assume known topologies, whereas real-world networks are typically incomplete or inaccurate, with missing or spurious links that hinder precise predictions. Moreover, while networked time series often originate from diverse topologies, the ability of models to generalize across topologies has not been systematically evaluated. To address these gaps, we propose a novel framework for learning network dynamics directly from observed time-series data when prior knowledge of the graph topology or governing dynamical equations is absent. Our approach leverages continuous graph neural networks with an attention mechanism to construct a latent topology, enabling accurate prediction of future trajectories of network states. Extensive experiments on real and synthetic networks demonstrate that our model not only captures dynamics effectively without topology knowledge but also generalizes to unseen time series originating from diverse topologies.
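As an illustration of the idea described above (a hedged sketch, not the paper's implementation): attention over node states yields a soft latent adjacency, and the resulting message-passing function is integrated continuously over time. The module names, dimensions, and fixed-step Euler integrator are assumptions made for brevity; a proper ODE solver could replace the rollout.

    # Sketch: infer a latent topology via attention, then integrate
    # node dynamics continuously from observed initial states.
    import torch
    import torch.nn as nn

    class LatentGraphODE(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.msg = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (num_nodes, dim). Attention scores act as a soft,
            # learned adjacency in place of the unknown true topology.
            attn = torch.softmax(
                self.q(x) @ self.k(x).T / x.size(-1) ** 0.5, dim=-1
            )
            return attn @ torch.tanh(self.msg(x))  # dx/dt from neighbor messages

    def euler_rollout(f, x0, steps=10, dt=0.1):
        # Fixed-step Euler integration of dx/dt = f(x).
        xs, x = [x0], x0
        for _ in range(steps):
            x = x + dt * f(x)
            xs.append(x)
        return torch.stack(xs)  # (steps + 1, num_nodes, dim)

    traj = euler_rollout(LatentGraphODE(dim=8), torch.randn(5, 8))
    print(traj.shape)  # torch.Size([11, 5, 8])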
Architecture-Aware Learning Curve Extrapolation via Graph Ordinary Differential Equation
Ding, Yanna, Huang, Zijie, Shou, Xiao, Guo, Yihang, Sun, Yizhou, Gao, Jianxi
Training neural architectures is a resource-intensive endeavor, often demanding considerable computational power and time. Researchers have developed various methodologies to predict the performance of neural networks early in the training process using learning curve data. Some methods (Domhan et al., 2015; Gargiani et al., 2019; Adriaensen et al., 2023) apply Bayesian inference to project these curves forward, while others employ time-series prediction techniques, such as LSTM networks. Despite their effectiveness, these approaches (Swersky et al., 2014; Baker et al., 2017) typically overlook the architectural features of networks, missing out on crucial insights that could be derived from the models' topology. We utilize a seq2seq variational autoencoder framework to analyze the initial stages of a learning curve and predict its future progression. This predictive capability is further enhanced by an architecture-aware component that produces a graph-level embedding from the architecture's topology, employing techniques like Graph Convolutional Networks (GCN) (Kipf and Welling, 2016) and Differentiable Pooling (Ying et al., 2018). This integration not only improves the accuracy of learning curve extrapolations compared to existing methods but also significantly facilitates model ranking, potentially leading to more efficient use of computational resources, accelerated experimentation cycles, and faster progress in the field.
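A hedged sketch of the described pipeline, simplified in two ways: it omits the variational component of the autoencoder, and it substitutes a one-layer GCN-style aggregation with mean pooling for GCN plus Differentiable Pooling. An LSTM encodes the observed prefix of a learning curve, a graph-level embedding of the architecture conditions the hidden state, and a decoder extrapolates future curve values. All names and sizes are illustrative.

    # Sketch: extrapolate a learning curve, conditioned on an
    # architecture-graph embedding injected into the decoder state.
    import torch
    import torch.nn as nn

    class CurveExtrapolator(nn.Module):
        def __init__(self, hid=32, node_dim=8):
            super().__init__()
            self.enc = nn.LSTM(1, hid, batch_first=True)
            self.node_proj = nn.Linear(node_dim, hid)
            self.dec_cell = nn.LSTMCell(1, hid)
            self.out = nn.Linear(hid, 1)

        def graph_embed(self, feats, adj):
            # One-layer GCN-style aggregation + mean pooling: a stand-in
            # for the GCN and Differentiable Pooling components.
            h = torch.relu(self.node_proj(adj @ feats))
            return h.mean(0)

        def forward(self, curve_prefix, feats, adj, horizon=10):
            # curve_prefix: (batch, observed_epochs) of e.g. validation accuracy.
            _, (h, c) = self.enc(curve_prefix.unsqueeze(-1))
            h = h[0] + self.graph_embed(feats, adj)  # inject architecture info
            c = c[0]
            y = curve_prefix[:, -1:]                 # last observed value
            preds = []
            for _ in range(horizon):
                h, c = self.dec_cell(y, (h, c))
                y = self.out(h)
                preds.append(y)
            return torch.cat(preds, dim=1)           # (batch, horizon)

    model = CurveExtrapolator()
    curve = torch.rand(2, 15)                        # two curves, 15 observed epochs
    feats, adj = torch.randn(6, 8), torch.eye(6)     # toy architecture graph
    print(model(curve, feats, adj).shape)            # torch.Size([2, 10])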