
Collaborating Authors: Hu, Ting


FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

arXiv.org Artificial Intelligence

The FlashAttention series has been widely applied in the inference of large language models (LLMs). However, it only supports high-end GPU architectures, e.g., Ampere and Hopper, and is not easily transferable to NPUs and low-resource GPUs. Moreover, it is inefficient for multi-NPU or multi-GPU inference scenarios. In this work, we propose FastAttention, which pioneers the adaptation of the FlashAttention series to NPUs and low-resource GPUs to boost LLM inference efficiency. Specifically, we take Ascend NPUs and Volta-based GPUs as representatives for designing FastAttention. We migrate the FlashAttention series to Ascend NPUs by proposing a novel two-level tiling strategy for runtime speedup, a tiling-mask strategy for memory saving, and a tiling-AllReduce strategy for reducing communication overhead. Besides, we adapt FlashAttention to Volta-based GPUs by redesigning the operand layout in shared memory and introducing a simple yet effective CPU-GPU cooperative strategy for efficient memory utilization. On Ascend NPUs, FastAttention achieves a 10.7$\times$ speedup over the standard attention implementation, and Llama-7B with FastAttention reaches up to 5.16$\times$ higher throughput than with standard attention. On Volta-architecture GPUs, FastAttention yields a 1.43$\times$ speedup over its equivalents in xformers, and Pangu-38B with FastAttention brings a 1.46$\times$ end-to-end speedup using FasterTransformer. Coupled with the proposed CPU-GPU cooperative strategy, FastAttention supports a maximal input length of 256K on 8 V100 GPUs. All the code will be made available soon.
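The tiling idea underlying the FlashAttention family can be illustrated with a block-wise attention computation that keeps a running (online) softmax per query row. The NumPy sketch below shows only this core idea; the block size, single-head layout, and absence of any NPU/GPU kernel detail are illustrative simplifications, not FastAttention's actual implementation.

```python
# Minimal sketch of block-wise (tiled) attention with an online softmax.
import numpy as np

def tiled_attention(Q, K, V, block=128):
    """Compute softmax(Q K^T / sqrt(d)) V one key/value block at a time."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(n, -np.inf)   # running max per query row
    row_sum = np.zeros(n)           # running softmax normaliser per query row
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]            # (b, d) key tile
        Vb = V[start:start + block]            # (b, d) value tile
        S = Q @ Kb.T * scale                   # (n, b) partial scores
        new_max = np.maximum(row_max, S.max(axis=1))
        # Rescale previously accumulated output and sums to the new maximum.
        correction = np.exp(row_max - new_max)
        P = np.exp(S - new_max[:, None])
        out = out * correction[:, None] + P @ Vb
        row_sum = row_sum * correction + P.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against the naive implementation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = Q @ K.T / np.sqrt(64)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref, atol=1e-6)
```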


Scaled Prompt-Tuning for Few-Shot Natural Language Generation

arXiv.org Artificial Intelligence

Increasingly large language models (LLMs) demonstrate stronger language understanding and generation capabilities, while the memory demand and computation cost of fine-tuning LLMs on downstream tasks are non-negligible. Besides, fine-tuning generally requires a certain amount of data from individual tasks, and data collection cost is another issue to consider in real-world applications. In this work, we focus on Parameter-Efficient Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG), which freeze most parameters in LLMs and tune a small subset of parameters in few-shot cases so that memory footprint, training cost, and labeling cost are reduced while maintaining or even improving performance. We propose a Scaled Prompt-Tuning (SPT) method which surpasses conventional prompt tuning (PT) with better performance and generalization ability but without an obvious increase in training cost. A further study on intermediate SPT suggests the superior transferability of SPT in few-shot scenarios, providing a recipe for data-deficient and computation-limited circumstances. Moreover, a comprehensive comparison of existing PEFT methods reveals that certain approaches exhibiting decent performance with modest training cost in prior studies, such as Prefix-Tuning, can struggle in few-shot NLG tasks, especially on challenging datasets.
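One plausible reading of "Scaled Prompt-Tuning" is a soft prompt whose embeddings carry a learnable scaling factor while the backbone LM stays frozen. The PyTorch sketch below illustrates that reading; the scaling scheme, prompt length, hidden size, and initialization are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch: soft prompt with a learnable scale, backbone kept frozen.
import torch
import torch.nn as nn

class ScaledSoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)
        self.scale = nn.Parameter(torch.ones(1))  # learnable scaling factor

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the scaled soft prompt to every sequence in the batch.
        batch = input_embeds.size(0)
        prompt = (self.scale * self.prompt).unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage: freeze the backbone LM and train only the soft prompt parameters.
# lm = AutoModelForCausalLM.from_pretrained(...)   # hypothetical backbone
# for p in lm.parameters():
#     p.requires_grad_(False)
soft_prompt = ScaledSoftPrompt(prompt_len=20, hidden_size=768)
embeds = torch.randn(4, 32, 768)                   # dummy input embeddings
print(soft_prompt(embeds).shape)                   # torch.Size([4, 52, 768])
```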


Genetic heterogeneity analysis using genetic algorithm and network science

arXiv.org Artificial Intelligence

Through genome-wide association studies (GWAS), genetic variables associated with disease susceptibility can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect sizes, and thus can be difficult to detect using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named the Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interactions. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiments showcase the effectiveness of the GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
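The co-selection idea can be illustrated with a toy sketch: repeat a stochastic feature-selection run several times (a genetic algorithm in the paper; a random selector stands in here), then connect features that are repeatedly selected together, so that densely connected communities suggest candidate heterogeneous subsets. Everything below (feature counts, the stand-in selector, the scoring of the strongest pair) is illustrative, not FCS-Net's actual pipeline.

```python
# Toy sketch of a feature co-selection network built from repeated runs.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_features, n_runs, subset_size = 50, 20, 5

# Stand-in for independent GA feature-selection runs: each run returns a subset.
runs = [rng.choice(n_features, subset_size, replace=False) for _ in range(n_runs)]

# Co-selection network: edge weight = number of runs selecting both features.
co_select = np.zeros((n_features, n_features), dtype=int)
for subset in runs:
    for i, j in combinations(sorted(subset), 2):
        co_select[i, j] += 1
        co_select[j, i] += 1

# Features co-selected most often form tightly connected communities from which
# heterogeneous variable subsets (and a score such as the CRS) could be derived.
strongest = np.unravel_index(np.argmax(co_select), co_select.shape)
print("most co-selected pair:", strongest, "count:", co_select[strongest])
```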


Evolutionary approaches to explainable machine learning

arXiv.org Artificial Intelligence

Machine learning models are increasingly being used in critical sectors, but their black-box nature has raised concerns about accountability and trust. The field of explainable artificial intelligence (XAI) or explainable machine learning (XML) has emerged in response to the need for human understanding of these models. Evolutionary computing (EC), as a family of powerful optimization and learning tools, has significant potential to contribute to XAI/XML. In this chapter, we provide a brief introduction to XAI/XML and review various techniques in current use for explaining machine learning models. We then focus on how evolutionary computing can be used in XAI/XML, and review some approaches which incorporate EC techniques. We also discuss some open challenges in XAI/XML and opportunities for future research in this field using EC. Our aim is to demonstrate that evolutionary computing is well suited for addressing current problems in explainability, and to encourage further exploration of these methods to contribute to the development of more transparent, trustworthy, and accountable machine learning models.


Phenotype Search Trajectory Networks for Linear Genetic Programming

arXiv.org Artificial Intelligence

Genotype-to-phenotype mappings translate genotypic variations such as mutations into phenotypic changes. Neutrality is the observation that some mutations do not lead to phenotypic changes. Studying the search trajectories in genotypic and phenotypic spaces, especially through neutral mutations, helps us to better understand the progression of evolution and its algorithmic behaviour. In this study, we visualise the search trajectories of a genetic programming system as graph-based models, where nodes are genotypes/phenotypes and edges represent their mutational transitions. We also quantitatively measure the characteristics of phenotypes including their genotypic abundance (the requirement for neutrality) and Kolmogorov complexity. We connect these quantified metrics with search trajectory visualisations, and find that more complex phenotypes are under-represented by fewer genotypes and are harder for evolution to discover. Less complex phenotypes, on the other hand, are over-represented by genotypes, are easier to find, and frequently serve as stepping-stones for evolution.
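As a minimal illustration of the graph-based model described above, the sketch below builds a tiny search trajectory network from a list of (parent phenotype, offspring phenotype) transitions; the toy trajectory and labels are invented for illustration, whereas in the paper the transitions come from runs of a linear genetic programming system.

```python
# Toy search trajectory network: nodes are phenotypes, directed edges record
# mutational transitions, and a self-loop corresponds to a neutral mutation.
from collections import Counter

# (parent_phenotype, offspring_phenotype) pairs observed along a search run.
trajectory = [("p0", "p0"), ("p0", "p1"), ("p1", "p1"), ("p1", "p2"),
              ("p2", "p1"), ("p1", "p2"), ("p2", "p3")]

edges = Counter(trajectory)        # edge weight = number of observed transitions
neutral = sum(w for (a, b), w in edges.items() if a == b)
print("edges:", dict(edges))
print("neutral (phenotype-preserving) transitions:", neutral)
```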


DiriNet: A network to estimate the spatial and spectral degradation functions

arXiv.org Artificial Intelligence

The spatial and spectral degradation functions are critical to hyper- and multi-spectral image fusion. However, little work has been devoted to estimating these degradation functions. To learn the spectral response function and the point spread function from the image pairs to be fused, we propose a Dirichlet network in which both functions are properly constrained. Specifically, the spectral response function is constrained to be positive, while a Dirichlet distribution along with a total variation penalty is imposed on the point spread function. To the best of our knowledge, this is the first time a neural network with Dirichlet regularization has been investigated to estimate the degradation functions. Both image degradation and fusion experiments demonstrate the effectiveness and superiority of the proposed Dirichlet network.
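The constraints described above can be sketched as follows, under our assumption (not the paper's) that the point spread function is parameterized on the probability simplex via a softmax with a total-variation penalty, and the spectral response function is kept positive via a softplus; the shapes and parameterizations are illustrative only.

```python
# Hedged sketch of simplex + positivity constraints on the degradation functions.
import torch
import torch.nn.functional as F

psf_logits = torch.randn(7, 7, requires_grad=True)   # unconstrained PSF params
srf_raw = torch.randn(4, 100, requires_grad=True)    # 4 MS bands x 100 HS bands

psf = F.softmax(psf_logits.reshape(-1), dim=0).reshape(7, 7)  # sums to 1, >= 0
srf = F.softplus(srf_raw)                                     # positivity

# Total-variation penalty encouraging a smooth point spread function.
tv = (psf[1:, :] - psf[:-1, :]).abs().sum() + (psf[:, 1:] - psf[:, :-1]).abs().sum()
print(float(psf.sum()), float(tv))
```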


Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

arXiv.org Machine Learning

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural networks and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption of uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. A linear convergence rate is further derived in the case of zero variance.
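For reference, the setting the abstract refers to can be written in standard notation: the SGD update with step sizes $\eta_t$, the classical Robbins-Monro step-size conditions often used for almost sure convergence, and the gradient-dominated (Polyak-Lojasiewicz) condition. This is illustrative notation only; the paper's exact assumptions and rates may differ.

```latex
% Illustrative notation; the paper's exact assumptions and rates may differ.
w_{t+1} = w_t - \eta_t \nabla f(w_t; z_t), \qquad
\sum_{t=1}^{\infty} \eta_t = \infty, \quad \sum_{t=1}^{\infty} \eta_t^2 < \infty.
% Gradient domination (Polyak-Lojasiewicz) condition with parameter \mu > 0:
f(w) - \inf_{w'} f(w') \;\le\; \frac{1}{2\mu}\,\|\nabla f(w)\|^2 .
```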


Consistency Analysis of an Empirical Minimum Error Entropy Algorithm

arXiv.org Machine Learning

In this paper we study the consistency of an empirical minimum error entropy (MEE) algorithm in a regression setting. We introduce two types of consistency. The error entropy consistency, which requires the error entropy of the learned function to approximate the minimum error entropy, is shown to always hold if the bandwidth parameter tends to 0 at an appropriate rate. The regression consistency, which requires the learned function to approximate the regression function, however, is a more complicated issue. We prove that error entropy consistency implies regression consistency for homoskedastic models where the noise is independent of the input variable. For heteroskedastic models, however, a counterexample shows that the two types of consistency do not coincide. A surprising result is that regression consistency is always true, provided that the bandwidth parameter tends to infinity at an appropriate rate. Regression consistency for two classes of special models is shown to hold with a fixed bandwidth parameter, which further illustrates the complexity of the regression consistency of MEE. The Fourier transform plays a crucial role in our analysis.
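For concreteness, one common form of the empirical MEE objective with a Gaussian Parzen window of bandwidth $h$ is sketched below; the paper's exact normalization and window function may differ.

```latex
% One common form of the empirical MEE objective (Gaussian window G, bandwidth h);
% the paper's exact normalisation may differ.
\widehat{H}_h(f) = -\log\!\Big( \frac{1}{m^2 h} \sum_{i=1}^{m} \sum_{j=1}^{m}
  G\!\Big( \frac{(e_i - e_j)^2}{2h^2} \Big) \Big), \qquad e_i = y_i - f(x_i).
```

The algorithm minimizes $\widehat{H}_h$ over a hypothesis space, so whether the bandwidth $h$ tends to 0, stays fixed, or tends to infinity drives the two types of consistency discussed above.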


Learning Theory Approach to Minimum Error Entropy Criterion

arXiv.org Machine Learning

We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm in a regression setting. A learning theory approach is presented for this MEE algorithm, and explicit error bounds are provided in terms of the approximation ability and capacity of the involved hypothesis space when the MEE scaling parameter is large. A novel asymptotic analysis is conducted for the generalization error associated with Renyi's entropy and a Parzen window function, to overcome technical difficulties arising from the essential differences between classical least squares problems and the MEE setting. A semi-norm and the associated symmetrized least squares error are introduced; the latter is related to some ranking algorithms.
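One plausible reading of the symmetrized least squares error mentioned above is the pairwise risk below, which depends on $f$ only through differences $f(x) - f(x')$ and is the kind of quantity that appears in ranking algorithms; this form is given for illustration and may not match the paper's exact definition.

```latex
% A plausible (illustrative) form of the symmetrized least squares error.
\mathcal{E}_{\mathrm{sym}}(f) =
  \mathbb{E}_{(x,y),\,(x',y')}\Big[ \big( (f(x) - f(x')) - (y - y') \big)^2 \Big].
```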