 cht


Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected

arXiv.org Artificial Intelligence

This study aims to extend current knowledge on applying brain-inspired network science principles to the training of artificial neural networks (ANNs) with sparse connectivity. Dynamic sparse training (DST) can reduce the computational demands of ANNs, but struggles to maintain peak performance at high sparsity levels. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages gradient-free, topology-driven link regrowth, which has shown an ultra-sparse (1% connectivity or lower) advantage across various tasks compared to fully connected networks. Yet CHT suffers from two main drawbacks: (i) its time complexity is O(Nd^3), where N is the network size in nodes and d is the node degree, so it can be applied only to ultra-sparse networks; (ii) it selects the top link prediction scores, which is inappropriate in the early training epochs, when the network contains unreliable connections. We propose a GPU-friendly approximation of the CH link predictor, which reduces the computational complexity to O(N^3), enabling a fast implementation of CHT in large-scale models. We introduce the Cannistraci-Hebb training soft rule (CHTs), which samples connections in both link removal and regrowth, balancing the exploration and exploitation of network topology. To improve performance, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that, using 1% of the connections, CHTs outperforms fully connected networks for MLPs on visual classification tasks, compressing some networks to fewer than 30% of their nodes. Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks. Using 30% of the connections, CHTss achieves superior performance compared to other dynamic sparse training methods in language modeling, and it surpasses the fully connected counterpart in zero-shot evaluations.
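The contrast between selecting the top link prediction scores and sampling connections can be illustrated with a minimal sketch. The abstract does not specify the sampling distribution used by CHTs, so the softmax weighting, the `temperature` parameter, and the function names below are assumptions made for illustration only, not the authors' method.

```python
import math
import random


def top_k_links(scores, k):
    """Deterministic rule: keep the k candidate links with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def soft_sample_links(scores, k, temperature=1.0, rng=None):
    """Sample k distinct links with probability weighted by softmax(score / temperature).

    Uses the Gumbel-top-k trick: perturbing each logit with Gumbel noise and
    taking the k largest is equivalent to sampling without replacement from
    the softmax distribution. Hypothetical stand-in for a soft selection rule.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of candidates")
    rng = rng or random.Random()
    perturbed = {}
    for link, score in scores.items():
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed[link] = score / temperature + gumbel_noise
    return sorted(perturbed, key=perturbed.get, reverse=True)[:k]
```

A high temperature makes the choice close to uniform (more exploration), while a temperature near zero reproduces the top-k rule (pure exploitation). Under the assumptions above, a schedule could start with a high temperature in early epochs, when scores are unreliable, and lower it over time.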


Continual Few-Shot Learning Using HyperTransformers

arXiv.org Artificial Intelligence

We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios.


Energy reconstruction for large liquid scintillator detectors with machine learning techniques: aggregated features approach

arXiv.org Artificial Intelligence

Large-scale detectors consisting of a liquid scintillator target surrounded by an array of photo-multiplier tubes (PMTs) are widely used in modern neutrino experiments: Borexino, KamLAND, Daya Bay, Double Chooz, RENO, and the upcoming JUNO with its satellite detector TAO. Such apparatuses can measure neutrino energy, which is derived from the amount of light and its spatial and temporal distribution over the PMT channels. However, achieving a fine energy resolution in large-scale detectors is challenging. In this work, we present machine learning methods for energy reconstruction in the JUNO detector, the most advanced of its type. We focus on positron events in the energy range of 0-10 MeV, which corresponds to the main signal in JUNO: neutrinos originating in nuclear reactor cores and detected via the inverse beta decay channel. We consider the following models: Boosted Decision Trees and a Fully Connected Deep Neural Network, both trained on aggregated features calculated from the information collected by the PMTs. We describe the details of our feature engineering procedure and show that machine learning models can provide an energy resolution of $\sigma = 3\%$ at 1 MeV using subsets of the engineered features. The dataset for model training and testing is generated by the Monte Carlo method with the official JUNO software.
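To make the idea of aggregated features concrete, the following sketch condenses per-PMT readings into a handful of event-level quantities covering the amount of light and its spatial and temporal distribution. The abstract does not list the actual features, so the `PMTHit` record, its fields, and the feature names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PMTHit:
    x: float               # PMT position (assumed units: metres)
    y: float
    z: float
    charge: float          # collected charge (assumed units: photo-electrons)
    first_hit_time: float  # time of first hit (assumed units: ns)


def aggregate_features(hits):
    """Reduce a list of PMT readings to a few event-level features."""
    total_charge = sum(h.charge for h in hits)
    if total_charge <= 0:
        raise ValueError("event has no collected charge")
    # Charge-weighted centre: a coarse proxy for the spatial light distribution.
    cx = sum(h.x * h.charge for h in hits) / total_charge
    cy = sum(h.y * h.charge for h in hits) / total_charge
    cz = sum(h.z * h.charge for h in hits) / total_charge
    # Timing statistics over the PMTs that actually registered light.
    times = [h.first_hit_time for h in hits if h.charge > 0]
    mean_time = sum(times) / len(times)
    std_time = math.sqrt(sum((t - mean_time) ** 2 for t in times) / len(times))
    return {
        "total_charge": total_charge,
        "n_fired": len(times),
        "cc_x": cx,
        "cc_y": cy,
        "cc_z": cz,
        "cc_radius": math.sqrt(cx * cx + cy * cy + cz * cz),
        "time_mean": mean_time,
        "time_std": std_time,
    }
```

A dictionary like this, computed once per event, is the kind of fixed-length input that a boosted decision tree or a fully connected network can be trained on, regardless of how many PMTs fired.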


Exploring Constraint Handling Techniques in Real-world Problems on MOEA/D with Limited Budget of Evaluations

arXiv.org Artificial Intelligence

Finding good solutions for Multi-objective Optimization Problems (MOPs) is considered hard, especially when the MOPs have constraints. Even so, most works on MOPs do not explore in depth how different constraints affect the performance of MOP solvers. Here, we focus on exploring the effects of different Constraint Handling Techniques (CHTs) on MOEA/D, a commonly used MOP solver, when solving complex real-world MOPs. Moreover, we introduce a simple and effective CHT that focuses on the exploration of the decision space, the Three Stage Penalty. We explore each of these CHTs in MOEA/D on two simulated MOPs and six analytic MOPs (eight in total). The results of this work indicate that while the best CHT is problem-dependent, our newly proposed Three Stage Penalty achieves competitive results and remarkable performance in terms of hypervolume values on the hard simulated car design MOP.
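As a point of reference for what a penalty-based CHT does inside a decomposition solver, here is a minimal sketch of a static penalty added to a Tchebycheff-scalarized subproblem. This is a generic textbook-style penalty, not the Three Stage Penalty, whose definition the abstract does not give; the function names and the `penalty` coefficient are assumptions.

```python
def constraint_violation(g_values):
    """Total violation for constraints written in the form g_i(x) <= 0."""
    return sum(max(0.0, g) for g in g_values)


def tchebycheff(objectives, weights, ideal):
    """Tchebycheff scalarization of one subproblem (lower is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def penalized_fitness(objectives, g_values, weights, ideal, penalty=1000.0):
    """Scalarized value plus a static penalty proportional to the violation."""
    return tchebycheff(objectives, weights, ideal) + penalty * constraint_violation(g_values)
```

With this form, a feasible solution keeps its plain scalarized value, while an infeasible one is pushed back in the comparison by an amount that grows with how badly it violates the constraints. Varying the penalty over the course of the search is one way a staged scheme could trade exploration of the decision space against feasibility.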


CHT to recruit talent for AI, IoT, big data

#artificialintelligence

Chunghwa Telecom (CHT) plans to launch a large-scale recruitment drive in 2019, as it expects an unprecedented wave of up to 5,000 of its employees applying for retirement over the next five years. As many as 1,600 jobs will be available at the Taiwan-based telecom carrier in 2019, according to company chairman David Cheng, who added that the number of new employees hired each year will exceed 1,000 for a few years after 2019. To cope with changing industry developments, including the forthcoming 5G era and increasing competition, the company plans to hire more talent with expertise in AI, big data analysis, IoT, mobile payment, 5G and information security, Cheng said. Including its subsidiaries, CHT currently has about 33,500 employees.