Barnes, Nick
A Comprehensive Overview of Large Language Models
Naveed, Humza, Khan, Asad Ullah, Qiu, Shi, Saqib, Muhammad, Anwar, Saeed, Usman, Muhammad, Akhtar, Naveed, Barnes, Nick, Mian, Ajmal
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success has led to a large influx of research contributions on the topic. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of advances in the field. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community be able to benefit from a concise yet comprehensive overview of recent developments. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained, comprehensive overview of LLMs discusses the relevant background concepts and covers the advanced topics at the frontier of LLM research. This review is intended to provide not only a systematic survey but also a quick, comprehensive reference for researchers and practitioners to draw insights from extensive, informative summaries of existing works to advance LLM research.
Model Calibration in Dense Classification with Adaptive Label Perturbation
Liu, Jiawei, Ye, Changkun, Wang, Shan, Cui, Ruikai, Zhang, Jing, Zhang, Kaihao, Barnes, Nick
For safety-related applications, it is crucial to produce trustworthy deep neural networks whose predictions are associated with confidence values that represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes, including stochastic approaches (like DisturbLabel) and label smoothing, to correct calibration while maintaining classification rates. ASLP follows the Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information. It does so while either (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improving model calibration by minimising the gap between the prediction accuracy and the expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve the calibration of dense binary classification models on both in-distribution and out-of-distribution data. The code is available at https://github.com/Carlisle-Liu/ASLP.
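As a rough illustration of the idea (not the authors' implementation), the sketch below softens a binary target by a perturbation level alpha inside a standard BCE loss: alpha = 0 recovers plain BCE, a fixed alpha behaves like label smoothing, and a per-image stochastic alpha mimics DisturbLabel-style perturbation. The interpolation form and the name sc_bce_loss are assumptions.

```python
import torch.nn.functional as F

def sc_bce_loss(logits, targets, alpha):
    """Hypothetical sketch of a self-calibrating BCE: perturb the binary
    target towards the opposite class by level alpha before applying BCE.
    alpha may be a scalar or a per-image tensor broadcastable to targets."""
    soft_targets = targets * (1.0 - alpha) + (1.0 - targets) * alpha
    return F.binary_cross_entropy_with_logits(logits, soft_targets)

# alpha = 0.0 -> standard BCE
# alpha = 0.1 -> label-smoothing-like softened targets
# alpha sampled per image (e.g., Bernoulli-gated) -> stochastic perturbation
```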
Robust normalizing flows using Bernstein-type polynomials
Ramasinghe, Sameera, Fernando, Kasun, Khan, Salman, Barnes, Nick
Normalizing flows (NFs) are a category of generative models that enable exact density computation and efficient sampling: a series of diffeomorphisms transforms a simple distribution into a more complex one, which in turn allows analytical density estimation of samples (Rezende & Mohamed, 2015; Kobyzev et al., 2020). We propose a framework to construct NFs based on increasing triangular maps and Bernstein-type polynomials. Compared to the existing (universal) NF frameworks, our method provides compelling advantages like theoretical upper bounds for the approximation error, robustness, higher interpretability, suitability for compactly supported densities, and the ability to employ higher-degree polynomials without training instability. Moreover, we provide a constructive universality proof, which gives analytic expressions of the approximations for known transformations.
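For intuition, here is a toy sketch of the core ingredient, under the assumption that each 1D component of the triangular map takes the standard Bernstein form: with non-decreasing coefficients, the polynomial is increasing on [0, 1] and hence invertible, and its derivative supplies the change-of-variables log-determinant. Function names are illustrative.

```python
import numpy as np
from scipy.stats import binom

def bernstein_map(x, theta):
    """T(x) = sum_k theta_k * b_{k,n}(x) with the Bernstein basis on [0, 1].
    If theta is non-decreasing, T is increasing, hence a valid flow component."""
    n = len(theta) - 1
    k = np.arange(n + 1)
    # The Bernstein basis b_{k,n}(x) equals the Binomial(n, x) pmf at k.
    basis = binom.pmf(k[None, :], n, x[:, None])
    return basis @ np.asarray(theta)

def bernstein_deriv(x, theta):
    """T'(x) = n * sum_k (theta_{k+1} - theta_k) * b_{k,n-1}(x); its log gives
    the change-of-variables term for the density."""
    d = np.diff(theta)
    n = len(theta) - 1
    k = np.arange(n)
    return n * (binom.pmf(k[None, :], n - 1, x[:, None]) @ d)
```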
Attention Guided Semantic Relationship Parsing for Visual Question Answering
Farazi, Moshiur, Khan, Salman, Barnes, Nick
Humans explain inter-object relationships with semantic labels that demonstrate the high-level understanding required to perform complex Vision-Language tasks such as Visual Question Answering (VQA). However, existing VQA models represent relationships as a combination of object-level visual features, which constrains a model to expressing interactions between objects in a single domain even while it is trying to solve a multi-modal task. In this paper, we propose a general-purpose semantic relationship parser, which generates a semantic feature vector for each subject-predicate-object triplet in an image, and a Mutual and Self Attention (MSA) mechanism that learns to identify the relationship triplets that are important for answering the given question. To motivate the significance of semantic relationships, we show an oracle setting with ground-truth relationship triplets, where our model achieves a ~25% accuracy gain over the closest state-of-the-art model on the challenging GQA dataset. Further, with our semantic parser, we show that our model outperforms other comparable approaches on the VQA and GQA datasets.
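The following PyTorch sketch conveys one plausible reading of the MSA block (the paper's exact architecture may differ): the question attends over triplet features (mutual attention) while triplets also attend over themselves (self attention). Head count, mean pooling, and additive fusion are guesses.

```python
import torch
import torch.nn as nn

class MutualSelfAttention(nn.Module):
    """Loose sketch of a Mutual-and-Self Attention block for VQA."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.mutual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, triplets, question):
        # triplets: (B, T, dim) features of subject-predicate-object triplets
        # question: (B, Q, dim) encoded question tokens
        mutual, _ = self.mutual(question, triplets, triplets)    # question attends to triplets
        selfed, _ = self.self_attn(triplets, triplets, triplets) # triplets contextualize each other
        return mutual.mean(dim=1) + selfed.mean(dim=1)           # (B, dim) fused feature
```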
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes
Ramasinghe, Sameera, Khan, Salman, Barnes, Nick, Gould, Stephen
Existing networks directly learn feature representations on 3D point clouds for shape analysis. We argue that 3D point clouds are highly redundant and hold an irregular (permutation-invariant) structure, which makes it difficult to achieve inter-class discrimination efficiently. In this paper, we propose a two-faceted solution to this problem that is seamlessly integrated into a single `Blended Convolution and Synthesis' layer. This fully differentiable layer performs two critical tasks in succession. In the first step, it projects the input 3D point clouds into a latent 3D space to synthesize a highly compact and more inter-class discriminative point cloud representation. Since 3D point clouds do not follow a Euclidean topology, standard 2D/3D Convolutional Neural Networks offer limited representation capability. Therefore, in the second step, the layer uses a novel 3D convolution operator functioning inside the unit ball ($\mathbb{B}^3$) to extract useful volumetric features. We derive formulae to achieve both translation and rotation of our novel convolution kernels. Finally, using the proposed techniques, we present an extremely lightweight, end-to-end architecture that achieves compelling results on 3D shape recognition and retrieval.
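Purely as a hedged sketch of the first ('synthesis') step rather than the published layer, the module below pools an input cloud into a global feature and emits a compact latent cloud. Note that tanh bounds points in the cube (-1, 1)^3 rather than $\mathbb{B}^3$; a norm-based squashing would be needed for a strict ball constraint. All sizes are assumptions.

```python
import torch
import torch.nn as nn

class LatentSynthesis(nn.Module):
    """Hypothetical stand-in for the synthesis step: input cloud -> compact,
    bounded latent cloud via a permutation-invariant encoder."""
    def __init__(self, n_latent=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.decode = nn.Linear(128, n_latent * 3)
        self.n_latent = n_latent

    def forward(self, pts):                         # pts: (B, N, 3)
        feat = self.encode(pts).max(dim=1).values   # permutation-invariant max pooling
        latent = torch.tanh(self.decode(feat))      # bounded coordinates, (B, n_latent*3)
        return latent.view(-1, self.n_latent, 3)    # compact latent point cloud
```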
Volumetric Convolution: Automatic Representation Learning in Unit Ball
Ramasinghe, Sameera, Khan, Salman, Barnes, Nick
Convolution is an efficient technique to obtain abstract feature representations using hierarchical layers in deep networks. Although performing convolution in Euclidean geometries is fairly straightforward, its extension to other topological spaces---such as a sphere ($\mathbb{S}^2$) or a unit ball ($\mathbb{B}^3$)---entails unique challenges. In this work, we propose a novel `\emph{volumetric convolution}' operation that can effectively convolve arbitrary functions in $\mathbb{B}^3$. We develop a theoretical framework for \emph{volumetric convolution} based on Zernike polynomials and efficiently implement it as a differentiable, easily pluggable layer for deep networks. Furthermore, our formulation leads to the derivation of a novel formula to measure the symmetry of a function in $\mathbb{B}^3$ around an arbitrary axis, which is useful in 3D shape analysis tasks. We demonstrate the efficacy of the proposed volumetric convolution operation on a representative use case: the 3D object recognition task.
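For orientation, a function on the unit ball can be expanded in 3D Zernike functions $Z_{nlm} = R_{nl}(r) Y_{lm}(\theta, \phi)$, as written below; the closing comment states a convolution identity only by analogy with the spherical convolution theorem, as a hedged paraphrase rather than the paper's exact formula.

```latex
% Zernike expansion of f over the unit ball \mathbb{B}^3
f(r,\theta,\phi) = \sum_{n}\sum_{l}\sum_{m=-l}^{l} \hat{f}_{nlm}\, Z_{nlm}(r,\theta,\phi),
\qquad
\hat{f}_{nlm} = \int_{\mathbb{B}^3} f(r,\theta,\phi)\, \overline{Z_{nlm}(r,\theta,\phi)}\, dV.
% By analogy with the spherical convolution theorem, convolving f with an
% axially symmetric kernel g is expected to reduce to coefficient products
% of the form \hat{f}_{nlm}\,\hat{g}_{nl0}.
```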
AI@NICTA
Barnes, Nick (NICTA) | Baumgartner, Peter (NICTA) | Caetano, Tiberio (NICTA) | Durrant-Whyte, Hugh (NICTA) | Klein, Gerwin (NICTA) | Sanderson, Penelope (University of Queensland) | Sattar, Abdul (Griffith University) | Stuckey, Peter (The University of Melbourne) | Thiebaux, Sylvie (The Australian National University) | Van Hentenryck, Pascal (University of Melbourne) | Walsh, Toby (NICTA)
NICTA is Australia's Information and Communications Technology (ICT) Centre of Excellence. It is the largest organization in Australia dedicated to ICT research. While it has close links with local universities, it is in fact an independent, not-for-profit company in the business of doing research, commercializing that research, and training PhD students to do that research. Much of the work taking place at NICTA involves various topics in artificial intelligence. In this article, we survey some of the AI work being undertaken at NICTA.
Totally Corrective Boosting for Regularized Risk Minimization
Shen, Chunhua, Li, Hanxi, Barnes, Nick
Consideration of the primal and dual problems together leads to important new insights into the characteristics of boosting algorithms. In this work, we propose a general framework that can be used to design new boosting algorithms. A wide variety of machine learning problems essentially minimize a regularized risk functional. We show that the proposed boosting framework, termed CGBoost, can accommodate various loss functions and different regularizers in a totally corrective optimization fashion. We show that, by solving the primal rather than the dual, a large body of totally corrective boosting algorithms can be solved efficiently without sophisticated convex optimization solvers. We also demonstrate that some boosting algorithms, like AdaBoost, can be interpreted in our framework even though their optimization is not totally corrective. We empirically show that various boosting algorithms based on the proposed framework perform similarly on the UC Irvine machine learning datasets [1] used in our experiments.
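A toy sketch of the primal, totally corrective scheme described above: each round, a column-generation step picks the weak learner with the largest edge, then all weights are re-optimized against a regularized risk. The logistic loss, L2 regularizer, and a precomputed weak-learner matrix H are stand-ins for the paper's general setting.

```python
import numpy as np
from scipy.optimize import minimize

def cgboost(H, y, n_rounds=10, reg=1e-2):
    """Toy totally-corrective boosting via column generation.
    H : (n_samples, n_weak) outputs of candidate weak learners in {-1, +1}
        (a stand-in for a weak-learner oracle); y : labels in {-1, +1}."""
    chosen, w = [], np.zeros(0)
    score = np.zeros(len(y))
    for _ in range(n_rounds):
        # Column generation: weight samples by the loss gradient, then pick
        # the weak learner with the largest edge.
        u = 1.0 / (1.0 + np.exp(y * score))
        j = int(np.argmax((u * y) @ H))
        if j in chosen:          # no improving column left
            break
        chosen.append(j)
        Hs = H[:, chosen]
        # Totally corrective step: re-optimize ALL weights in the primal.
        def risk(v):
            margins = y * (Hs @ v)
            return np.mean(np.logaddexp(0.0, -margins)) + reg * (v @ v)
        w = minimize(risk, np.append(w, 0.0), method="L-BFGS-B").x
        score = Hs @ w
    return chosen, w
```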