Goto

Collaborating Authors

 Overview


Exploring Prompt Engineering: A Systematic Review with SWOT Analysis

arXiv.org Artificial Intelligence

In this paper, we conduct a comprehensive SWOT analysis of prompt engineering techniques within the realm of Large Language Models (LLMs). Emphasizing linguistic principles, we examine various techniques to identify their strengths, weaknesses, opportunities, and threats. Our findings provide insights into enhancing AI interactions and improving language model comprehension of human prompts. The analysis covers techniques including template-based approaches and fine-tuning, addressing the problems and challenges associated with each. The conclusion offers future research directions aimed at advancing the effectiveness of prompt engineering in optimizing human-machine communication.


Artificial intelligence techniques in inherited retinal diseases: A review

arXiv.org Artificial Intelligence

Inherited retinal diseases (IRDs) are a diverse group of genetic disorders that lead to progressive vision loss and are a major cause of blindness in working-age adults. The complexity and heterogeneity of IRDs pose significant challenges in diagnosis, prognosis, and management. Recent advancements in artificial intelligence (AI) offer promising solutions to these challenges. However, the rapid development of AI techniques and their varied applications have led to fragmented knowledge in this field. This review consolidates existing studies, identifies gaps, and provides an overview of AI's potential in diagnosing and managing IRDs. It aims to structure pathways for advancing clinical applications by exploring AI techniques like machine learning and deep learning, particularly in disease detection, progression prediction, and personalized treatment planning. Special focus is placed on the effectiveness of convolutional neural networks in these areas. Additionally, the integration of explainable AI is discussed, emphasizing its importance in clinical settings to improve transparency and trust in AI-based systems. The review addresses the need to bridge existing gaps in focused studies on AI's role in IRDs, offering a structured analysis of current AI techniques and outlining future research directions. It concludes with an overview of the challenges and opportunities in deploying AI for IRDs, highlighting the need for interdisciplinary collaboration and the continuous development of robust, interpretable AI models to advance clinical applications.


Answering Questions in Stages: Prompt Chaining for Contract QA

arXiv.org Artificial Intelligence

Finding answers to legal questions about clauses in contracts is an important form of analysis in many legal workflows (e.g., understanding market trends, due diligence, risk mitigation) but more important is being able to do this at scale. Prior work showed that it is possible to use large language models with simple zero-shot prompts to generate structured answers to questions, which can later be incorporated into legal workflows. Such prompts, while effective on simple and straightforward clauses, fail to perform when the clauses are long and contain information not relevant to the question. In this paper, we propose two-stage prompt chaining to produce structured answers to multiple-choice and multiple-select questions and show that they are more effective than simple prompts on more nuanced legal text. We analyze situations where this technique works well and areas where further refinement is needed, especially when the underlying linguistic variations are more than can be captured by simply specifying possible answers. Finally, we discuss future research that seeks to refine this work by improving stage one results by making them more question-specific.


Towards Generalisable Time Series Understanding Across Domains

arXiv.org Artificial Intelligence

In natural language processing and computer vision, self-supervised pre-training on large datasets unlocks foundational model capabilities across domains and tasks. However, this potential has not yet been realised in time series analysis, where existing methods disregard the heterogeneous nature of time series characteristics. Time series are prevalent in many domains, including medicine, engineering, natural sciences, and finance, but their characteristics vary significantly in terms of variate count, inter-variate relationships, temporal dynamics, and sampling frequency. This inherent heterogeneity across domains prevents effective pre-training on large time series corpora. To address this issue, we introduce OTiS, an open model for general time series analysis, that has been specifically designed to handle multi-domain heterogeneity. We propose a novel pre-training paradigm including a tokeniser with learnable domain-specific signatures, a dual masking strategy to capture temporal causality, and a normalised cross-correlation loss to model long-range dependencies. Our model is pre-trained on a large corpus of 640, 187 samples and 11 billion time points spanning 8 distinct domains, enabling it to analyse time series from any (unseen) domain. In comprehensive experiments across 15 diverse applications - including classification, regression, and forecasting - OTiS showcases its ability to accurately capture domain-specific data characteristics and demonstrates its competitiveness against state-of-the-art baselines. Our code and pre-trained weights are publicly available at https://github.com/oetu/otis. Self-supervised pre-training paradigms are designed to account for the specific properties of language (Radford et al., 2018; Touvron et al., 2023; Chowdhery et al., 2023) or imaging (Zhou et al., 2022; Cherti et al., 2023; Oquab et al., 2024), unlocking foundational model capabilities that apply to a wide range of domains and downstream tasks. This potential, however, remains largely unrealised in time series due to the lack of self-supervised pre-training paradigms that account for the heterogeneity of time series across domains.


Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation

arXiv.org Artificial Intelligence

The patent domain is gaining attention in natural language processing research, offering practical applications in streamlining the patenting process and providing challenging benchmarks for large language models (LLMs). However, the generation of the description sections of patents, which constitute more than 90% of the patent document, has not been studied to date. We address this gap by introducing the task of outline-guided paper-to-patent generation, where an academic paper provides the technical specification of the invention and an outline conveys the desired patent structure. We present PAP2PAT, a new challenging benchmark of 1.8k patent-paper pairs with document outlines, collected using heuristics that reflect typical research lab practices. Our experiments with current open-weight LLMs and outline-guided chunk-based generation show that they can effectively use information from the paper but struggle with repetitions, likely due to the inherent repetitiveness of patent language. We release our data and code.


Personal Intelligence System UniLM: Hybrid On-Device Small Language Model and Server-Based Large Language Model for Malay Nusantara

arXiv.org Artificial Intelligence

In contexts with limited computational and data resources, high-resource language models often prove inadequate, particularly when addressing the specific needs of Malay languages. This paper introduces a Personal Intelligence System designed to efficiently integrate both on-device and server-based models. The system incorporates SLiM-34M for on-device processing, optimized for low memory and power usage, and MANYAK-1.3B for server-based tasks, allowing for scalable, high-performance language processing. The models achieve significant results across various tasks, such as machine translation, question-answering, and translate IndoMMLU. Particularly noteworthy is SLiM-34M's ability to achieve a high improvement in accuracy compared to other LLMs while using 2 times fewer pre-training tokens. This work challenges the prevailing assumption that large-scale computational resources are necessary to build effective language models, contributing to the development of resource-efficient models for the Malay language with the unique orchestration between SLiM-34M and MANYAK-1.3B.


Compositional Entailment Learning for Hyperbolic Vision-Language Models

arXiv.org Artificial Intelligence

Image-text representation learning forms a cornerstone in vision-language models, where pairs of images and textual descriptions are contrastively aligned in a shared embedding space. Since visual and textual concepts are naturally hierarchical, recent work has shown that hyperbolic space can serve as a high-potential manifold to learn vision-language representation with strong downstream performance. In this work, for the first time we show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs. We propose Compositional Entailment Learning for hyperbolic vision-language models. The idea is that an image is not only described by a sentence but is itself a composition of multiple object boxes, each with their own textual description. Such information can be obtained freely by extracting nouns from sentences and using openly available localized grounding models. We show how to hierarchically organize images, image boxes, and their textual descriptions through contrastive and entailment-based objectives. Empirical evaluation on a hyperbolic visionlanguage model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning, as well as recent hyperbolic alternatives, with better zero-shot and retrieval generalization and clearly stronger hierarchical performance. Vision-language modeling has witnessed rapid progress in recent years with innovative approaches such as CLIP (Radford et al., 2021) and ALIGN (Jia et al., 2021) using extensive vision-language data to train encoders for understanding visual and textual content simultaneously. Such encoders align visual scenes with textual descriptions in a shared high-dimensional Euclidean space, facilitating semantic understanding (Radford et al., 2021). While effective, conventional vision-language models only take a holistic approach to image-text representation learning, neglecting the intrinsic hierarchy and composition of elements within images. Indeed, a visual scene is commonly composed of multiple objects interacting with one another to form a precise context.


Automatic differentiation in ML: Where we are and where we should be going

Neural Information Processing Systems

We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof of concept compiler toolchain called Myia which uses a subset of Python as a front end.


Reviews: Learning to Compose Domain-Specific Transformations for Data Augmentation

Neural Information Processing Systems

This paper addresses an interesting and new problem to augment training data in a learnable and principled manner. Modern machine learning systems are known for their'hunger for data' and until now state-of-the-art approaches have relied mainly on heuristics to augment labeled training data. This paper tries to reduce the tedious task of finding a good combination of data augmentation strategies with best parameters by learning a sequence of best data augmentation strategies in a generative adversarial framework while working with unsupervised data. The motivation behind the problem is to reduce human labor without compromising the final discriminative classification performance. The problem formulation is pretty clear from the text.


Improving Portfolio Optimization Results with Bandit Networks

arXiv.org Artificial Intelligence

In Reinforcement Learning (RL), multi-armed Bandit (MAB) problems have found applications across diverse domains such as recommender systems, healthcare, and finance. Traditional MAB algorithms typically assume stationary reward distributions, which limits their effectiveness in real-world scenarios characterized by non-stationary dynamics. This paper addresses this limitation by introducing and evaluating novel Bandit algorithms designed for non-stationary environments. First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm, which enhances adaptability through relaxed discounting and sliding window mechanisms to better respond to changes in reward distributions. We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm, which addresses computational challenges within Combinatorial Bandits and improves dynamic asset allocation. Additionally, we propose a novel architecture called Bandit Networks, which integrates the outputs of ADTS and CADTS, thereby mitigating computational limitations in stock selection. Through extensive experiments using real financial market data, we demonstrate the potential of these algorithms and architectures in adapting to dynamic environments and optimizing decision-making processes. For instance, the proposed bandit network instances present superior performance when compared to classic portfolio optimization approaches, such as capital asset pricing model, equal weights, risk parity, and Markovitz, with the best network presenting an out-of-sample Sharpe Ratio 20% higher than the best performing classical model.