Goto

Collaborating Authors

 Gorizia


Extracting domain-specific terms using contextual word embeddings

arXiv.org Artificial Intelligence

Automated terminology extraction refers to the task of extracting meaningful terms from domain-specific texts. This paper proposes a novel machine learning approach to terminology extraction, which combines features from traditional term extraction systems with novel contextual features derived from contextual word embeddings. Instead of using a predefined list of part-of-speech patterns, we first analyse a new term-annotated corpus RSDO5 for the Slovenian language and devise a set of rules for term candidate selection and then generate statistical, linguistic and context-based features. We use a support-vector machine algorithm to train a classification model, evaluate it on the four domains (biomechanics, linguistics, chemistry, veterinary) of the RSDO5 corpus and compare the results with state-of-art term extraction approaches for the Slovenian language. Our approach provides significant improvements in terms of F1 score over the previous state-of-the-art, which proves that contextual word embeddings are valuable for improving term extraction.1. Introduction Automated terminology extraction (ATE) refers to the task of extracting meaningful terms from domain-specific texts. Terms are single-word (SWU) or multi-word units (MWU) of knowledge, which are relevant for a particular domain. Since manual identification of terms is costly and time consuming, ATE approaches can reduce the effort needed to generate relevant domain-specific terms. Recognizing and extracting domain-specific terms, which is useful in various fields, such as translation, dictionary creation, ontology generation and others, remains a difficult task.


Cosmology with Persistent Homology: Parameter Inference via Machine Learning

arXiv.org Artificial Intelligence

Building upon [2308.02636], this article investigates the potential constraining power of persistent homology for cosmological parameters and primordial non-Gaussianity amplitudes in a likelihood-free inference pipeline. We evaluate the ability of persistence images (PIs) to infer parameters, compared to the combined Power Spectrum and Bispectrum (PS/BS), and we compare two types of models: neural-based, and tree-based. PIs consistently lead to better predictions compared to the combined PS/BS when the parameters can be constrained (i.e., for $\{\Omega_{\rm m}, \sigma_8, n_{\rm s}, f_{\rm NL}^{\rm loc}\}$). PIs perform particularly well for $f_{\rm NL}^{\rm loc}$, showing the promise of persistent homology in constraining primordial non-Gaussianity. Our results show that combining PIs with PS/BS provides only marginal gains, indicating that the PS/BS contains little extra or complementary information to the PIs. Finally, we provide a visualization of the most important topological features for $f_{\rm NL}^{\rm loc}$ and for $\Omega_{\rm m}$. This reveals that clusters and voids (0-cycles and 2-cycles) are most informative for $\Omega_{\rm m}$, while $f_{\rm NL}^{\rm loc}$ uses the filaments (1-cycles) in addition to the other two types of topological features.


Fractional-order spike-timing-dependent gradient descent for multi-layer spiking neural networks

arXiv.org Artificial Intelligence

Accumulated detailed knowledge about the neuronal activities in human brains has brought more attention to bio-inspired spiking neural networks (SNNs). In contrast to non-spiking deep neural networks (DNNs), SNNs can encode and transmit spatiotemporal information more efficiently by exploiting biologically realistic and low-power event-driven neuromorphic architectures. However, the supervised learning of SNNs still remains a challenge because the spike-timing-dependent plasticity (STDP) of connected spiking neurons is difficult to implement and interpret in existing backpropagation learning schemes. This paper proposes a fractional-order spike-timing-dependent gradient descent (FO-STDGD) learning model by considering a derived nonlinear activation function that describes the relationship between the quasi-instantaneous firing rate and the temporal membrane potentials of nonleaky integrate-and-fire neurons. The training strategy can be generalized to any fractional orders between 0 and 2 since the FO-STDGD incorporates the fractional gradient descent method into the calculation of spike-timing-dependent loss gradients. The proposed FO-STDGD model is tested on the MNIST and DVS128 Gesture datasets and its accuracy under different network structure and fractional orders is analyzed. It can be found that the classification accuracy increases as the fractional order increases, and specifically, the case of fractional order 1.9 improves by 155% relative to the case of fractional order 1 (traditional gradient descent). In addition, our scheme demonstrates the state-of-the-art computational efficacy for the same SNN structure and training epochs.


Optimized neural forms for solving ordinary differential equations

arXiv.org Artificial Intelligence

A critical issue in approximating solutions of ordinary differential equations using neural networks is the exact satisfaction of the boundary or initial conditions. For this purpose, neural forms have been introduced, i.e., functional expressions that depend on neural networks which, by design, satisfy the prescribed conditions exactly. Expanding upon prior progress, the present work contributes in three distinct aspects. First, it presents a novel formalism for crafting optimized neural forms. Second, it outlines a method for establishing an upper bound on the absolute deviation from the exact solution. Third, it introduces a technique for converting problems with Neumann or Robin conditions into equivalent problems with parametric Dirichlet conditions. The proposed optimized neural forms were numerically tested on a set of diverse problems, encompassing first-order and second-order ordinary differential equations, as well as first-order systems. Stiff and delay differential equations were also considered. The obtained solutions were compared against solutions obtained via Runge-Kutta methods and exact solutions wherever available. The reported results and analysis verify that in addition to the exact satisfaction of the boundary or initial conditions, optimized neural forms provide closed-form solutions of superior interpolation capability and controllable overall accuracy.


AHAM: Adapt, Help, Ask, Model -- Harvesting LLMs for literature mining

arXiv.org Artificial Intelligence

In an era marked by a rapid increase in scientific publications, researchers grapple with the challenge of keeping pace with field-specific advances. We present the `AHAM' methodology and a metric that guides the domain-specific \textbf{adapt}ation of the BERTopic topic modeling framework to improve scientific text analysis. By utilizing the LLaMa2 generative language model, we generate topic definitions via one-shot learning by crafting prompts with the \textbf{help} of domain experts to guide the LLM for literature mining by \textbf{asking} it to model the topic names. For inter-topic similarity evaluation, we leverage metrics from language generation and translation processes to assess lexical and semantic similarity of the generated topics. Our system aims to reduce both the ratio of outlier topics to the total number of topics and the similarity between topic definitions. The methodology has been assessed on a newly gathered corpus of scientific papers on literature-based discovery. Through rigorous evaluation by domain experts, AHAM has been validated as effective in uncovering intriguing and novel insights within broad research areas. We explore the impact of domain adaptation of sentence-transformers for the task of topic \textbf{model}ing using two datasets, each specialized to specific scientific domains within arXiv and medarxiv. We evaluate the impact of data size, the niche of adaptation, and the importance of domain adaptation. Our results suggest a strong interaction between domain adaptation and topic modeling precision in terms of outliers and topic definitions.


Learning Channel Importance for High Content Imaging with Interpretable Deep Input Channel Mixing

arXiv.org Machine Learning

Uncovering novel drug candidates for treating complex diseases remain one of the most challenging tasks in early discovery research. To tackle this challenge, biopharma research established a standardized high content imaging protocol that tags different cellular compartments per image channel. In order to judge the experimental outcome, the scientist requires knowledge about the channel importance with respect to a certain phenotype for decoding the underlying biology. In contrast to traditional image analysis approaches, such experiments are nowadays preferably analyzed by deep learning based approaches which, however, lack crucial information about the channel importance. To overcome this limitation, we present a novel approach which utilizes multi-spectral information of high content images to interpret a certain aspect of cellular biology. To this end, we base our method on image blending concepts with alpha compositing for an arbitrary number of channels. More specifically, we introduce DCMIX, a lightweight, scaleable and end-to-end trainable mixing layer which enables interpretable predictions in high content imaging while retaining the benefits of deep learning based methods. We employ an extensive set of experiments on both MNIST and RXRX1 datasets, demonstrating that DCMIX learns the biologically relevant channel importance without scarifying prediction performance.


SMASH Open Call 1 - 2023 • SMASH

#artificialintelligence

SMASH is an innovative, intersectoral, career-development training program for outstanding postdoctoral researchers, co-funded by the Marie Skłodowska-Curie Actions COFUND program. SMASH is open to researchers around the world who are interested in developing cutting-edge machine learning applications for science and humanities. During the five years of the SMASH project (2023-28) and over three planned calls, SMASH aims to hire 50 individuals for 2-year full-time postdoctoral contracts with highly attractive conditions. Fellows will be hosted by five leading Slovenian research institutions: the University of Nova Gorica, the University of Ljubljana, the Jožef Stefan Institute, the Institute of Information Science*, and the Slovenian Environment Agency*. Applicants should propose ambitious research projects in one of SMASH's five key research areas, that are centered on the use of cutting-edge machine learning, or more broadly, artificial intelligence techniques, to address some of the world's most challenging questions in: Applicants should choose the SMASH host institution and supervisor with whom they will coordinate the project proposal preparation.


Link Analysis meets Ontologies: Are Embeddings the Answer?

arXiv.org Artificial Intelligence

The increasing amounts of semantic resources offer valuable storage of human knowledge; however, the probability of wrong entries increases with the increased size. The development of approaches that identify potentially spurious parts of a given knowledge base is thus becoming an increasingly important area of interest. In this work, we present a systematic evaluation of whether structure-only link analysis methods can already offer a scalable means to detecting possible anomalies, as well as potentially interesting novel relation candidates. Evaluating thirteen methods on eight different semantic resources, including Gene Ontology, Food Ontology, Marine Ontology and similar, we demonstrated that structure-only link analysis could offer scalable anomaly detection for a subset of the data sets. Further, we demonstrated that by considering symbolic node embedding, explanations of the predictions (links) could be obtained, making this branch of methods potentially more valuable than the black-box only ones. To our knowledge, this is currently one of the most extensive systematic studies of the applicability of different types of link analysis methods across semantic resources from different domains.


Transfer Learning for Node Regression Applied to Spreading Prediction

arXiv.org Artificial Intelligence

Understanding how information propagates in real-life complex networks yields a better understanding of dynamic processes such as misinformation or epidemic spreading. The recently introduced branch of machine learning methods for learning node representations offers many novel applications, one of them being the task of spreading prediction addressed in this paper. We explore the utility of the state-of-the-art node representation learners when used to assess the effects of spreading from a given node, estimated via extensive simulations. Further, as many real-life networks are topologically similar, we systematically investigate whether the learned models generalize to previously unseen networks, showing that in some cases very good model transfer can be obtained. This work is one of the first to explore transferability of the learned representations for the task of node regression; we show there exist pairs of networks with similar structure between which the trained models can be transferred (zero-shot), and demonstrate their competitive performance. To our knowledge, this is one of the first attempts to evaluate the utility of zero-shot transfer for the task of node regression.


ReliefE: Feature Ranking in High-dimensional Spaces via Manifold Embeddings

arXiv.org Machine Learning

Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not-well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating down-stream learning capabilities of conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where a given embedding's dimensionality is intrinsic to the dimensionality of the considered data set. The developed ReliefE algorithm is faster and can result in better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. The utility of ReliefE for high-dimensional data sets is ensured by its implementation that utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index.