Passalis, Nikolaos
Large Models in Dialogue for Active Perception and Anomaly Detection
Chamiti, Tzoulio, Passalis, Nikolaos, Tefas, Anastasios
Autonomous aerial monitoring is an important task aimed at gathering information from areas that may not be easily accessible to humans. At the same time, this task often requires recognizing anomalies from a significant distance and/or anomalies that have not been previously encountered. In this paper, we propose a novel framework that leverages the advanced capabilities of Large Language Models (LLMs) to actively collect information and perform anomaly detection in novel scenes. To this end, we propose an LLM-based model-dialogue approach, in which two deep learning models engage in a dialogue to actively control a drone, increasing perception and anomaly detection accuracy. We conduct our experiments in a high-fidelity simulation environment, where an LLM is provided with a predetermined set of natural language movement commands mapped to executable code functions. Additionally, we deploy a multimodal Visual Question Answering (VQA) model charged with visual question answering and captioning. By engaging the two models in conversation, the LLM asks exploratory questions while simultaneously flying the drone to different parts of the scene, providing a novel way to implement active perception. By leveraging the LLM's reasoning ability, we produce a detailed description of the scene that goes beyond existing static perception approaches. In addition to information gathering, our approach is utilized for anomaly detection, and our results demonstrate the proposed method's effectiveness in informing and alerting about potential hazards.
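The dialogue loop described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the command set, the stub "LLM" policy, and the stub "VQA" captioner are all hypothetical stand-ins for the real models.

```python
# Illustrative sketch of an active-perception dialogue loop: an "LLM" agent
# picks a movement command and a question, a "VQA" model answers, and the
# transcript accumulates into a scene description. All components are stubs.

# Hypothetical command set mapped to executable functions, as described above.
COMMANDS = {
    "move_forward": lambda pose: (pose[0] + 1, pose[1]),
    "turn_left":    lambda pose: (pose[0], pose[1] - 90),
    "turn_right":   lambda pose: (pose[0], pose[1] + 90),
}

def llm_step(transcript):
    """Stub standing in for an LLM: cycles through the commands and asks a
    generic exploratory question at each step."""
    cmd = list(COMMANDS)[len(transcript) % len(COMMANDS)]
    return cmd, f"What do you see after '{cmd}'?"

def vqa_answer(pose, question):
    """Stub standing in for the VQA model: returns a caption for the pose."""
    return f"view at position {pose[0]}, heading {pose[1]} deg"

def dialogue(steps=3):
    pose, transcript = (0, 0), []
    for _ in range(steps):
        cmd, question = llm_step(transcript)
        pose = COMMANDS[cmd](pose)  # "fly" the drone
        transcript.append((cmd, question, vqa_answer(pose, question)))
    return transcript
```

The accumulated transcript would then be summarized by the LLM into the final scene description.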
Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management
Nousi, Paraskevi, Avramelou, Loukia, Rodinos, Georgios, Tzelepi, Maria, Manousis, Theodoros, Tsampazis, Konstantinos, Stefanidis, Kyriakos, Spanos, Dimitris, Kirtas, Manos, Tosidis, Pavlos, Tsantekidis, Avraam, Passalis, Nikolaos, Tefas, Anastasios
Financial markets analysis has been and remains a topic of intense research interest since the seminal work of Markowitz [1] detailing his theory of portfolio choice, for which he was awarded the Nobel Prize in 1990. The rapid advancements of Machine Learning (ML) and, more specifically, those made in the fields of Deep Learning (DL) and Deep Reinforcement Learning (DRL), further fueled interest in the field. Financial market analysts began using ML-based techniques and combining them with their own domain knowledge [2]. As early as 1992, Neural Networks (NNs) were already being used for equity index futures trading [3]. More recently, DL research in financial market analysis has focused on high-frequency trading, i.e., an algorithmic financial trading method characterized by high speeds and large volumes. The kinds of data used in works that focus on this type of trading include Limit Order Book (LOB) data [4], as well as candle data for assets such as FOREX pairs or cryptocurrencies [5]. Candle data contain the Open, High, Low, and Close prices for assets at a requested frequency, e.g., at the minute or hour level. Price forecasting is a first step toward solving the very complex task of portfolio management, and has proved to be a difficult problem in itself. One way to make it tractable is to transform it into a classification problem, i.e., predicting the direction of the price movement instead of its actual value in the next step [4].
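The classification reformulation mentioned above can be sketched as follows, assuming simple thresholded next-step returns; the threshold `eps` and the three-class up/flat/down scheme are illustrative choices, not prescribed by the cited works.

```python
# Sketch of casting next-step price forecasting as classification: label each
# candle by the direction of the next close price. Illustrative only.

def movement_labels(closes, eps=1e-4):
    """Map a close-price series to {-1, 0, +1} labels (down / flat / up),
    based on the relative change to the next step."""
    labels = []
    for now, nxt in zip(closes, closes[1:]):
        change = (nxt - now) / now
        labels.append(1 if change > eps else (-1 if change < -eps else 0))
    return labels

# Example: four closes yield three direction labels.
print(movement_labels([100.0, 100.5, 100.5, 99.0]))  # [1, 0, -1]
```

A classifier trained on such labels predicts price direction, which is then fed to the portfolio management layer.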
Non-negative isomorphic neural networks for photonic neuromorphic accelerators
Kirtas, Manos, Passalis, Nikolaos, Pleros, Nikolaos, Tefas, Anastasios
Neuromorphic photonic accelerators are becoming increasingly popular, since they can significantly improve computation speed and energy efficiency, reaching femtojoule-per-MAC efficiency. However, deploying existing DL models on such platforms is not trivial, since a great range of photonic neural network architectures rely on incoherent setups and power-addition operational schemes that cannot natively represent negative quantities. This results in additional hardware complexity that increases cost and reduces energy efficiency. To overcome this, we can train non-negative neural networks, potentially exploiting the full range of incoherent neuromorphic photonic capabilities. However, existing approaches cannot achieve the same level of accuracy as their regular counterparts due to training difficulties, as recent evidence also suggests. To this end, we introduce a methodology for obtaining non-negative isomorphic equivalents of regular neural networks that meet the requirements of neuromorphic hardware, overcoming the aforementioned limitations. Furthermore, we introduce a sign-preserving optimization approach that enables training such isomorphic networks in a non-negative manner.
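As a minimal illustration of the general idea of non-negative equivalents (not the paper's specific isomorphism or training scheme), a signed linear layer can be emulated with purely non-negative weights and activations via the classic two-channel positive/negative decomposition:

```python
import numpy as np

# Two-channel decomposition: represent x as (x_plus, x_minus) with
# x = x_plus - x_minus, and double the weight matrix so that every stored
# quantity is non-negative, yet the recombined output equals W @ x exactly.

def nonneg_equivalent(W):
    """Split W into non-negative parts and build a doubled, non-negative
    matrix acting on stacked [x_plus; x_minus] channels."""
    Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
    return np.block([[Wp, Wm], [Wm, Wp]])  # all entries >= 0

def forward_nonneg(W2, x):
    """Run the non-negative layer (square W assumed) on the channel-
    decomposed input and recombine the output channels."""
    xp, xm = np.maximum(x, 0.0), np.maximum(-x, 0.0)
    y2 = W2 @ np.concatenate([xp, xm])  # only non-negative quantities involved
    n = len(x)
    return y2[:n] - y2[n:]  # y_plus - y_minus

W = np.array([[1.0, -2.0], [-0.5, 3.0]])
x = np.array([0.7, -1.2])
assert np.allclose(forward_nonneg(nonneg_equivalent(W), x), W @ x)
```

In incoherent photonic hardware, the final subtraction would typically be realized in the readout (e.g., balanced detection) rather than in the weight fabric.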
Deep Residual Error and Bag-of-Tricks Learning for Gravitational Wave Surrogate Modeling
Fragkouli, Styliani-Christina, Nousi, Paraskevi, Passalis, Nikolaos, Iosif, Panagiotis, Stergioulas, Nikolaos, Tefas, Anastasios
Deep learning methods have been employed in gravitational-wave astronomy to accelerate the construction of surrogate waveforms for the inspiral of spin-aligned black hole binaries, among other applications. We address the challenge of modeling the residual error of an artificial neural network that models the coefficients of the surrogate waveform expansion (especially those of the waveform's phase), which we demonstrate has sufficient structure to be learnable by a second network. By adding this second network, we were able to reduce the maximum mismatch for waveforms in a validation set by a factor of 13.4. We also explored several other ideas for improving the accuracy of the surrogate model, such as exploiting similarities between waveforms, augmenting the training set, dissecting the input space, using dedicated networks per output coefficient, and augmenting the output. In several cases, small improvements can be observed, but the most significant improvement still comes from the addition of a second network that models the residual error. Since the residual error of more general surrogate waveform models (e.g., when eccentricity is included) may also have a specific structure, we expect our method to be applicable to cases where the gain in accuracy could lead to significant savings in computational time.
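The residual-learning idea can be sketched with toy least-squares models standing in for the two networks: the first model fits the target, the second fits the structured leftover error, and their sum is the final prediction. The data and feature choices here are purely illustrative.

```python
import numpy as np

# Residual-error learning sketch: a base model leaves structured error behind,
# and a second model trained on that residual recovers most of it.

x = np.linspace(-1, 1, 200)
y = 0.5 * x + 0.3 * x**2  # target with structure a linear fit cannot capture

def fit(features, target):
    coeffs, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coeffs

A1 = np.stack([x, np.ones_like(x)], axis=1)         # base model: linear
c1 = fit(A1, y)
residual = y - A1 @ c1                              # structured leftover error

A2 = np.stack([x**2, x, np.ones_like(x)], axis=1)   # second model on residual
c2 = fit(A2, residual)

final = A1 @ c1 + A2 @ c2                           # base + residual correction
print(np.max(np.abs(y - final)) < np.max(np.abs(residual)))  # True
```

The same pattern applies with neural networks in place of the least-squares fits: the second network only pays off when the residual has learnable structure, as demonstrated in the paper.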
Multiplicative update rules for accelerating deep learning training and increasing robustness
Kirtas, Manos, Passalis, Nikolaos, Tefas, Anastasios
Even nowadays, when Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remains a challenging task. To this end, generations of researchers have sought robust methods for training DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods have been limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, without investigating the fundamental parameter update rule itself. Although multiplicative updates contributed significantly to the early development of machine learning and hold strong theoretical claims, to the best of our knowledge, this is the first work to investigate them in the context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits a wide range of optimization algorithms and enables applying alternative update rules. To this end, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term in a novel hybrid update method. We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule, and we experimentally demonstrate its effectiveness on a wide range of tasks and optimization methods. These tasks range from convex and non-convex optimization to difficult image classification benchmarks, using a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
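To illustrate the contrast between update rules, consider a 1-D convex problem with a positivity-preserving multiplicative step in the style of exponentiated gradient; the specific rules and the half-and-half hybrid below are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Compare update rules on f(w) = (w - 2)^2 with w > 0: additive gradient
# descent, a multiplicative (exponentiated-gradient-style) step, and a hybrid
# that mixes both terms. All three should converge to w = 2 here.

def grad(w):
    return 2.0 * (w - 2.0)

def optimize(rule, w=0.5, lr=0.05, steps=200):
    for _ in range(steps):
        g = grad(w)
        if rule == "additive":
            w = w - lr * g                           # classic SGD step
        elif rule == "multiplicative":
            w = w * math.exp(-lr * g)                # stays positive by construction
        else:  # "hybrid": half multiplicative, half additive
            w = w * math.exp(-0.5 * lr * g) - 0.5 * lr * g
        # (no expected intermediate values asserted; rates differ per rule)
    return w
```

On this toy problem all three rules reach the minimum; the paper's claim concerns their differing acceleration and robustness behavior on real DL training.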
Variational Voxel Pseudo Image Tracking
Oleksiienko, Illia, Nousi, Paraskevi, Passalis, Nikolaos, Tefas, Anastasios, Iosifidis, Alexandros
Uncertainty estimation is an important task for critical problems, such as robotics and autonomous driving, because it allows creating statistically better perception models and signaling the model's certainty in its predictions to the decision method or a human supervisor. In this paper, we propose a Variational Neural Network-based version of the Voxel Pseudo Image Tracking (VPIT) method for 3D Single Object Tracking. The Variational Feature Generation Network of the proposed Variational VPIT computes features for the target and search regions along with the corresponding uncertainties, which are later combined in an uncertainty-aware cross-correlation module in one of two ways: by computing the similarity between the corresponding uncertainties and adding it to the regular cross-correlation values, or by penalizing the uncertain feature channels to increase the influence of the certain features. In experiments, we show that both methods improve tracking performance, while penalizing uncertain features provides the best uncertainty quality.
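The channel-penalization variant can be sketched as follows; the toy shapes and the `1/(1 + sigma)` weighting are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

# Uncertainty-penalized cross-correlation sketch: down-weight feature channels
# with high predicted uncertainty before correlating target and search
# features, so certain channels dominate the response map.

def uncertainty_aware_corr(target, search, sigma):
    """target: (C,) target features, search: (C, N) search-region features,
    sigma: (C,) per-channel uncertainties. Returns a length-N response in
    which uncertain channels contribute less."""
    weights = 1.0 / (1.0 + sigma)  # certain channels -> weight near 1
    return (weights[:, None] * search * target[:, None]).sum(axis=0)

C, N = 4, 5
rng = np.random.default_rng(1)
target, search = rng.normal(size=C), rng.normal(size=(C, N))
low = uncertainty_aware_corr(target, search, np.zeros(C))      # no penalty
high = uncertainty_aware_corr(target, search, np.full(C, 9.0)) # heavy penalty
assert np.allclose(high, low / 10.0)  # uniform uncertainty damps uniformly
```

With non-uniform `sigma`, the weighting reshapes the response rather than merely scaling it, which is what lets certain channels dominate the similarity.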
MLGWSC-1: The first Machine Learning Gravitational-Wave Search Mock Data Challenge
Schäfer, Marlin B., Zelenka, Ondřej, Nitz, Alexander H., Wang, He, Wu, Shichao, Guo, Zong-Kuan, Cao, Zhoujian, Ren, Zhixiang, Nousi, Paraskevi, Stergioulas, Nikolaos, Iosif, Panagiotis, Koloniari, Alexandra E., Tefas, Anastasios, Passalis, Nikolaos, Salemi, Francesco, Vedovato, Gabriele, Klimenko, Sergey, Mishra, Tanmaya, Brügmann, Bernd, Cuoco, Elena, Huerta, E. A., Messenger, Chris, Ohme, Frank
We present the results of the first Machine Learning Gravitational-Wave Search Mock Data Challenge (MLGWSC-1). For this challenge, participating groups had to identify gravitational-wave signals from binary black hole mergers of increasing complexity and duration, embedded in progressively more realistic noise. The last of the four provided datasets contained real noise from the O3a observing run and signals with durations of up to 20 seconds, including precession effects and higher-order modes. We present the average sensitive distance and runtime for the six submitted algorithms, derived from one month of test data unknown to the participants prior to submission. Of these, four are machine learning algorithms. We find that the best machine learning based algorithms are able to achieve up to 95% of the sensitive distance of matched-filtering based production analyses for simulated Gaussian noise at a false-alarm rate (FAR) of one per month. In contrast, for real noise, the leading machine learning search achieved 70%. For higher FARs, the differences in sensitive distance shrink to the point where select machine learning submissions outperform traditional search algorithms at FARs $\geq 200$ per month on some datasets. Our results show that current machine learning search algorithms may already be sensitive enough in limited parameter regions to be useful for some production settings. To improve the state of the art, machine learning algorithms need to reduce the false-alarm rates at which they are capable of detecting signals and extend their validity to regions of parameter space where modeled searches are computationally expensive to run. Based on our findings, we compile a list of research areas that we believe are the most important for elevating machine learning searches into an invaluable tool in gravitational-wave signal detection.
Non-Linear Spectral Dimensionality Reduction Under Uncertainty
Laakom, Firas, Raitoharju, Jenni, Passalis, Nikolaos, Iosifidis, Alexandros, Gabbouj, Moncef
In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty, from both theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions that model the uncertainty associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA and MDA/KMFA, to receive probability distributions as inputs instead of the original data. We show that the proposed NGEU formulation admits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.
Graph Embedding with Data Uncertainty
Laakom, Firas, Raitoharju, Jenni, Passalis, Nikolaos, Iosifidis, Alexandros, Gabbouj, Moncef
The impracticability of working in high-dimensional spaces due to the curse of dimensionality, together with the realization that in many problems the data reside on manifolds of much lower dimension than the original space, has led to the development of spectral-based subspace learning (SL) techniques. Spectral-based methods rely on the eigenanalysis of scatter matrices. SL aims at determining a mapping of the original high-dimensional space into a lower-dimensional space that preserves properties of interest in the input data. This mapping can be obtained using unsupervised methods, such as Principal Component Analysis (PCA) [1, 2], or supervised ones, such as Linear Discriminant Analysis (LDA) [3] and Marginal Fisher Analysis (MFA) [4]. Despite the different motivations of these spectral-based methods, a general formulation known as Graph Embedding was introduced in [4] to unify them within a common framework. For low-dimensional data, where dimensionality reduction is not needed and classification algorithms can be applied directly, many extensions modeling input data inaccuracies have recently been proposed [5, 6].
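For reference, the unifying Graph Embedding criterion of [4] can be stated compactly as follows, in standard notation: $W$ is the affinity matrix of the intrinsic graph, $L = D - W$ its Laplacian with degree matrix $D$, and $B$ a constraint matrix (e.g., the identity or a penalty-graph Laplacian).

```latex
% Graph Embedding: keep strongly connected samples close in the embedding.
\mathbf{y}^{*} \;=\; \arg\min_{\mathbf{y}^{\top} B \mathbf{y} = c}\;
\sum_{i \neq j} \lVert y_i - y_j \rVert^{2}\, W_{ij}
\;=\; \arg\min_{\mathbf{y}^{\top} B \mathbf{y} = c}\;
2\, \mathbf{y}^{\top} L \mathbf{y}
```

The minimizers are obtained from the generalized eigenvalue problem $L\mathbf{y} = \lambda B \mathbf{y}$, taking the eigenvectors of smallest eigenvalue; different choices of $W$ and $B$ recover PCA, LDA, MFA, and related methods.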
Temporal Logistic Neural Bag-of-Features for Financial Time series Forecasting leveraging Limit Order Book Data
Passalis, Nikolaos, Tefas, Anastasios, Kanniainen, Juho, Gabbouj, Moncef, Iosifidis, Alexandros
Time series forecasting is a crucial component of many important applications, ranging from predicting the behavior of financial markets [5] to accurate energy load prediction [13]. Even though the large amounts of data that can nowadays be collected from these domains provide an unprecedented opportunity for applying powerful deep learning (DL) methods [23, 41, 24], the high dimensionality, velocity, and variety of such data also pose significant and unique challenges that must be carefully addressed for each application. To this end, many methods have been proposed to analyze and forecast time series data. For example, traditional approaches employ adaptive distance metrics, such as Dynamic Time Warping [4], to tackle these kinds of tasks. However, with the advent of DL, interest is gradually shifting toward neural network-based methods, including recurrent and convolutional architectures [25, 7], which seem to be more effective for handling such data. It is worth noting that other approaches for time series analysis also exist, such as the Bag-of-Features (BoF) model [35]. The BoF model was recently adapted toward efficiently processing large amounts of complex and high-dimensional time series [2, 1, 32], due to its ability to analyze objects that consist of a varying number of feature vectors, as well as to withstand distribution shifts better than competitive methods [29]. The BoF model involves the following pipeline [35]: a) Several feature vectors are extracted from each input object, e.g., an image or a time series. This step is called feature extraction and forms the feature space, in which each object is represented as a set of feature vectors.
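The extraction-and-quantization core of this pipeline can be sketched as follows, assuming a fixed codebook that stands in for one learned by clustering; the full BoF model additionally learns the codebook, and its neural variants use soft assignments rather than the hard nearest-codeword assignment shown here.

```python
import numpy as np

# BoF sketch: quantize an object's feature vectors against a codebook and
# represent the object as a normalized histogram of codeword assignments.
# The histogram has fixed length regardless of how many features the object
# produced, which is what makes BoF suit variable-length time series.

def bof_histogram(features, codebook):
    """features: (n, d) feature vectors from one object; codebook: (k, d)
    codewords. Returns a length-k histogram summing to 1."""
    # Distance of every feature vector to every codeword.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    assignments = d.argmin(axis=1)  # hard nearest-codeword assignment
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])  # stand-in for a learned codebook
series_features = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [0.2, 0.0]])
print(bof_histogram(series_features, codebook))  # [0.5 0.5]
```

The resulting fixed-length histogram is then passed to a classifier or regressor, e.g., for the price-movement prediction task discussed earlier.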