Bortolussi, Luca
Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets
Palacios, Milton Nicolás Plasencia, Saccani, Sebastiano, Sgroi, Gabriele, Boudewijn, Alexander, Bortolussi, Luca
Synthetic data has garnered attention as a Privacy Enhancing Technology (PET) in sectors such as healthcare and finance. When using synthetic data in practical applications, it is important to provide protection guarantees. In the literature, two families of approaches have been proposed for tabular data. On the one hand, similarity-based methods aim at finding the level of similarity between training and synthetic data. Indeed, a privacy breach can occur if the generated data is consistently too similar or even identical to the training data. On the other hand, attack-based methods conduct deliberate attacks on synthetic datasets. The success rates of these attacks reveal how secure the synthetic datasets are. In this paper, we introduce a contrastive method that improves privacy assessment of synthetic datasets by embedding the data in a more representative space. This overcomes obstacles surrounding the multitude of data types and attributes. It also makes it possible to use intuitive distance metrics both for similarity measurements and as an attack vector. In a series of experiments with publicly available datasets, we compare the performance of similarity-based and attack-based methods, both with and without the contrastive learning-based embeddings. Our results show that relatively efficient, easy-to-implement privacy metrics can perform as well as more advanced metrics that explicitly model the conditions for privacy referred to by the GDPR.
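As a rough illustration of the similarity-based family described above, the following sketch computes, for each synthetic record, the distance to its closest training record in an embedding space. The function names, the Euclidean metric, the threshold, and the toy vectors are all assumptions made for illustration; this is not the paper's implementation, and the embeddings (contrastive or otherwise) are assumed to have been computed beforehand.

```python
import math

def distance_to_closest_record(synthetic, training):
    """For each synthetic vector, return the Euclidean distance
    to its nearest training vector."""
    return [min(math.dist(s, t) for t in training) for s in synthetic]

def share_of_near_copies(synthetic, training, threshold):
    """Fraction of synthetic records lying within `threshold`
    of at least one training record (hypothetical summary statistic)."""
    distances = distance_to_closest_record(synthetic, training)
    return sum(d <= threshold for d in distances) / len(distances)

# Toy embedded records (hypothetical values, two dimensions for readability).
training = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (3.0, 4.0)]

dcr = distance_to_closest_record(synthetic, training)
near_copy_share = share_of_near_copies(synthetic, training, threshold=0.5)
```

A distance of zero, as for the first synthetic record here, flags an exact copy of a training record; a consistently small distance across many records is the kind of signal a similarity-based metric looks for.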
Timeseria: an object-oriented time series processing library
Russo, Stefano Alberto, Taffoni, Giuliano, Bortolussi, Luca
Timeseria is an object-oriented time series processing library implemented in Python, which aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it. Unlike common data analysis frameworks, it builds up from well-defined and reusable logical units (objects), which can be easily combined in order to ensure a high level of consistency. Thanks to this approach, Timeseria can address by design several non-trivial issues which are often underestimated, such as handling data losses, non-uniform sampling rates, differences between aggregated data and punctual observations, time zones, daylight saving times, and more. Timeseria comes with a comprehensive set of base data structures, data transformations for resampling and aggregation, common data manipulation operations, and extensible models for data reconstruction, forecasting and anomaly detection. It also integrates a fully featured, interactive plotting engine capable of handling even millions of data points. Time series represent the evolution of a phenomenon over time, and their analysis is essential to capture the dynamics of the phenomenon being studied, understand cause-and-effect relationships, and make predictions. However, a typical time series processing pipeline -- loading a data set, cleaning and plotting it, performing some operations, applying some models and inspecting the results -- still feels unnecessarily cumbersome. Scientists, engineers, analysts, and many other professionals spend a considerable amount of time on repetitive procedures and on getting their code to work, instead of focusing on their core tasks.
Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation
Verdù, Federico Julian Camerota, Castelli, Lorenzo, Bortolussi, Luca
[...] (Singh and Rizwanullah 2022) to circuit board design (Barahona et al. 1988) and phylogenetics (Catanzaro et al. 2012). Although general-purpose solvers exist and most CO problems are easy to formulate, in many applications of interest getting to the exact optimal solution is NP-hard and said solvers are extremely inefficient or even impractical due to the computational time required to reach optimality (Toth 2000; Colorni et al. 1996). Specialized solvers and heuristics have been developed over the years for different [...] This approach eliminates the necessity for manually crafted components, thereby providing an ideal means to address problems without requiring specific domain knowledge (Lombardi and Milano 2018). However, improvement heuristics can be easier to apply when complex constraints need to be satisfied and may yield better performance than constructive alternatives when the problem structure is difficult to represent (Zhang et al. 2020) or when known improvement operators with good properties exist (Bordewich et al. 2008).
Effective Analog ICs Floorplanning with Relational Graph Neural Networks and Reinforcement Learning
Basso, Davide, Bortolussi, Luca, Videnovic-Misic, Mirjana, Habal, Husni
Analog integrated circuit (IC) floorplanning is typically a manual process with the placement of components (devices and modules) planned by a layout engineer. This process is further complicated by the interdependence of floorplanning and routing steps, numerous electric and layout-dependent constraints, as well as the high level of customization expected in analog design. This paper presents a novel automatic floorplanning algorithm based on reinforcement learning. It is augmented by a relational graph convolutional neural network model for encoding circuit features and positional constraints. The combination of these two machine learning methods enables knowledge transfer across different circuit designs with distinct topologies and constraints, increasing the \emph{generalization ability} of the solution. Applied to $6$ industrial circuits, our approach surpassed established floorplanning techniques in terms of speed, area and half-perimeter wire length. When integrated into a \emph{procedural generator} for layout completion, overall layout time was reduced by $67.3\%$ with an $8.3\%$ mean area reduction compared to manual layout.
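Half-perimeter wire length, one of the metrics named above, is commonly computed per net as the width plus the height of the bounding box enclosing the connected pins, summed over all nets. The sketch below illustrates that definition only; the block names, coordinates, and the simplification of using one point per block as its pin location are assumptions, not details taken from the paper.

```python
def hpwl(nets, positions):
    """Sum, over all nets, of the half-perimeter of the bounding box
    around the positions of the blocks that the net connects."""
    total = 0.0
    for net in nets:
        xs = [positions[block][0] for block in net]
        ys = [positions[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Hypothetical placement: block name -> (x, y) of its pin, in arbitrary units.
positions = {
    "M1": (0.0, 0.0),
    "M2": (3.0, 4.0),
    "C1": (1.0, 1.0),
    "R1": (5.0, 1.0),
}
# Each net lists the blocks it connects.
nets = [["M1", "M2", "C1"], ["C1", "R1"]]

total_wire_length = hpwl(nets, positions)  # (3 + 4) + (4 + 0) = 11.0
```

Because it only needs minima and maxima, this estimate is cheap to evaluate after every placement step, which is one reason it is a popular proxy for routed wire length.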
ResiDual Transformer Alignment with Spectral Decomposition
Basile, Lorenzo, Maiorca, Valentino, Bortolussi, Luca, Rodolà, Emanuele, Locatello, Francesco
When examined through the lens of their residual streams, a puzzling property emerges in transformer networks: residual contributions (e.g., attention heads) sometimes specialize in specific tasks or input attributes. In this paper, we analyze this phenomenon in vision transformers, focusing on the spectral geometry of residuals, and explore its implications for modality alignment in vision-language models. First, we link it to the intrinsically low-dimensional structure of visual head representations, zooming into their principal components and showing that they encode specialized roles across a wide variety of input data distributions. Then, we analyze the effect of head specialization in multimodal models, focusing on how improved alignment between text and specialized heads impacts zero-shot classification performance. This specialization-performance link consistently holds across diverse pre-training data, network sizes, and objectives, demonstrating a powerful new mechanism for boosting zero-shot classification through targeted alignment. Ultimately, we translate these insights into actionable terms by introducing ResiDual, a technique for spectral alignment of the residual stream. Much like panning for gold, it lets the noise from irrelevant unit principal components (i.e., attributes) wash away to amplify task-relevant ones. Remarkably, this dual perspective on modality alignment yields fine-tuning level performances on different data distributions while modeling an extremely interpretable and parameter-efficient transformation, as we extensively show on more than 50 (pre-trained network, dataset) pairs.
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Basile, Lorenzo, Acevedo, Santiago, Bortolussi, Luca, Anselmi, Fabio, Rodriguez, Alex
To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.
Can you trust your explanations? A robustness test for feature attribution methods
Vascotto, Ilaria, Rodriguez, Alex, Bonaita, Alessandro, Bortolussi, Luca
Growing legislative concern over the use of Artificial Intelligence (AI) has recently led to a series of regulations striving for a more transparent, trustworthy and accountable AI. Along with these proposals, the field of Explainable AI (XAI) has seen rapid growth, but the use of its techniques has at times led to unexpected results. The robustness of the approaches is, in fact, a key property that is often overlooked: it is necessary to evaluate the stability of an explanation (to random and adversarial perturbations) to ensure that the results are trustworthy. To this end, we propose a test to evaluate the robustness to non-adversarial perturbations and an ensemble approach to analyse in more depth the robustness of XAI methods applied to neural networks and tabular datasets. We show how leveraging the manifold hypothesis and ensemble approaches can be beneficial to an in-depth analysis of robustness.
Fast ML-driven Analog Circuit Layout using Reinforcement Learning and Steiner Trees
Basso, Davide, Bortolussi, Luca, Videnovic-Misic, Mirjana, Habal, Husni
This paper presents an artificial intelligence driven methodology to reduce the bottleneck often encountered in the analog IC layout phase. We frame the floorplanning problem as a Markov Decision Process and leverage reinforcement learning for automatic placement generation under established topological constraints. Subsequently, we introduce Steiner tree-based methods for the global routing step and generate guiding paths to be used to connect every circuit block. Finally, by integrating these solutions into a procedural generation framework, we present a unified pipeline that bridges the divide between circuit design and verification steps. Experimental results demonstrate its efficacy in generating complete layouts, eventually reducing runtimes to 1.5% compared to manual efforts.
Retrieval-Augmented Mining of Temporal Logic Specifications from Data
Saveri, Gaia, Bortolussi, Luca
The integration of cyber-physical systems (CPS) into everyday life raises the critical necessity of ensuring their safety and reliability. An important step in this direction is requirement mining, i.e. inferring formally specified system properties from observed behaviors, in order to discover knowledge about the system. Signal Temporal Logic (STL) offers a concise yet expressive language for specifying requirements, particularly suited for CPS, where behaviors are typically represented as time series data. This work addresses the task of learning STL requirements from observed behaviors in a data-driven manner, focusing on binary classification, i.e. on inferring properties of the system which are able to discriminate between regular and anomalous behaviour, and that can be used both as classifiers and as monitors of the compliance of the CPS to desirable specifications. We present a novel framework that combines Bayesian Optimization (BO) and Information Retrieval (IR) techniques to simultaneously learn both the structure and the parameters of STL formulae, without restrictions on the STL grammar. Specifically, we propose a framework that leverages a dense vector database containing semantic-preserving continuous representations of millions of formulae, queried for facilitating the mining of requirements inside a BO loop. We demonstrate the effectiveness of our approach in several signal classification applications, showing its ability to extract interpretable insights from system executions and advance the state-of-the-art in requirement mining for CPS.
stl2vec: Semantic and Interpretable Vector Representation of Temporal Logic
Saveri, Gaia, Nenzi, Laura, Bortolussi, Luca, Křetínský, Jan
[...] algorithms is a longstanding challenge in Artificial Intelligence. Despite the recognized importance of this task, a notable gap exists due to the discreteness of symbolic representations and the continuous nature of machine-learning computations. One of the desired bridges between these two worlds would be to define semantically grounded vector representation (feature embedding) of logic formulae, thus enabling to perform continuous learning and optimization in the semantic space of formulae. We tackle this goal for knowledge expressed in Signal Temporal Logic (STL) and devise a method to compute continuous embeddings of formulae with several desirable properties: the embedding (i) is finite-dimensional, (ii) faithfully reflects the semantics of the formulae, (iii) does not require any learning but instead is defined from basic principles, (iv) is interpretable. [...] For example in STL one can state properties like "the temperature of the room will reach 25 degrees within the next 10 minutes and will stay above 22 degrees for the next hour". In this area, one is typically interested in understanding or verifying which properties the system under analysis is compliant to (or more precisely, in the probability of observing behaviour satisfying the property). Such analysis is often tackled by formal methods, via algorithms belonging to the world of quantitative model checking [4]. In this work, we address the challenge of incorporating knowledge in the form of temporal logic formulae inside data-driven learning algorithms. The key step is to devise a finite-dimensional embedding (feature mapping) of logical formulae into continuous [...]
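To make the temperature example above concrete, the sketch below evaluates it with the standard quantitative (robustness) semantics of STL: "eventually" takes the best margin over its time window, "always" takes the worst margin, and a conjunction takes the minimum of its parts. A positive result means the property holds with that margin. The function names, the one-sample-per-minute discretisation, and the toy signals are assumptions for illustration; this is not the stl2vec embedding itself.

```python
def eventually(signal, threshold, start, end):
    """Robustness at time 0 of 'x >= threshold at some step in [start, end]':
    the best margin found in the window."""
    return max(x - threshold for x in signal[start:end + 1])

def always(signal, threshold, start, end):
    """Robustness at time 0 of 'x >= threshold at every step in [start, end]':
    the worst margin found in the window."""
    return min(x - threshold for x in signal[start:end + 1])

def temperature_property(signal):
    """Reach 25 degrees within 10 minutes AND stay above 22 degrees for an hour.
    Conjunction is the minimum of the two robustness values."""
    return min(eventually(signal, 25.0, 0, 10), always(signal, 22.0, 0, 60))

# Hypothetical signals, one temperature sample per minute for 61 minutes.
warm_room = [23.0 + 0.5 * min(t, 6) for t in range(61)]  # rises to 26, then holds
cold_room = [21.0] * 61                                   # never warms up

warm_robustness = temperature_property(warm_room)  # 1.0: satisfied with 1 degree of slack
cold_robustness = temperature_property(cold_room)  # -4.0: violated, 4 degrees short of 25
```

The real-valued score is what distinguishes this semantics from a plain true/false check: it says how far a behaviour is from the boundary of the property, not only which side it is on.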