Calgary
Analyzing Transformers in Embedding Space
Dar, Guy, Geva, Mor, Gupta, Ankit, Berant, Jonathan
Understanding Transformer-based models has attracted significant attention, as they lie at the heart of recent technological advances across machine learning. While most interpretability methods rely on running models over inputs, recent work has shown that a zero-pass approach, where parameters are interpreted directly without a forward/backward pass, is feasible for some Transformer parameters and for two-layer attention networks. In this work, we present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space, that is, the space of vocabulary items they operate on. We derive a simple theoretical framework to support our arguments and provide ample evidence for its validity. First, we present an empirical analysis showing that parameters of both pretrained and fine-tuned models can be interpreted in embedding space. Second, we present two applications of our framework: (a) aligning the parameters of different models that share a vocabulary, and (b) constructing a classifier without training by ``translating'' the parameters of a fine-tuned classifier to parameters of a different model that was only pretrained. Overall, our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
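As a rough illustration of the embedding-space projection described in this abstract, the sketch below scores a parameter vector against each row of an embedding matrix and lists the highest-scoring vocabulary items. The toy vocabulary, the 3-dimensional embeddings, the example vectors, and the function name `project_to_vocab` are all assumptions made for illustration; none of them come from the paper.

```python
# Hypothetical toy example: interpret a parameter vector by projecting it
# onto an embedding matrix (one row per vocabulary item).

def project_to_vocab(vector, embeddings, vocab, k=2):
    """Return the k vocabulary items whose embeddings have the largest
    dot product with `vector`."""
    scores = []
    for token, row in zip(vocab, embeddings):
        score = sum(v * e for v, e in zip(vector, row))
        scores.append((score, token))
    # Highest score first; ties are broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [token for _, token in scores[:k]]

# Assumed toy vocabulary and embeddings (not real model weights).
VOCAB = ["cat", "dog", "car", "road"]
EMBEDDINGS = [
    [1.0, 0.8, 0.0],  # cat
    [0.9, 1.0, 0.1],  # dog
    [0.0, 0.1, 1.0],  # car
    [0.1, 0.0, 0.9],  # road
]

# Made-up parameter vectors pointing toward different regions of the space.
animal_like = [1.0, 1.0, 0.0]
vehicle_like = [0.0, 0.0, 1.0]

top_animal = project_to_vocab(animal_like, EMBEDDINGS, VOCAB)
top_vehicle = project_to_vocab(vehicle_like, EMBEDDINGS, VOCAB)
print(top_animal, top_vehicle)
```

In a real model the embedding matrix would have tens of thousands of rows, and the vector would be a row or column of a trained weight matrix; the ranking step would be the same in spirit.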
Characterizing and Classifying Developer Forum Posts with their Intentions
Wu, Xingfang, Laufer, Eric, Li, Heng, Khomh, Foutse, Srinivasan, Santhosh, Luo, Jayden
With the rapid growth of the developer community, the number of posts on online technical forums has been growing rapidly, which makes it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate posts of interest and for search engines to index the most relevant posts according to the queries. However, most tags focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. Modeling the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis of a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.
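The metrics named in this abstract can be made concrete with a small sketch. The code below computes a micro-averaged F1 score and a top-k accuracy on made-up data. It assumes a multi-label setting and counts a post as a top-k hit when at least one of its true labels appears among the k highest-ranked predictions; both choices, as well as the label names, are assumptions for illustration and may differ from the paper's exact definitions.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all posts and labels before computing precision and recall."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def top_k_accuracy(true_sets, ranked_preds, k):
    """Fraction of posts with at least one true label in the top k."""
    hits = sum(
        1 for truth, ranked in zip(true_sets, ranked_preds)
        if truth & set(ranked[:k])
    )
    return hits / len(true_sets)

# Hypothetical intention labels and model rankings (best guess first).
TRUE_LABELS = [
    {"ask_help"},
    {"share_info"},
    {"ask_help", "report_bug"},
    {"discussion"},
]
RANKED = [
    ["ask_help", "discussion", "share_info"],
    ["discussion", "share_info", "ask_help"],
    ["report_bug", "ask_help", "share_info"],
    ["share_info", "ask_help", "discussion"],
]

# Treat the single top-ranked label as the predicted label set.
top1_sets = [{ranked[0]} for ranked in RANKED]
f1 = micro_f1(TRUE_LABELS, top1_sets)
accuracies = [top_k_accuracy(TRUE_LABELS, RANKED, k) for k in (1, 2, 3)]
print(round(f1, 3), accuracies)
```

Because a post may carry more than one intention in this sketch, micro F1 and top-1 accuracy need not coincide, which matches the fact that the abstract reports them as separate numbers.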
Catastrophic Forgetting in Deep Learning: A Comprehensive Taxonomy
Aleixo, Everton L., Colonna, Juan G., Cristo, Marco, Fernandes, Everlandio
Deep Learning models have achieved remarkable performance in tasks such as image classification or generation, often surpassing human accuracy. However, they can struggle to learn new tasks and update their knowledge without access to previous data, leading to a significant loss of accuracy known as Catastrophic Forgetting (CF). This phenomenon was first observed by McCloskey and Cohen in 1989 and remains an active research topic. Incremental learning without forgetting is widely recognized as a crucial aspect in building better AI systems, as it allows models to adapt to new tasks without losing the ability to perform previously learned ones. This article surveys recent studies that tackle CF in modern Deep Learning models that use gradient descent as their learning algorithm. Although several solutions have been proposed, a definitive solution or consensus on assessing CF is yet to be established. The article provides a comprehensive review of recent solutions, proposes a taxonomy to organize them, and identifies research gaps in this area.
Deep Learning-Based Cyber-Attack Detection Model for Smart Grids
Mohammadi, Mojtaba, Aflaki, Arshia, Kavousifard, Abdollah, Gitizadeh, Mohsen
In this paper, a novel artificial intelligence-based cyber-attack detection model for smart grids is developed to stop data integrity cyber-attacks (DIAs) on the load data received by supervisory control and data acquisition (SCADA). In the proposed model, the load data is first forecasted using a regression model and, after a processing stage, the processed data is clustered using an unsupervised learning method. To achieve the best performance, three load forecasting methods (i.e., extra tree regression (ETR), long short-term memory (LSTM), and bidirectional long short-term memory (BiLSTM)) are utilized as regression models and their performance is compared. For clustering and outlier detection, the covariance elliptic envelope (EE) is employed as an unsupervised learning method. To examine the proposed model, the hourly load data of the power company of the city of Johor in Malaysia is employed, and two common DIAs, namely DIAs targeting economic loss and DIAs targeting blackouts, are used to evaluate the accuracy of the detection methods in several scenarios. The simulation results show that the proposed EE-BiLSTM method is more robust and accurate than the other two methods.
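The forecast-then-detect idea in this abstract can be sketched in a few lines. The code below fits a 2D Gaussian to clean (forecast, observed) load pairs and flags a new pair as an outlier when its squared Mahalanobis distance exceeds a chi-square cutoff. This is a simplified, non-robust stand-in for the elliptic envelope (which normally relies on a robust covariance estimate, as in scikit-learn's `sklearn.covariance.EllipticEnvelope`). The feature choice, the load values, and the 0.99 cutoff are assumptions for illustration, not the paper's setup.

```python
import math

def fit_gaussian_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical clean training pairs: (forecast load, observed load) in MW.
CLEAN = [(100, 102), (120, 118), (140, 143), (160, 157), (180, 182), (200, 199)]
MEAN, COV = fit_gaussian_2d(CLEAN)

# For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
THRESHOLD = -2.0 * math.log(1.0 - 0.99)

def is_outlier(point):
    """Flag a (forecast, observed) pair that falls outside the ellipse."""
    return mahalanobis_sq(point, MEAN, COV) > THRESHOLD

normal_reading = (150, 151)    # observed load close to the forecast
tampered_reading = (150, 190)  # observed load inflated, as a DIA might do
print(is_outlier(normal_reading), is_outlier(tampered_reading))
```

The ellipse is narrow along the direction where forecast and observation disagree, so a manipulated reading stands out even when its absolute value is within the normal load range.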
MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision
Li, Jianning, Zhou, Zongwei, Yang, Jiancheng, Pepe, Antonio, Gsaxner, Christina, Luijten, Gijs, Qu, Chongyu, Zhang, Tiezheng, Chen, Xiaoxi, Li, Wenxuan, Wodzinski, Marek, Friedrich, Paul, Xie, Kangxian, Jin, Yuan, Ambigapathy, Narmada, Nasca, Enrico, Solak, Naida, Melito, Gian Marco, Vu, Viet Duc, Memon, Afaque R., Schlachta, Christopher, De Ribaupierre, Sandrine, Patel, Rajnikant, Eagleson, Roy, Chen, Xiaojun, Mächler, Heinrich, Kirschke, Jan Stefan, de la Rosa, Ezequiel, Christ, Patrick Ferdinand, Li, Hongwei Bran, Ellis, David G., Aizenberg, Michele R., Gatidis, Sergios, Küstner, Thomas, Shusharina, Nadya, Heller, Nicholas, Andrearczyk, Vincent, Depeursinge, Adrien, Hatt, Mathieu, Sekuboyina, Anjany, Löffler, Maximilian, Liebl, Hans, Dorent, Reuben, Vercauteren, Tom, Shapey, Jonathan, Kujawa, Aaron, Cornelissen, Stefan, Langenhuizen, Patrick, Ben-Hamadou, Achraf, Rekik, Ahmed, Pujades, Sergi, Boyer, Edmond, Bolelli, Federico, Grana, Costantino, Lumetti, Luca, Salehi, Hamidreza, Ma, Jun, Zhang, Yao, Gharleghi, Ramtin, Beier, Susann, Sowmya, Arcot, Garza-Villarreal, Eduardo A., Balducci, Thania, Angeles-Valdez, Diego, Souza, Roberto, Rittner, Leticia, Frayne, Richard, Ji, Yuanfeng, Ferrari, Vincenzo, Chatterjee, Soumick, Dubost, Florian, Schreiber, Stefanie, Mattern, Hendrik, Speck, Oliver, Haehn, Daniel, John, Christoph, Nürnberger, Andreas, Pedrosa, João, Ferreira, Carlos, Aresta, Guilherme, Cunha, António, Campilho, Aurélio, Suter, Yannick, Garcia, Jose, Lalande, Alain, Vandenbossche, Vicky, Van Oevelen, Aline, Duquesne, Kate, Mekhzoum, Hamza, Vandemeulebroucke, Jef, Audenaert, Emmanuel, Krebs, Claudia, van Leeuwen, Timo, Vereecke, Evie, Heidemeyer, Hauke, Röhrig, Rainer, Hölzle, Frank, Badeli, Vahid, Krieger, Kathrin, Gunzer, Matthias, Chen, Jianxu, van Meegdenburg, Timo, Dada, Amin, Balzer, Miriam, Fragemann, Jana, Jonske, Frederic, Rempe, Moritz, Malorodov, Stanislav, Bahnsen, Fin H., Seibold, Constantin, Jaus, Alexander, Marinov, Zdravko, Jaeger, Paul F., Stiefelhagen, Rainer, Santos, Ana Sofia, Lindo, Mariana, Ferreira, André, Alves, Victor, Kamp, Michael, Abourayya, Amr, Nensa, Felix, Hörst, Fabian, Brehmer, Alexander, Heine, Lukas, Hanusrichter, Yannik, Weßling, Martin, Dudda, Marcel, Podleska, Lars E., Fink, Matthias A., Keyl, Julius, Tserpes, Konstantinos, Kim, Moon-Sung, Elhabian, Shireen, Lamecker, Hans, Zukić, Dženan, Paniagua, Beatriz, Wachinger, Christian, Urschler, Martin, Duong, Luc, Wasserthal, Jakob, Hoyer, Peter F., Basu, Oliver, Maal, Thomas, Witjes, Max J. H., Schiele, Gregor, Chang, Ti-chiun, Ahmadi, Seyed-Ahmad, Luo, Ping, Menze, Bjoern, Reyes, Mauricio, Deserno, Thomas M., Davatzikos, Christos, Puladi, Behrus, Fua, Pascal, Yuille, Alan L., Kleesiek, Jens, Egger, Jan
Prior to the deep learning era, shape was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback
Compressive Recovery of Sparse Precision Matrices
Vayer, Titouan, Lasalle, Etienne, Gribonval, Rémi, Gonçalves, Paulo
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\Theta$ representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a compressive viewpoint and aim to estimate a sparse $\Theta$ from a \emph{sketch} of the data, i.e. a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using non-linear random features. Under certain assumptions on the spectrum of $\Theta$ (or its condition number), we show that it is possible to estimate it from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
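To give a feel for the storage saving claimed in this abstract, the short sketch below compares the sketch size $(d+2k)\log(d)$ with the $d^{2}$ entries of the empirical covariance matrix. The $\Omega(\cdot)$ bound hides a constant factor, so the code takes that constant as a parameter and sets it to 1 purely as an assumption; the values of $d$ and $k$ are also made up for illustration.

```python
import math

def sketch_size(d, k, constant=1.0):
    """Sketch size of order (d + 2k) * log(d), using the natural log.
    `constant` stands in for the unknown factor hidden by the Omega bound."""
    return math.ceil(constant * (d + 2 * k) * math.log(d))

# Hypothetical problem size: d variables, at most k edges in the graph.
d, k = 1000, 2000
m = sketch_size(d, k)
full_covariance = d * d  # number of entries in the empirical covariance

print(f"sketch size m       : {m}")
print(f"covariance entries  : {full_covariance}")
print(f"compression factor  : {full_covariance / m:.1f}x")
```

The gap widens as $d$ grows for a fixed edge budget, since the sketch size scales roughly linearly in $d$ (up to the log factor) while the covariance scales quadratically.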
HyPHEN: A Hybrid Packing Method and Optimizations for Homomorphic Encryption-Based Neural Networks
Kim, Donghwan, Park, Jaiyoung, Kim, Jongmin, Kim, Sangpyo, Ahn, Jung Ho
Convolutional neural network (CNN) inference using fully homomorphic encryption (FHE) is a promising private inference (PI) solution due to the capability of FHE that enables offloading the whole computation process to the server while protecting the privacy of sensitive user data. Prior FHE-based CNN (HCNN) work has demonstrated the feasibility of constructing deep neural network architectures such as ResNet using FHE. Despite these advancements, HCNN still faces significant challenges in practicality due to the high computational and memory overhead. To overcome these limitations, we present HyPHEN, a deep HCNN construction that incorporates novel convolution algorithms (RAConv and CAConv), data packing methods (2D gap packing and PRCR scheme), and optimization techniques tailored to HCNN construction. Such enhancements enable HyPHEN to substantially reduce the memory footprint and the number of expensive homomorphic operations, such as ciphertext rotation and bootstrapping. As a result, HyPHEN brings the latency of HCNN CIFAR-10 inference down to a practical level at 1.4 seconds (ResNet-20) and demonstrates HCNN ImageNet inference for the first time at 14.7 seconds (ResNet-18).
Manipulator control of the Robotized TMS System with Incurved TMS Coil Case
Objective: This study presents a force/torque control strategy for a robotized TMS system whose TMS coil has an incurved floor. The strategy considers the adhesion and friction between the coil and the subject's head. Methods: Hybrid position/force control and proportional torque were used for the strategy. The force magnitude applied for the force control was scheduled by the error between the coil's current position and the target point. Results: A larger desired force for the force controller reduces the error quickly. By scheduling the force magnitude applied for the force control, a low error between the coil's current and target positions is maintained with a relatively small force after the larger force is applied for around 10 seconds. The proportional torque improved the adhesion by locating the contact area between the coil and the head close to the coil. This was shown by checking the ${\tau}_c/F_c$ value from the experimental results. While the head slowly moved away from the coil during the TMS treatment, the coil still interacted with the head. Using that characteristic, the coil could locate the new target point using the force/torque strategy without any trajectory planning. Conclusion: The proposed force/torque controller enhanced the adhesion between the incurved TMS coil and the subject's head. It also reduced the error quickly by scheduling the magnitude of the force applied. Significance: This study proposes a force/torque control strategy for the robotized TMS system that considers the physical characteristics of the contact between the incurved TMS coil case and the subject's head.
Novel Fundus Image Preprocessing for Retcam Images to Improve Deep Learning Classification of Retinopathy of Prematurity
Rahim, Sajid, Sabri, Kourosh, Ells, Anna, Wassyng, Alan, Lawford, Mark, Chu, Linyang, He, Wenbo
Retinopathy of Prematurity (ROP) is a potentially blinding eye disorder, caused by damage to the retina, that can affect babies born prematurely. Screening for ROP is essential for early detection and treatment. It is a laborious, manual process that requires a trained physician to perform a dilated ophthalmological examination, which can be subjective and can result in lower diagnostic success for clinically significant disease. Automated diagnostic methods using deep learning can help ophthalmologists increase diagnostic accuracy. Several research groups have highlighted various approaches. Captured ROP Retcam images suffer from poor quality. This paper proposes the use of improved novel fundus preprocessing methods using pretrained transfer learning frameworks to create hybrid models that give higher diagnostic accuracy. Once the models were trained and validated, the evaluations showed that these novel methods, in comparison to traditional image processing, contribute to better, and in many respects higher, accuracy in classifying Plus disease, Stages of ROP, and Zones compared with peer papers.
MAINS: A Magnetic Field Aided Inertial Navigation System for Indoor Positioning
Huang, Chuan, Hendeby, Gustaf, Fourati, Hassen, Prieur, Christophe, Skog, Isaac
A Magnetic field Aided Inertial Navigation System (MAINS) for indoor navigation is proposed in this paper. MAINS leverages an array of magnetometers to measure spatial variations in the magnetic field, which are then used to estimate the displacement and orientation changes of the system, thereby aiding the inertial navigation system (INS). Experiments show that MAINS significantly outperforms the stand-alone INS, demonstrating a remarkable two orders of magnitude reduction in position error. Furthermore, when compared to the state-of-the-art magnetic-field-aided navigation approach, the proposed method exhibits slightly improved horizontal position accuracy. On the other hand, it has noticeably larger vertical error on datasets with large magnetic field variations. However, one of the main advantages of MAINS compared to the state-of-the-art is that it enables flexible sensor configurations. The experimental results show that the position error after 2 minutes of navigation in most cases is less than 3 meters when using an array of 30 magnetometers. Thus, the proposed navigation solution has the potential to solve one of the key challenges faced with current magnetic-field simultaneous localization and mapping (SLAM) solutions: the very limited allowable length of the exploration phase during which unvisited areas are mapped.