low dimension
AutoML-guided Fusion of Entity and LLM-based representations
Koloski, Boshko, Pollak, Senja, Navigli, Roberto, Škrlj, Blaž
Large semantic knowledge bases are grounded in factual knowledge. However, recent approaches to dense text representations (embeddings) do not efficiently exploit these resources. Dense and robust representations of documents are essential for effectively solving downstream classification and retrieval tasks. This work demonstrates that injecting embedded information from knowledge bases can improve the performance of contemporary Large Language Model (LLM)-based representations for the task of text classification. Further, by applying automated machine learning (AutoML) to the fused representation space, we demonstrate that classification accuracy can be improved even when using low-dimensional projections of the original representation space obtained via efficient matrix factorization. This result shows that significantly faster classifiers can be achieved with minimal or no loss in predictive performance, as demonstrated using five strong LLM baselines on six diverse real-life datasets.
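To make the "fuse, then project to a low dimension" idea concrete, here is a minimal sketch. It assumes that fusion is plain concatenation of a per-document LLM embedding matrix with a per-document knowledge-base embedding matrix, and that the "efficient matrix factorization" is a truncated SVD. Both choices, and the helper name `fuse_and_project`, are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np


def fuse_and_project(llm_emb: np.ndarray, kb_emb: np.ndarray, k: int) -> np.ndarray:
    """Concatenate two per-document embedding matrices and project them to k dimensions.

    Hypothetical helper: concatenation as the fusion step and truncated SVD as the
    factorization are assumptions made for illustration only.
    """
    if llm_emb.shape[0] != kb_emb.shape[0]:
        raise ValueError("both matrices need one row per document")
    # Early fusion: place both views side by side for each document.
    fused = np.hstack([llm_emb, kb_emb])
    # Center the columns so the leading singular directions capture variance.
    fused = fused - fused.mean(axis=0)
    # Thin SVD; the top-k right singular vectors form the projection basis.
    _, _, vt = np.linalg.svd(fused, full_matrices=False)
    return fused @ vt[:k].T
```

For example, with 100 documents, 768-dimensional LLM vectors, and 200-dimensional knowledge-base vectors, `fuse_and_project(llm, kb, 32)` returns a `(100, 32)` matrix that a downstream (e.g. AutoML-selected) classifier can be trained on far more cheaply than on the 968 fused dimensions.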
Curvature Augmented Manifold Embedding and Learning
Dimension reduction (DR) is a long-standing area of focus in the engineering, science, and machine learning communities. It goes by different names and preferences depending on the field. For example, in engineering it can be referred to as reduced-order modeling, while in machine learning it is closely related to data visualization. The core idea is to mitigate the curse of dimensionality by projecting the data features onto a low-dimensional space (2D or 3D for data visualization problems, but not necessarily for general DR problems). Once the low-dimensional data structure is obtained, many analyses, such as classification and regression, can be performed more conveniently than their counterparts in the high-dimensional space. DR methods can be traced back to principal component analysis (PCA) [1], the most widely used linear DR method, which is based on the eigenvalue problem of the covariance matrix of all data points. PCA has alternative names in engineering and science, such as proper orthogonal decomposition [2] in structural dynamics and the Karhunen-Loève expansion in engineering statistics [3]. Nonlinear DR methods, such as locally linear embedding (LLE) [4], ISOMAP [5], and Laplacian Eigenmaps [6], among many others, have been proposed to overcome the apparent limitations of linear DR methods. A detailed review of these earlier developments can be found in [7].
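As a minimal illustration of the eigenvalue problem behind PCA [1], the sketch below centers the data, forms the sample covariance matrix, and keeps the eigenvectors with the largest eigenvalues. The helper name `pca` and the use of the sample covariance (dividing by n - 1) are illustrative choices; production code typically relies on an SVD-based routine for better numerical stability.

```python
import numpy as np


def pca(x: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of x (n samples, d features) onto the top-k principal components.

    Returns the (n, k) projected scores and the k largest eigenvalues,
    i.e. the variance captured along each retained component.
    """
    # Center every feature so the covariance describes spread around the mean.
    xc = x - x.mean(axis=0)
    # Sample covariance matrix (d x d) built from all data points.
    cov = xc.T @ xc / (x.shape[0] - 1)
    # eigh solves the symmetric eigenvalue problem; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    return xc @ eigvecs[:, top], eigvals[top]
```

For points that lie exactly on a line in 3D, the first eigenvalue captures all of the variance and the second is numerically zero, which is the linear structure PCA can recover and the nonlinear methods above generalize beyond.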
Capturing Knowledge Graphs and Rules with Octagon Embeddings
Charpenay, Victor, Schockaert, Steven
Region-based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also in their ability to model rules, thus failing to deliver on the main promise of region-based models. To address these limitations, we investigate regions composed of axis-aligned octagons. Such octagons are particularly easy to work with, as intersections and compositions can be computed straightforwardly, while they remain sufficiently expressive to model arbitrary knowledge graphs. We also show that our octagon embeddings can properly capture a non-trivial class of rule bases. Finally, we show that our model achieves competitive experimental results.
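To show why intersections of axis-aligned octagons are easy to compute, here is a minimal 2D sketch. It assumes a hypothetical encoding in which an octagon is given by lower and upper bounds on the four linear forms x, y, y - x, and x + y (so every edge is parallel to an axis or a diagonal); the class name `Octagon` and this parameterization are illustrative and not claimed to match the paper's model.

```python
from dataclasses import dataclass

Bounds = tuple[float, float, float, float]


@dataclass(frozen=True)
class Octagon:
    """Axis-aligned octagon {(x, y) : lo[i] <= f_i(x, y) <= hi[i]} in 2D.

    Hypothetical encoding: the four linear forms are f = (x, y, y - x, x + y),
    so every edge is parallel to an axis or to a diagonal.
    """

    lo: Bounds
    hi: Bounds

    @staticmethod
    def forms(x: float, y: float) -> Bounds:
        # Evaluate the four linear forms that the bounds constrain.
        return (x, y, y - x, x + y)

    def contains(self, x: float, y: float) -> bool:
        # A point is inside iff every form lies within its [lo, hi] interval.
        return all(l <= v <= h for l, v, h in zip(self.lo, self.forms(x, y), self.hi))

    def intersect(self, other: "Octagon") -> "Octagon":
        # Both octagons constrain the same four forms, so the intersection
        # keeps the tighter (larger lower, smaller upper) bound for each form.
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Octagon(lo, hi)
```

Because the constraint families are shared, intersection reduces to an elementwise max/min over bounds. Composition, which the abstract also highlights, and a full emptiness check (bounds can satisfy lo <= hi for every form yet still describe an empty region) require additional tightening steps and are omitted from this sketch.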