Information Retrieval
Neural-Symbolic Entangled Framework for Complex Query Answering
Xu, Zezhong, Zhang, Wen, Ye, Peng, Chen, Hui, Chen, Huajun
Answering complex queries over knowledge graphs (KG) is an important yet challenging task because of the KG incompleteness issue and cascading errors during reasoning. Recent query embedding (QE) approaches to embed the entities and relations in a KG and the first-order logic (FOL) queries into a low dimensional space, answering queries by dense similarity search. However, previous works mainly concentrate on the target answers, ignoring intermediate entities' usefulness, which is essential for relieving the cascading error problem in logical query answering. In addition, these methods are usually designed with their own geometric or distributional embeddings to handle logical operators like union, intersection, and negation, with the sacrifice of the accuracy of the basic operator - projection, and they could not absorb other embedding methods to their models. In this work, we propose a Neural and Symbolic Entangled framework (ENeSy) for complex query answering, which enables the neural and symbolic reasoning to enhance each other to alleviate the cascading error and KG incompleteness. The projection operator in ENeSy could be any embedding method with the capability of link prediction, and the other FOL operators are handled without parameters. With both neural and symbolic reasoning results contained, ENeSy answers queries in ensembles. ENeSy achieves the SOTA performance on several benchmarks, especially in the setting of the training model only with the link prediction task.
Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance
Li, Dichucheng, Wu, Yulun, Li, Qinyu, Zhao, Jiahao, Yu, Yi, Xia, Fan, Li, Wei
The Guzheng is a kind of traditional Chinese instruments with diverse playing techniques. Instrument playing techniques (IPT) play an important role in musical performance. However, most of the existing works for IPT detection show low efficiency for variable-length audio and provide no assurance in the generalization as they rely on a single sound bank for training and testing. In this study, we propose an end-to-end Guzheng playing technique detection system using Fully Convolutional Networks that can be applied to variable-length audio. Because each Guzheng playing technique is applied to a note, a dedicated onset detector is trained to divide an audio into several notes and its predictions are fused with frame-wise IPT predictions. During fusion, we add the IPT predictions frame by frame inside each note and get the IPT with the highest probability within each note as the final output of that note. We create a new dataset named GZ_IsoTech from multiple sound banks and real-world recordings for Guzheng performance analysis. Our approach achieves 87.97% in frame-level accuracy and 80.76% in note-level F1-score, outperforming existing works by a large margin, which indicates the effectiveness of our proposed method in IPT detection.
For Gen Z, TikTok is the new search engine
When Ja'Kobi Moore decided to apply this year to a private high school in her hometown of New Orleans, she learned that she needed at least one letter of recommendation from a teacher. She had never asked for one, so she sought help. "Teacher letter of recommendation," she typed into TikTok's search bar. This could be due to a conflict with your ad-blocking or security software. Please add japantimes.co.jp and piano.io to your list of allowed sites.
Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark
Azhir, Elham, Hosseinzadeh, Mehdi, Khan, Faheem, Mosavi, Amir
Access plan recommendation is a query optimization approach that executes new queries using prior created query execution plans (QEPs). The query optimizer divides the query space into clusters in the mentioned method. However, traditional clustering algorithms take a significant amount of execution time for clustering such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quantities of data. Apache Spark and Apache Hadoop frameworks are used in the present investigation to cluster different sizes of query datasets in the MapReduce-based access plan recommendation method. The performance evaluation is performed based on execution time. The results of the experiments demonstrated the effectiveness of parallel query clustering in achieving high scalability. Furthermore, Apache Spark achieved better performance than Apache Hadoop, reaching an average speedup of 2x.
Learned Sorted Table Search and Static Indexes in Small Model Space
Amato, Domenico, Bosco, Giosuè Lo, Giancarlo, Raffaele
Machine Learning Techniques, properly combined with Data Structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed-up Binary Search, with the use of additional space with respect to the table being searched into. Such space is devoted to the Machine Learning Model. Although in their infancy, they are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor and, in fact, a major open question concerning this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant space models. In this paper, we investigate the mentioned question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index, respectively; (b) systematically exploring the time-space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. By adhering and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model can speed up Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model index, can achieve a speed-up of Binary Search with a model space of 0:05% more than the one taken by the table, being competitive in terms of time-space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time-space relation in Learned Indexes.
Interactions in Information Spread
Since the development of writing 5000 years ago, human-generated data gets produced at an ever-increasing pace. Classical archival methods aimed at easing information retrieval. Nowadays, archiving is not enough anymore. The amount of data that gets generated daily is beyond human comprehension, and appeals for new information retrieval strategies. Instead of referencing every single data piece as in traditional archival techniques, a more relevant approach consists in understanding the overall ideas conveyed in data flows. To spot such general tendencies, a precise comprehension of the underlying data generation mechanisms is required. In the rich literature tackling this problem, the question of information interaction remains nearly unexplored. First, we investigate the frequency of such interactions. Building on recent advances made in Stochastic Block Modelling, we explore the role of interactions in several social networks. We find that interactions are rare in these datasets. Then, we wonder how interactions evolve over time. Earlier data pieces should not have an everlasting influence on ulterior data generation mechanisms. We model this using dynamic network inference advances. We conclude that interactions are brief. Finally, we design a framework that jointly models rare and brief interactions based on Dirichlet-Hawkes Processes. We argue that this new class of models fits brief and sparse interaction modelling. We conduct a large-scale application on Reddit and find that interactions play a minor role in this dataset. From a broader perspective, our work results in a collection of highly flexible models and in a rethinking of core concepts of machine learning. Consequently, we open a range of novel perspectives both in terms of real-world applications and in terms of technical contributions to machine learning.
Sub-GMN: The Neural Subgraph Matching Network Model
Lan, Zixun, Yu, Limin, Yuan, Linglong, Wu, Zili, Niu, Qiang, Ma, Fei
As one of the most fundamental tasks in graph theory, subgraph matching is a crucial task in many fields, ranging from information retrieval, computer vision, biology, chemistry and natural language processing. Yet subgraph matching problem remains to be an NP-complete problem. This study proposes an end-to-end learning-based approximate method for subgraph matching task, called subgraph matching network (Sub-GMN). The proposed Sub-GMN firstly uses graph representation learning to map nodes to node-level embedding. It then combines metric learning and attention mechanisms to model the relationship between matched nodes in the data graph and query graph. To test the performance of the proposed method, we applied our method on two databases. We used two existing methods, GNN and FGNN as baseline for comparison. Our experiment shows that, on dataset 1, on average the accuracy of Sub-GMN are 12.21\% and 3.2\% higher than that of GNN and FGNN respectively. On average running time Sub-GMN runs 20-40 times faster than FGNN. In addition, the average F1-score of Sub-GMN on all experiments with dataset 2 reached 0.95, which demonstrates that Sub-GMN outputs more correct node-to-node matches. Comparing with the previous GNNs-based methods for subgraph matching task, our proposed Sub-GMN allows varying query and data graphes in the test/application stage, while most previous GNNs-based methods can only find a matched subgraph in the data graph during the test/application for the same query graph used in the training stage. Another advantage of our proposed Sub-GMN is that it can output a list of node-to-node matches, while most existing end-to-end GNNs based methods cannot provide the matched node pairs.
Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?
Wu, Jiawen, Zhang, Xinyu, Zhu, Yutao, Liu, Zheng, Guo, Zikai, Fei, Zhaoye, Lai, Ruofei, Wu, Yongkang, Cao, Zhao, Dou, Zhicheng
Recent years have witnessed great progress on applying pre-trained language models, e.g., BERT, to information retrieval (IR) tasks. Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives. For example, anchor texts of the hyperlinks have been used for simulating queries, thus constructing tremendous query-document pairs for pre-training. However, as a bridge across two web pages, the potential of hyperlinks has not been fully explored. In this work, we focus on modeling the relationship between two documents that are connected by hyperlinks and designing a new pre-training objective for ad-hoc retrieval. Specifically, we categorize the relationships between documents into four groups: no link, unidirectional link, symmetric link, and the most relevant symmetric link. By comparing two documents sampled from adjacent groups, the model can gradually improve its capability of capturing matching signals. We propose a progressive hyperlink predication ({PHP}) framework to explore the utilization of hyperlinks in pre-training. Experimental results on two large-scale ad-hoc retrieval datasets and six question-answering datasets demonstrate its superiority over existing pre-training methods.
How to Transform Your Data into a Voice AI Knowledge Assistant - Coruzant Technologies
Almost every enterprise believes data to be one of their most important assets, but most would admit they are not leveraging their data to its full potential. That's because making data easily accessible to employees is surprisingly hard work. It requires a concerted, ongoing effort to gather, structure, and tag data in order to turn it into knowledge, capable of being found in the moments when it can be most useful, using a voice AI knowledge assistant. Of course, this has always been true regardless of the state of technology, from ancient libraries indexing vast physical volumes to today's cloud-based search engines crawling millions of gigabytes of data. We can take for granted the magic of the now-ubiquitous digital keyword search, which has delivered us the power to have any data point just a few mouse and keyboard clicks away.
Y Combinator-backed Andi taps AI to build a better search engine
It's difficult to convince users to switch search engines. That's one reason why public search service startups rarely succeed. Another is that it's expensive to index a huge number of websites (Google has an estimated tens of billions of pages indexed), but one Y Combinator-backed company, Andi, is undeterred -- forging ahead to build an AI assistant that provides answers instead of links when searching online. Andi was founded by Angela Hoover, who registered for YC's Startup School after dropping out of college and got into YC's Winter 2022 Batch. After working overseas in construction and with Microsoft as a data center project administrator, Hoover met Andi's co-founder, Jed White, at the Denver airport upon her return to the U.S. Hoover and White -- who had a background in AI and search, specifically content quality ranking, querying and classification -- talked about how bad web search had become for things like travel and what it would take to build a new type of search engine from scratch.