Goto

Collaborating Authors

 dblp



ScatteringGCN: OvercomingOversmoothnessin GraphConvolutionalNetworks-Supplement

Neural Information Processing Systems

Now,since|N(v)|=β,itholds (Px)[v]= a+b 2, thus verifying the first claim of the lemma as the choice ofv was arbitrary. This construction essentially generalizes the graph demonstrated in Figure 1 of the main paper (see Sec. 7). The following lemma shows that onsuch graphs, the filter responses ofgθ for aconstant signal will encode some geometric information, butwill not distinguish between the cycles inthe graph. These responses with appropriate color coding give the illustration in Figure 1 in the main paper. Validation & testing procedure: All tests were done using train-validation-test splits of the datasets, where validation accuracy is used for tuning hyperparameters and test accuracy is reportedinthecomparisontable.





SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system

Shi, Jialiang, Dou, Yaguang, Qi, Tian

arXiv.org Artificial Intelligence

Modeling multi-interests has arisen as a core problem in real-world RS. Current multi-interest retrieval methods pose three major challenges: 1) Interests, typically extracted from predefined external knowledge, are invariant. Failed to dynamically evolve with users' real-time consumption preferences. 2) Online inference typically employs an over-exploited strategy, mainly matching users' existing interests, lacking proactive exploration and discovery of novel and long-tail interests. To address these challenges, we propose a novel retrieval framework named SPARC(Soft Probabilistic Adaptive Retrieval Model via Codebooks). Our contribution is two folds. First, the framework utilizes Residual Quantized Variational Autoencoder (RQ-VAE) to construct a discretized interest space. It achieves joint training of the RQ-VAE with the industrial large scale recommendation model, mining behavior-aware interests that can perceive user feedback and evolve dynamically. Secondly, a probabilistic interest module that predicts the probability distribution over the entire dynamic and discrete interest space. This facilitates an efficient "soft-search" strategy during online inference, revolutionizing the retrieval paradigm from "passive matching" to "proactive exploration" and thereby effectively promoting interest discovery. Online A/B tests on an industrial platform with tens of millions daily active users, have achieved substantial gains in business metrics: +0.9% increase in user view duration, +0.4% increase in user page views (PV), and a +22.7% improvement in PV500(new content reaching 500 PVs in 24 hours). Offline evaluations are conducted on open-source Amazon Product datasets. Metrics, such as Recall@K and Normalized Discounted Cumulative Gain@K(NDCG@K), also showed consistent improvement. Both online and offline experiments validate the efficacy and practical value of the proposed method.


Composite Data Augmentations for Synthetic Image Detection Against Real-World Perturbations

Amarantidou, Efthymia, Koutlis, Christos, Papadopoulos, Symeon, Petrantonakis, Panagiotis C.

arXiv.org Artificial Intelligence

--The advent of accessible Generative AI tools enables anyone to create and spread synthetic images on social media, often with the intention to mislead, thus posing a significant threat to online information integrity. Most existing Synthetic Image Detection (SID) solutions struggle on generated images sourced from the Internet, as these are often altered by compression and other operations. T o address this, our research enhances SID by exploring data augmentation combinations, leveraging a genetic algorithm for optimal augmentation selection, and introducing a dual-criteria optimization approach. These methods significantly improve model performance under real-world perturbations. Our findings provide valuable insights for developing detection models capable of identifying synthetic images across varying qualities and transformations, with the best-performing model achieving a mean average precision increase of +22.53% compared to models without augmentations.


Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset

Taffa, Tilahun Abedissa, Banerjee, Debayan, Assabie, Yaregal, Usbeck, Ricardo

arXiv.org Artificial Intelligence

Existing Scholarly Question Answering (QA) methods typically target homogeneous data sources, relying solely on either text or Knowledge Graphs (KGs). However, scholarly information often spans heterogeneous sources, necessitating the development of QA systems that integrate information from multiple heterogeneous data sources. To address this challenge, we introduce Hybrid-SQuAD (Hybrid Scholarly Question Answering Dataset), a novel large-scale QA dataset designed to facilitate answering questions incorporating both text and KG facts. The dataset consists of 10.5K question-answer pairs generated by a large language model, leveraging the KGs DBLP and SemOpenAlex alongside corresponding text from Wikipedia. In addition, we propose a RAG-based baseline hybrid QA model, achieving an exact match score of 69.65 on the Hybrid-SQuAD test set.


Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs

Mihindukulasooriya, Nandana, Tiwari, Sanju, Dobriy, Daniil, Nielsen, Finn Årup, Chhetri, Tek Raj, Polleres, Axel

arXiv.org Artificial Intelligence

Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create respective Knowledge Graphs. Yet, the full potential seems unleashed, as automated means for automatic population of said ontologies are lacking, and respective initiatives from the Semantic Web community are not necessarily connected: we propose to make scholarly data more sustainably accessible by leveraging Wikidata's infrastructure and automating its population in a sustainable manner through LLMs by tapping into unstructured sources like conference Web sites and proceedings texts as well as already existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve and maintain scholarly data as a community within Wikidata. Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, (b) semi-automated extraction -- requiring (minimal) manual validation -- of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs. Finally, we discuss (c) extensions to visualization tools in the Wikidata context for data exploration of the generated scholarly data. Our study focuses on data from 105 Semantic Web-related conferences and extends/adds more than 6000 entities in Wikidata. It is important to note that the method can be more generally applicable beyond Semantic Web-related conferences for enhancing Wikidata's utility as a comprehensive scholarly resource. Source Repository: https://github.com/scholarly-wikidata/ DOI: https://doi.org/10.5281/zenodo.10989709 License: Creative Commons CC0 (Data), MIT (Code)


Chain-of-Specificity: An Iteratively Refining Method for Eliciting Knowledge from Large Language Models

Wei, Kaiwen, Zhang, Jingyuan, Zhang, Hongzhi, Zhang, Fuzheng, Zhang, Di, Jin, Li, Yu, Yue

arXiv.org Artificial Intelligence

Large Language Models (LLMs) exhibit remarkable generative capabilities, enabling the generation of valuable information. Despite these advancements, previous research found that LLMs sometimes struggle with adhering to specific constraints (e.g., in specific place or at specific time), at times even overlooking them, which leads to responses that are either too generic or not fully satisfactory. Existing approaches attempted to address this issue by decomposing or rewriting input instructions, yet they fall short in adequately emphasizing specific constraints and in unlocking the underlying knowledge (e.g., programming within the context of software development). In response, this paper proposes a simple yet effective method named Chain-of-Specificity (CoS). Specifically, CoS iteratively emphasizes the specific constraints in the input instructions, unlocks knowledge within LLMs, and refines responses. Experiments conducted on publicly available and self-build complex datasets demonstrate that CoS outperforms existing methods in enhancing generated content especially for the specificity. Besides, as the number of specific constraints increase, other baselines falter, while CoS still performs well. Moreover, we show that distilling responses generated by CoS effectively enhances the ability of smaller models to follow the constrained instructions. Resources of this paper will be released for further research.