Information Retrieval
JobHam-place with smart recommend job options and candidate filtering options
Due to the increasing number of graduates, many applicants experience the situation about finding a job, and employers experience difficulty filtering job applicants, which might negatively impact their effectiveness. However, most job-hunting websites lack job recommendation and CV filtering or ranking functionality, which are not integrated into the system. Thus, a smart job hunter combined with the above functionality will be conducted in this project, which contains job recommendations, CV ranking and even a job dashboard for skills and job applicant functionality. Job recommendation and CV ranking starts from the automatic keyword extraction and end with the Job/CV ranking algorithm. Automatic keyword extraction is implemented by Job2Skill and the CV2Skill model based on Bert. Job2Skill consists of two components, text encoder and Gru-based layers, while CV2Skill is mainly based on Bert and fine-tunes the pre-trained model by the Resume- Entity dataset. Besides, to match skills from CV and job description and rank lists of jobs and candidates, job/CV ranking algorithms have been provided to compute the occurrence ratio of skill words based on TFIDF score and match ratio of the total skill numbers. Besides, some advanced features have been integrated into the website to improve user experiences, such as the calendar and sweetalert2 plugin. And some basic features to go through job application processes, such as job application tracking and interview arrangement.
BERTino: an Italian DistilBERT model
Muffo, Matteo, Bertino, Enrico
The recent introduction of Transformers language representation models allowed great improvements in many natural language processing (NLP) tasks. However, if on one hand the performances achieved by this kind of architectures are surprising, on the other their usability is limited by the high number of parameters which constitute their network, resulting in high computational and memory demands. In this work we present BERTino, a DistilBERT model which proposes to be the first lightweight alternative to the BERT architecture specific for the Italian language. We evaluated BERTino on the Italian ISDT, Italian ParTUT, Italian WikiNER and multiclass classification tasks, obtaining F1 scores comparable to those obtained by a BERTBASE with a remarkable improvement in training and inference speed.
Scardina: Scalable Join Cardinality Estimation by Multiple Density Estimators
Ito, Ryuichi, Sasaki, Yuya, Xiao, Chuan, Onizuka, Makoto
In recent years, machine learning-based cardinality estimation methods are replacing traditional methods. This change is expected to contribute to one of the most important applications of cardinality estimation, the query optimizer, to speed up query processing. However, none of the existing methods do not precisely estimate cardinalities when relational schemas consist of many tables with strong correlations between tables/attributes. This paper describes that multiple density estimators can be combined to effectively target the cardinality estimation of data with large and complex schemas having strong correlations. We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
Reviewer Assignment Problem: A Systematic Review of the Literature
Aksoy, Meltem | Yanik, Seda (Istanbul Technical University) | Amasyali, Mehmet Fatih (Yildiz Technical University)
Appropriate reviewer assignment significantly impacts the quality of proposal evaluation, as accurate and fair reviews are contingent on their assignment to relevant reviewers. The crucial task of assigning reviewers to submitted proposals is the starting point of the review process and is also known as the reviewer assignment problem (RAP). Due to the obvious restrictions of manual assignment, journal editors, conference organizers, and grant managers demand automatic reviewer assignment approaches. Many studies have proposed assignment solutions in response to the demand for automated procedures since 1992. The primary objective of this survey paper is to provide scholars and practitioners with a comprehensive overview of available research on the RAP. To achieve this goal, this article presents an in-depth systematic review of 103 publications in the field of reviewer assignment published in the past three decades and available in the Web of Science, Scopus, ScienceDirect, Google Scholar, and Semantic Scholar databases. This review paper classified and discussed the RAP approaches into two broad categories and numerous subcategories based on their underlying techniques. Furthermore, potential future research directions for each category are presented. This survey shows that the research on the RAP is becoming more significant and that more effort is required to develop new approaches and a framework.
MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries
Choudhury, Muntabir Hasan, Salsabil, Lamia, Jayanetti, Himarsha R., Wu, Jian, Ingram, William A., Fox, Edward A.
Metadata quality is crucial for digital objects to be discovered through digital library interfaces. However, due to various reasons, the metadata of digital objects often exhibits incomplete, inconsistent, and incorrect values. We investigate methods to automatically detect, correct, and canonicalize scholarly metadata, using seven key fields of electronic theses and dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that utilizes state-of-the-art artificial intelligence methods to improve the quality of these fields. To evaluate MetaEnhance, we compiled a metadata quality evaluation benchmark containing 500 ETDs, by combining subsets sampled using multiple criteria. We tested MetaEnhance on this benchmark and found that the proposed methods achieved nearly perfect F1-scores in detecting errors and F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven fields.
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
Smith, James Seale, Karlinsky, Leonid, Gutta, Vyshnavi, Cascante-Bonilla, Paola, Kim, Donghyun, Arbelle, Assaf, Panda, Rameswar, Feris, Rogerio, Kira, Zsolt
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has enabled prompting approaches as an alternative to data-rehearsal. These approaches rely on a key-query mechanism to generate prompts and have been found to be highly resistant to catastrophic forgetting in the well-established rehearsal-free continual learning setting. However, the key mechanism of these methods is not trained end-to-end with the task sequence. Our experiments show that this leads to a reduction in their plasticity, hence sacrificing new task accuracy, and inability to benefit from expanded parameter capacity. We instead propose to learn a set of prompt components which are assembled with input-conditioned weights to produce input-conditioned prompts, resulting in a novel attention-based end-to-end key-query scheme. Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy. We also outperform the state of art by as much as 4.4% accuracy on a continual learning benchmark which contains both class-incremental and domain-incremental task shifts, corresponding to many practical settings. Our code is available at https://github.com/GT-RIPL/CODA-Prompt
Task Oriented Conversational Modelling With Subjective Knowledge
Existing conversational models are handled by a database(DB) and API based systems. However, very often users' questions require information that cannot be handled by such systems. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three stage pipeline consisting of knowledge seeking turn detection, knowledge selection and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance the overall system performance. In particular, we propose entity retrieval methods which result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in 7X faster search compared to the baseline model. Additionally, we also explore a potential keyword extraction method which can improve the accuracy of knowledge selection. Preliminary results show a 4 \% improvement in exact match score on knowledge selection task. The code is available https://github.com/raja-kumar/knowledge-grounded-TODS
Zero-Shot Retrieval with Search Agents and Hybrid Environments
Huebscher, Michelle Chen, Buck, Christian, Ciaramita, Massimiliano, Rothe, Sascha
Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers. We extend the previous learning to search setup to a hybrid environment, which accepts discrete query refinement operations, after a first-pass retrieval step via a dual encoder. Experiments on the BEIR task show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent based on HRE (HARE) matches state-of-the-art performance, balanced in both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.
Neural Graph Databases. A new milestone in graph data…
Vanilla graph databases are pretty much everywhere thanks to the ever-growing graphs in production, flexible graph data models, and expressive query languages. Query engines assume that graphs in classical graph DBs are complete. Under the completeness assumption, we can build indexes, store the graphs in a variety of read/write-optimized formats and expect the DB would return what is there. But this assumption does not often hold in practice (we'd say, doesn't hold way too often). If we look at some prominent knowledge graphs (KGs): in Freebase, 93.8% of people have no place of birth and 78.5% have no nationality, about 68% of people do not have any profession, while in Wikidata, about 50% of artists have no date of birth, and only 0.4% of known buildings have information about height.
AI Keyword Research: Tools, Strategies & Tips - Analytics Vidhya
AI-driven keyword research has become indispensable for bloggers looking to grow their audience and boost their online presence. By leveraging advanced ML algorithms, AI tools provide data-driven insights into user search behavior, revealing high-potential keywords to target. This process helps you create compelling, relevant, and searchable content that attracts organic traffic and improves your blog's search engine ranking. In this article, we'll introduce you to the benefits of AI keyword research, the best tools to use in 2023, and actionable tips for harnessing their potential to grow your blog exponentially. AI keyword research is the process of using machine learning algorithms and advanced data analytics to identify high-potential keywords that can help improve a website's search engine ranking and drive traffic to the site.