AITopics | candidate data

candidate data

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

Wu, Jian, Yu, Hang, Liu, Bingchang, Yang, Wenjie, Di, Peng, Li, Jianguo, Zhang, Yue

arXiv.org Artificial Intelligence, Sep-9-2025

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for fine-tuning risks introducing noise and degrading performance. Strategic data selection is thus crucial, requiring a method that is both accurate and efficient. Existing approaches, categorized as similarity-based and direct optimization methods, struggle to achieve both goals simultaneously. In this paper, we introduce LAMDAS (LLM As an iMplicit classifier for domain-specific DAta Selection), a novel approach that leverages the pre-trained LLM itself as an implicit classifier, thereby bypassing explicit feature engineering and computationally intensive optimization processes. LAMDAS reframes data selection as a one-class classification problem, identifying candidate data that "belongs" to the target domain defined by a small reference dataset. Extensive experimental results demonstrate that LAMDAS not only exceeds the performance of full-data training using a fraction of the data but also outperforms nine state-of-the-art (SOTA) baselines under various scenarios. Furthermore, LAMDAS achieves the most compelling balance between performance gains and computational efficiency compared to all evaluated baselines.

large language model, machine learning, qwen2, (18 more...)

2509.06524

Country: North America > United States (0.46)

Genre:

Overview (0.87)
Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

InsBank: Evolving Instruction Subset for Ongoing Alignment

Shi, Jiayi, Li, Yiwei, Feng, Shaoxiong, Yuan, Peiwen, Wang, Xinglin, Zhang, Yueqi, Tan, Chuyi, Pan, Boyuan, Ren, Huan, Hu, Yao, Li, Kan

arXiv.org Artificial Intelligence, Feb-16-2025

Large language models (LLMs) typically undergo instruction tuning to enhance alignment. Recent studies emphasize that quality and diversity of instruction data are more crucial than quantity, highlighting the need to select diverse, high-quality subsets to reduce training costs. However, how to evolve these selected subsets alongside the development of new instruction data remains insufficiently explored. To achieve LLMs' ongoing alignment, we introduce Instruction Bank (InsBank), a continuously updated repository that integrates the latest valuable instruction data. We further propose Progressive Instruction Bank Evolution (PIBE), a novel framework designed to evolve InsBank effectively and efficiently over time. PIBE employs a gradual data selection strategy to maintain long-term efficiency, leveraging a representation-based diversity score to capture relationships between data points and retain historical information for comprehensive diversity evaluation. This also allows for flexible combination of diversity and quality scores during data selection and ranking. Extensive experiments demonstrate that PIBE significantly outperforms baselines in InsBank evolution and is able to extract budget-specific subsets, demonstrating its effectiveness and adaptability.
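The idea of combining a quality score with a representation-based diversity score under a fixed budget can be illustrated with a small sketch. This is not the PIBE procedure, which the abstract does not spell out; it is a generic greedy selection in the style of maximal marginal relevance. The function names, the `alpha` weight, the cosine-distance diversity measure, and the toy data are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_subset(items, budget, alpha=0.5):
    """Greedily pick up to `budget` items.

    items: list of (item_id, quality, embedding) with quality in [0, 1].
    alpha: hypothetical weight; 1.0 ranks by quality only,
           0.0 ranks by diversity only.
    """
    selected = []
    remaining = list(items)
    while remaining and len(selected) < budget:

        def combined(item):
            _, quality, embedding = item
            if selected:
                # Diversity = distance to the closest already-selected item.
                diversity = min(
                    cosine_distance(embedding, chosen[2]) for chosen in selected
                )
            else:
                # Nothing selected yet, so every item is maximally "new".
                diversity = 1.0
            return alpha * quality + (1.0 - alpha) * diversity

        best = max(remaining, key=combined)
        selected.append(best)
        remaining.remove(best)
    return [item[0] for item in selected]


# Toy pool: "a" and "b" are near-duplicates, "c" is different but lower quality.
pool = [
    ("a", 0.9, (1.0, 0.0)),
    ("b", 0.8, (1.0, 0.1)),
    ("c", 0.6, (0.0, 1.0)),
]
balanced = select_subset(pool, budget=2, alpha=0.5)
quality_only = select_subset(pool, budget=2, alpha=1.0)
```

With the toy pool, the balanced setting keeps `a` and `c` and skips the near-duplicate `b`, while the quality-only setting keeps `a` and `b`. Changing `budget` yields subsets of different sizes from the same ranking logic.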

large language model, machine learning, natural language, (22 more...)

2502.11419

Country:

Asia (1.00)
North America > United States (0.46)
North America > Mexico (0.28)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Active machine learning for spatio-temporal predictions using feature embedding

Aryandoust, Arsam, Pfenninger, Stefan

arXiv.org Machine Learning, Dec-8-2020

Active learning (AL) could contribute to solving critical environmental problems through improved spatiotemporal predictions. Yet such predictions involve high-dimensional feature spaces with mixed data types and missing data, which existing methods have difficulties dealing with. Here, we propose a novel batch AL method that fills this gap. We encode and cluster features of candidate data points, and query the best data based on the distance of embedded features to their cluster centers. We introduce a new metric of informativeness that we call embedding entropy and a general class of neural networks that we call embedding networks for using it. Empirical tests on forecasting electricity demand show a simultaneous reduction in average prediction RMSE by up to 63-88% and data usage by up to 50-69% compared to passive learning (PL) benchmarks. Examples include the electricity consumption of buildings, required to operate sustainable power grids; the travel time between city zones, required for the smart charging of electric vehicles; and meteorological conditions, required for weather-based forecasting of wind and solar electricity generation. Sensing and labeling the ground truth data that is necessary for making these predictions in time and space usually comes at a high cost. This cost constrains the total number of sensors that we can place and use to query new data. A fundamental question that arises for many spatiotemporal prediction tasks is where and when to measure and query the data required to make the best possible predictions while staying within a maximum budget for sensors and data.
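The query step described above (cluster the embedded candidates, then choose points by their distance to the cluster centers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are given as plain coordinate tuples rather than produced by an embedding network, the clustering is ordinary k-means initialised from the first `k` points, and because the abstract does not say whether near or far points are preferred, the hypothetical `farthest` flag exposes both options.

```python
import math


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, assign(points, centers)


def select_batch(embedded, k, farthest=False):
    """Pick one candidate index per cluster, by distance to its center."""
    centers, labels = kmeans(embedded, k)
    pick = max if farthest else min
    batch = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            batch.append(
                pick(members, key=lambda i: math.dist(embedded[i], centers[c]))
            )
    return sorted(batch)


# Toy embedded candidates forming two well-separated groups.
candidates = [
    (0.0, 0.0), (0.2, 0.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (10.0, 13.0),
]
central = select_batch(candidates, k=2)
outlying = select_batch(candidates, k=2, farthest=True)
```

On the toy data, the default setting returns the most central candidate of each group (indices 1 and 4), and `farthest=True` returns the most outlying ones (indices 2 and 5). One query per cluster keeps the batch size equal to `k`, which stands in for a fixed sensing budget.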

candidate data, prediction, sensor 0, (16 more...)

2012.04407

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

How is AI Changing the World of Assessments?

#artificialintelligence, Dec-11-2019, 04:43:20 GMT

Artificial Intelligence existed only in the domain of science fiction and fantasy until the last few years. However, it has become a part of our normal lives today, in social as well as business environments. From military, automotive, agriculture, legal, and healthcare to education, this technology has touched almost every field and sector, impacting human lives to a great extent. AI systems are capable of reducing human effort in numerous areas. Their applications help get work done faster and with more accurate results.

assessment, selection process, student, (15 more...)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.77)
Information Technology > Artificial Intelligence > Science Fiction (0.55)

Interview: Ashutosh Garg, CEO at Eightfold.ai - insideBIGDATA

#artificialintelligence, Oct-2-2019, 18:17:34 GMT

I recently caught up with Ashutosh Garg, CEO at Eightfold.ai, to discuss how he and his team have deployed AI and machine learning to help with the needs of the talent management industry. With 6000 research citations, 50 patents, 35 peer-reviewed research publications, and the outstanding Ph.D. thesis award from UIUC for his thesis in Machine Learning, it's fair to say that Ashutosh is one of the world's experts in machine learning. After his time managing Search and Personalization efforts at both Google and IBM Research, Ashutosh founded Bloomreach, a leading vendor for Digital Experience Platforms. Now, he is applying his experience to the problem he is most passionate about: helping the world's talent find their most meaningful and fulfilling work. Can you give us a sense of what form of AI/machine learning is being used in your product?

artificial intelligence, eightfold, machine learning, (18 more...)

Genre: Personal > Interview (0.55)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

3 Recruitment Tasks Supercharged with Artificial Intelligence

#artificialintelligence, Apr-7-2017, 20:05:18 GMT

The recruitment process continues to lengthen as the search for highly skilled talent increases, the fear of making a bad hire remains, and the quality of active candidates is lacking. The average time to fill nearly doubled between 2010 (12.6 days) and 2014 (22.9 days), and many reports indicate that those numbers increased even more from 2014 to 2016. The average time to fill in 2016 is now at a record high of 29 days, according to the DHI-DFH Vacancy Duration Measure, which analyzed the entire US labor market. Well, reviewing a resume is only the beginning. The mundane tasks of pre-screening, interviewing, validating, and reference/background checking candidates are where the holdup lies.

artificial intelligence, recruiter, recruitment task supercharged, (6 more...)

Technology: Information Technology > Artificial Intelligence (1.00)
