AITopics | Curriculum

A Synthetic Dataset for Personal Attribute Inference

Hanna Yukhymenko

Neural Information Processing Systems, Mar-27-2025, 11:00:25 GMT

Recently, powerful Large Language Models (LLMs) have become easily accessible to hundreds of millions of users worldwide. However, their strong capabilities and vast world knowledge do not come without associated privacy risks. In this work, we focus on the emerging privacy threat LLMs pose: the ability to accurately infer personal information from online texts. Despite the growing importance of LLM-based author profiling, research in this area has been hampered by a lack of suitable public datasets, largely due to ethical and privacy concerns associated with real personal data. We take two steps to address this problem: (i) we construct a simulation framework for the popular social media platform Reddit using LLM agents seeded with synthetic personal profiles; (ii) using this framework, we generate SynthPAI, a diverse synthetic dataset of over 7800 comments manually labeled for personal attributes. We validate our dataset with a human study showing that humans barely outperform random guessing on the task of distinguishing our synthetic comments from real ones. Further, we verify that our dataset enables meaningful personal attribute inference research by showing across 18 state-of-the-art LLMs that our synthetic comments allow us to draw the same conclusions as real-world data. Combined, our experimental results, dataset and pipeline form a strong basis for future privacy-preserving research geared towards understanding and mitigating inference-based privacy threats that LLMs pose.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia (0.67)
South America > Brazil > Rio de Janeiro (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Ohio (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (0.87)
Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Curriculum > Subject-Specific Education (0.92)
Education > Educational Setting > Higher Education (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

89e44582fd28ddfea1ea4dcb0ebbf4b0-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Mar-27-2025, 10:45:46 GMT

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.46)
North America > Canada > Ontario (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Leisure & Entertainment (1.00)
(24 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
(6 more...)

89e44582fd28ddfea1ea4dcb0ebbf4b0-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Mar-27-2025, 10:45:43 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.46)
North America > Canada > Ontario (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Leisure & Entertainment (1.00)
(24 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(6 more...)

MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs

Yingjia Wan, Jingyao Li

Neural Information Processing Systems, Mar-27-2025, 10:33:48 GMT

Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on step-by-step chain-of-thought reasoning processes. However, evaluating these reasoning abilities has become increasingly challenging. Existing outcome-based benchmarks are beginning to saturate, becoming less effective in tracking meaningful progress. To address this, we present a process-based benchmark, MR-Ben, that demands a meta-reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. Our meta-reasoning paradigm is especially suited for system-2 slow thinking, mirroring the human cognitive process of carefully examining assumptions, conditions, calculations, and logic to identify mistakes. MR-Ben comprises 5,975 questions curated by human experts across a wide range of subjects, including physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models).
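As a rough illustration of the task format described above, the Python sketch below scores a model on locating the first erroneous step in a generated solution. The `ReasoningItem` fields and the `first_error_accuracy` function are hypothetical names chosen for this example; they are not MR-Ben's data schema or its official metrics, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReasoningItem:
    """One question with a generated solution (hypothetical schema)."""
    steps: List[str]
    # Index of the first incorrect step, or None if the solution is correct.
    first_error_step: Optional[int]


def first_error_accuracy(items: List[ReasoningItem],
                         predictions: List[Optional[int]]) -> float:
    """Fraction of items where the predicted first-error index matches the label."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = sum(
        1 for item, pred in zip(items, predictions)
        if item.first_error_step == pred
    )
    return correct / len(items)


# Toy data, invented for illustration only.
items = [
    ReasoningItem(steps=["2 + 2 = 4", "4 * 3 = 11"], first_error_step=1),
    ReasoningItem(steps=["x = 5", "x + 1 = 6"], first_error_step=None),
]
predictions = [1, 0]  # second prediction wrongly flags a correct solution
accuracy = first_error_accuracy(items, predictions)
```

A process-based score like this rewards finding where a solution goes wrong, whereas an outcome-based score only checks the final answer.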

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE (0.14)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Workflow (0.73)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Materials > Chemicals (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

AI ring tracks spelled words in American Sign Language

AIHub, Mar-27-2025, 09:48:21 GMT

A Cornell-led research team has developed an artificial intelligence-powered ring equipped with micro-sonar technology that can track fingerspelling in American Sign Language (ASL) continuously and in real time. In its current form, SpellRing could be used to enter text into computers or smartphones via fingerspelling, which is used in ASL to spell out words without corresponding signs, such as proper nouns, names and technical terms. With further development, the device could potentially be used to continuously track entire signed words and sentences. "Many other technologies that recognize fingerspelling in ASL have not been adopted by the deaf and hard-of-hearing community because the hardware is bulky and impractical," said Hyunchul Lim, a doctoral student in the field of information science. "We sought to develop a single ring to capture all of the subtle and complex finger movement in ASL." Lim is lead author of "SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring," which will be presented at the Association for Computing Machinery's conference on Human Factors in Computing Systems (CHI), April 26-May 1 in Yokohama, Japan.

artificial intelligence, machine learning, natural language, (14 more...)

AIHub

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.25)

Industry:

Education > Curriculum > Subject-Specific Education (0.85)
Health & Medicine (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language (0.31)

Geometry Awakening: Cross-Geometry Learning Exhibits Superiority over Individual Structures

Yu Wang

Neural Information Processing Systems, Mar-27-2025, 09:07:59 GMT

Recent research has underscored the efficacy of Graph Neural Networks (GNNs) in modeling diverse geometric structures within graph data.

data mining, machine learning, student model, (22 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education > Curriculum > Subject-Specific Education (0.41)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Agent Planning with World Knowledge Model

Neural Information Processing Systems, Mar-27-2025, 08:46:55 GMT

Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.14)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting (0.67)
Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
(2 more...)

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

Neural Information Processing Systems, Mar-27-2025, 08:37:23 GMT

We study the problem of aligning large language models (LLMs) with human preference data. Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy. However, the contrastive objective focuses mainly on the relative values of implicit rewards associated with two responses while ignoring their actual values, resulting in suboptimal alignment with human preferences. To address this limitation, we propose calibrated direct preference optimization (Cal-DPO), a simple yet effective algorithm. We show that substantial improvement in alignment with the given preferences can be achieved simply by calibrating the implicit reward to ensure that the learned implicit rewards are comparable in scale to the ground-truth rewards. We demonstrate the theoretical advantages of Cal-DPO over existing approaches. The results of our experiments on a variety of standard benchmarks show that Cal-DPO remarkably improves off-the-shelf methods.
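The limitation named in the abstract, that a contrastive objective sees only the relative values of the two implicit rewards, can be checked numerically. The Python sketch below uses the standard DPO formulation (implicit reward as a scaled log-probability ratio, loss as the negative log-sigmoid of the reward margin); the log-probabilities and `beta` are made-up values, and the sketch does not implement the Cal-DPO calibration term itself, whose exact form is not given here.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Standard DPO implicit reward: beta * log(pi(y|x) / pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def dpo_loss(r_chosen: float, r_rejected: float) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))


# Made-up sequence log-probabilities under the policy and the reference model.
r_w = implicit_reward(logp_policy=-12.0, logp_ref=-14.0)  # chosen response
r_l = implicit_reward(logp_policy=-15.0, logp_ref=-13.0)  # rejected response

base_loss = dpo_loss(r_w, r_l)

# Shift both implicit rewards down by the same constant: the margin, and
# therefore the loss, does not change, although both rewards are now negative.
shift = 5.0
shifted_loss = dpo_loss(r_w - shift, r_l - shift)
```

Because `base_loss` and `shifted_loss` agree up to floating-point error, the contrastive loss alone cannot pin down the absolute scale of the rewards, which is the gap a calibration objective is meant to close.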

cal-dpo, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Media (0.68)
Banking & Finance (0.67)
Government > Regional Government > North America Government > United States Government (0.45)
Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Scaling Sign Language Translation

Neural Information Processing Systems, Mar-27-2025, 08:32:49 GMT

Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. We perform large-scale SLT pretraining on different data including 1) noisy multilingual YouTube SLT data, 2) parallel text corpora, and 3) SLT data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SLT model with pretrained (m/By)T5 models across model sizes. SLT pretraining results on How2Sign and FLEURS-ASL#0 (ASL to 42 spoken languages) demonstrate the significance of data/model scaling and cross-lingual cross-modal transfer, as well as the feasibility of zero-shot SLT.

machine learning, natural language, translation, (19 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.14)
Asia > Middle East > UAE (0.14)
Europe > Portugal (0.14)
Asia > Thailand (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Improving Gloss-free Sign Language Translation by Reducing Representation Density

Wenxiang Jiao

Neural Information Processing Systems, Mar-27-2025, 06:48:45 GMT

Gloss-free sign language translation (SLT) aims to develop well-performing SLT systems with no requirement for costly gloss annotations, but it currently still lags significantly behind gloss-based approaches. In this paper, we identify a representation density problem that could be a bottleneck restricting the performance of gloss-free SLT. Specifically, the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, which makes it hard for gloss-free methods to distinguish different sign gestures and leads to a sharp performance drop. To address the representation density problem, we introduce a simple but effective contrastive learning strategy, namely SignCL, which encourages gloss-free models to learn more discriminative feature representations in a self-supervised manner. Our experiments demonstrate that the proposed SignCL can significantly reduce the representation density and improve performance across various translation frameworks. Specifically, SignCL achieves a significant improvement in BLEU score for the Sign Language Transformer and GFSLT-VLP on the CSL-Daily dataset by 39% and 46%, respectively, without any increase in model parameters. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters.
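To make the idea of densely packed representations concrete, the Python sketch below evaluates a generic margin-based contrastive loss on toy 2-D feature vectors. This is a textbook contrastive loss used only for illustration; it is not the SignCL objective, and the function names, the margin, and the feature values are all assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(anchor: Sequence[float],
                     positive: Sequence[float],
                     negative: Sequence[float],
                     margin: float = 1.0) -> float:
    """Generic margin-based contrastive loss (not the SignCL formulation).

    Pulls the positive toward the anchor and pushes the negative away
    until it is at least `margin` from the anchor.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return d_pos ** 2 + max(0.0, margin - d_neg) ** 2


anchor = [0.0, 0.0]
same_sign = [0.1, 0.0]        # feature of the same gesture

# Dense case: a different gesture sits almost on top of the anchor.
packed_other = [0.2, 0.0]
# Spread case: the different gesture lies beyond the margin.
spread_other = [2.0, 0.0]

loss_packed = contrastive_loss(anchor, same_sign, packed_other)
loss_spread = contrastive_loss(anchor, same_sign, spread_other)
```

The loss is higher in the packed case, so minimizing it moves features of different gestures apart, which is the general effect a contrastive strategy aims for.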

machine learning, natural language, translation, (18 more...)

Neural Information Processing Systems

Country: