AITopics

Country:

Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsFeb-19-2026, 04:30:29 GMT

8e2c381d4dd04f1c55093f22c59c3a08-Paper.pdf

algorithm, international conference, regularization, (13 more...)

Country:

Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsAug-15-2025, 02:04:55 GMT

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

We also study the more general case of an additional entropy regularizer.

algorithm, evaluation step, regularization, (16 more...)

Country:

Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsAug-15-2025, 02:04:48 GMT

8e2c381d4dd04f1c55093f22c59c3a08-Paper.pdf

algorithm, international conference, regularization, (13 more...)

Country:

Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Mahjoub, Youssef Ait El, Fourneau, Jean-Michel, Alouah, Salma

Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process

arXiv.org Artificial IntelligenceAug-4-2025

Solving Markov Decision Processes (MDPs) remains a central challenge in sequential decision-making, especially when dealing with large state spaces and long-term optimization criteria. A key step in Bellman dynamic programming algorithms is the policy evaluation, which becomes computationally demanding in infinite-horizon settings such as average-reward or discounted-reward formulations. In the context of Markov chains, aggregation and disaggregation techniques have for a long time been used to reduce complexity by exploiting structural decompositions. In this work, we extend these principles to a structured class of MDPs. We define the Single-Input Superstate Decomposable Markov Decision Process (SISDMDP), which combines Chiu's single-input decomposition with Robertazzi's single-cycle recurrence property. When a policy induces this structure, the resulting transition graph can be decomposed into interacting components with centralized recurrence. We develop an exact and efficient policy evaluation method based on this structure. This yields a scalable solution applicable to both average and discounted reward MDPs.

artificial intelligence, machine learning, partition, (17 more...)

2508.00816

Country: Europe > France (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

arXiv.org Artificial IntelligenceFeb-22-2025

Fine-Tuning Qwen 2.5 3B for Realistic Movie Dialogue Generation

Gupta, Kartik

--The Qwen 2.5 3B base model was fine-tuned to generate contextually rich and engaging movie dialogue, leveraging the Cornell Movie-Dialog Corpus, a curated dataset of movie conversations. Due to limitations in GPU computing and VRAM, the training process began with the 0.5B model, progressively scaling up to the 1.5B and 3B versions as efficiency improvements were implemented. The Qwen 2.5 series, developed by Alibaba Group, stands at the forefront of small open-source pre-trained models, particularly excelling in creative tasks compared to alternatives like Meta's Llama 3.2 and Google's Gemma. Results demonstrate the ability of small models to produce high-quality, realistic dialogue, offering a promising approach for real-time, context-sensitive conversation generation. This project aimed to fine-tune a small, pretrained large language model (LLM) to generate realistic and compelling movie dialogue when prompted with a preceding line. To achieve this, the Qwen 2.5 3B base model was fine-tuned using the Cornell Movie-Dialog Corpus, a curated dataset of movie dialogue [1].

base model, dialogue, movie dialogue, (15 more...)

2502.16274

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Guo, Pengsheng, Schwing, Alexander G.

Variational Rectified Flow Matching

arXiv.org Artificial IntelligenceFeb-13-2025

We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching 'moves' samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between coupled samples one drawn from the source and one drawn from the target distribution randomly. This leads to ''ground-truth'' velocity vector-fields that point in different directions at the same location, i.e., the velocity vector-fields are multi-modal/ambiguous. However, since training uses a standard mean-squared-error loss, the learnt velocity vector-field averages ''ground-truth'' directions and isn't multi-modal. In contrast, variational rectified flow matching learns and samples from multi-modal flow directions. We show on synthetic data, MNIST, CIFAR-10, and ImageNet that variational rectified flow matching leads to compelling results.

artificial intelligence, baseline, machine learning, (14 more...)

2502.09616

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceDec-19-2024

G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o

Tong, Tony Cheng, He, Sirui, Shao, Zhiwen, Yeung, Dit-Yan

Evaluation metric of visual captioning is important yet not thoroughly explored. Traditional metrics like BLEU, METEOR, CIDEr, and ROUGE often miss semantic depth, while trained metrics such as CLIP-Score, PAC-S, and Polos are limited in zero-shot scenarios. Advanced Language Model-based metrics also struggle with aligning to nuanced human preferences. To address these issues, we introduce G-VEval, a novel metric inspired by G-Eval and powered by the new GPT-4o. G-VEval uses chain-of-thought reasoning in large multimodal models and supports three modes: reference-free, reference-only, and combined, accommodating both video and image inputs. We also propose MSVD-Eval, a new dataset for video captioning evaluation, to establish a more transparent and consistent framework for both human experts and evaluation metrics. It is designed to address the lack of clear criteria in existing datasets by introducing distinct dimensions of Accuracy, Completeness, Conciseness, and Relevance (ACCR). Extensive results show that G-VEval outperforms existing methods in correlation with human annotations, as measured by Kendall tau-b and Kendall tau-c. This provides a flexible solution for diverse captioning tasks and suggests a straightforward yet effective approach for large language models to understand video content, paving the way for advancements in automated captioning. Codes are available at https://github.com/ztangaj/gveval

caption, large language model, machine learning, (21 more...)

2412.13647

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

arXiv.org Artificial IntelligenceFeb-2-2024

UNSEE: Unsupervised Non-contrastive Sentence Embeddings

Çağatan, Ömer Veysel

We present UNSEE: Unsupervised Non-Contrastive Sentence Embeddings, a novel approach that outperforms SimCSE in the Massive Text Embedding benchmark. Our exploration begins by addressing the challenge of representation collapse, a phenomenon observed when contrastive objectives in SimCSE are replaced with non-contrastive objectives. To counter this issue, we propose a straightforward solution known as the target network, effectively mitigating representation collapse. The introduction of the target network allows us to leverage non-contrastive objectives, maintaining training stability while achieving performance improvements comparable to contrastive objectives. Our method has achieved peak performance in non-contrastive sentence embeddings through meticulous fine-tuning and optimization. This comprehensive effort has yielded superior sentence representation models, showcasing the effectiveness of our approach.

non-contrastive objective, objective, projection model, (16 more...)

2401.15316

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
North America > Canada (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

arXiv.org Artificial IntelligenceNov-7-2023

Which is better? Exploring Prompting Strategy For LLM-based Metrics

Kim, Joonghoon, Park, Saeran, Jeong, Kiyoon, Lee, Sangmin, Han, Seung Hun, Lee, Jiyoon, Kang, Pilsung

This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task, where systems were submitted to two tracks: small and large summarization tracks. With advanced Large Language Models (LLMs) such as GPT-4, evaluating the quality of Natural Language Generation (NLG) has become increasingly paramount. Traditional similarity-based metrics such as BLEU and ROUGE have shown to misalign with human evaluation and are ill-suited for open-ended generation tasks. To address this issue, we explore the potential capability of LLM-based metrics, especially leveraging open-source LLMs. In this study, wide range of prompts and prompting techniques are systematically analyzed with three approaches: prompting strategy, score aggregation, and explainability. Our research focuses on formulating effective prompt templates, determining the granularity of NLG quality scores and assessing the impact of in-context examples on LLM-based evaluation. Furthermore, three aggregation strategies are compared to identify the most reliable method for aggregating NLG quality scores. To examine explainability, we devise a strategy that generates rationales for the scores and analyzes the characteristics of the explanation produced by the open-source LLMs. Extensive experiments provide insights regarding evaluation capabilities of open-source LLMs and suggest effective prompting strategies.

direct 0, evaluation, open-source llm, (16 more...)

2311.03754

Country:

Europe > United Kingdom > England > Leicestershire > Leicester (0.05)
Europe > United Kingdom > England > East Midlands (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(9 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Transportation (0.94)
Leisure & Entertainment > Sports > Tennis (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)