AITopics | better evaluation


better evaluation



Towards Better Evaluation for Dynamic Link Prediction

Neural Information Processing Systems | Dec-25-2025, 09:50:39 GMT

Despite recent successes in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the recurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings, which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
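The memorization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' EdgeBank implementation (the repository linked above is the reference for that); the class name `EdgeMemoryBaseline`, the helper `count_recalled_edges`, and the toy edge stream are hypothetical and introduced only for illustration. The sketch assumes directed edges and unlimited memory: an edge is predicted as positive if and only if it has been observed at an earlier time step.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (source, destination), treated as directed


class EdgeMemoryBaseline:
    """Hypothetical memorization baseline: remembers every edge it has seen."""

    def __init__(self) -> None:
        self._seen: Set[Edge] = set()

    def update(self, edges: Iterable[Edge]) -> None:
        # Store the edges observed at the current time step.
        self._seen.update(edges)

    def predict(self, edge: Edge) -> float:
        # Score 1.0 for an edge seen before, 0.0 otherwise.
        return 1.0 if edge in self._seen else 0.0


def count_recalled_edges(stream: List[List[Edge]]) -> Tuple[int, int]:
    """Return (recalled, total): how many true edges were already in memory
    when they appeared, predicting before updating at each time step."""
    bank = EdgeMemoryBaseline()
    recalled, total = 0, 0
    for edges_at_t in stream:
        for edge in edges_at_t:
            recalled += int(bank.predict(edge) == 1.0)
            total += 1
        bank.update(edges_at_t)
    return recalled, total


# Toy stream (an assumption, not data from the paper): one edge list per time step.
toy_stream: List[List[Edge]] = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("c", "d")],
    [("a", "b"), ("b", "c")],
]

if __name__ == "__main__":
    print(count_recalled_edges(toy_stream))
```

On a stream where many edges reoccur, such a baseline recalls a large share of the true edges, and it rejects any negative edge that was never observed. This is why negatives sampled uniformly at random are easy for it to separate from positives.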

better evaluation, dynamic link prediction, name change, (4 more...)


Country: Asia > China > Liaoning Province > Shenyang (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)


Reviews: Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

Neural Information Processing Systems | Jun-2-2025, 01:09:23 GMT

The paper attempts to move away from the traditional evaluation of open-domain dialog systems (i.e., judging a response given its conversation history) towards a more interactive one (i.e., a human talking to a bot), which is likely an important step towards better evaluation. However, I do have several serious concerns about this work in its current form: (1) The authors contrast their work with existing open-domain dialog evaluation, which they call "single-turn" evaluation. They point out that this type of evaluation cannot capture "failure modes […] such as a lack of diversity in the responses, inability to track long-term aspects of the conversation". I think this is rather misleading and the term "single-turn" is a misnomer. Most previous work has indeed evaluated each conversation by factorizing it into a sequence of independent turn-level judgments, but each of these judgments assesses the quality of the current turn T_n **given** a history of several previous turns …, T_{n-k}, …, T_{n-1}.

approximating interactive human evaluation, evaluation, open-domain dialog system, (11 more...)


Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.64)


Towards Better Evaluation for Dynamic Link Prediction

Neural Information Processing Systems | Jan-18-2025, 23:57:35 GMT

better evaluation, dynamic link prediction, negative edge, (1 more...)


Country: Asia > China > Liaoning Province > Shenyang (0.09)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.83)
Information Technology > Information Management > Search (0.65)
Information Technology > Data Science > Data Mining (0.65)
