AITopics | event structure

event structure

MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing Systems, Feb-18-2026, 11:49:07 GMT

However, there's a lack of insight into how these models perform in terms

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
Asia > Middle East > Syria (0.04)
(18 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law (0.93)
Leisure & Entertainment > Sports > Basketball (0.67)
Law Enforcement & Public Safety (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)
(2 more...)

MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing Systems, Oct-10-2025, 19:43:24 GMT

However, there's a lack of insight into how these models perform in terms

computational linguistic, explanation, relation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
Asia > Middle East > Syria (0.04)
(18 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law (0.93)
Leisure & Entertainment > Sports > Basketball (0.67)
Law Enforcement & Public Safety (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)
(2 more...)

Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA

Liang, Jianxin, Yue, Tan, Wang, Yuxuan, Wang, Yueqian, Yin, Zhihan, Zhang, Huishuai, Zhao, Dongyan

arXiv.org Artificial Intelligence, Sep-30-2025

The performance of Video Question Answering (VideoQA) models is fundamentally constrained by the nature of their supervision, which typically consists of isolated, factual question-answer pairs. This "bag-of-facts" approach fails to capture the underlying narrative and causal structure of events, limiting models to a shallow understanding of video content. To move beyond this paradigm, we introduce a framework to synthesize richer supervisory signals. We propose two complementary strategies: Question-Based Paraphrasing (QBP), which synthesizes the diverse inquiries (what, how, why) from a video's existing set of question-answer pairs into a holistic narrative paragraph that reconstructs the video's event structure; and Question-Based Captioning (QBC), which generates fine-grained visual rationales, grounding the answer to each question in specific, relevant evidence. Leveraging powerful generative models, we use this synthetic data to train VideoQA models under a unified next-token prediction objective. Extensive experiments on STAR and NExT-QA validate our approach, demonstrating significant accuracy gains and establishing new state-of-the-art results, such as improving a 3B model to 72.5% on STAR (+4.9%) and a 7B model to 80.8% on NExT-QA. Beyond accuracy, our analysis reveals that both QBP and QBC substantially enhance cross-dataset generalization, with QBP additionally accelerating model convergence by over 2.5x. These results demonstrate that shifting data synthesis from isolated facts to narrative coherence and grounded rationales yields a more accurate, efficient, and generalizable training paradigm.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.24445

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos

Liang, Baoyu, Su, Qile, Zhu, Shoutai, Liang, Yuchen, Tong, Chao

arXiv.org Artificial Intelligence, Jun-4-2025

Despite the significant impact of visual events on human cognition, understanding events in videos remains a challenging task for AI due to their complex structures, semantic hierarchies, and dynamic evolution. To address this, we propose the task of video event understanding that extracts event scripts and makes predictions with these scripts from videos. To support this task, we introduce VidEvent, a large-scale dataset containing over 23,000 well-labeled events, featuring detailed event structures, broad hierarchies, and logical relations extracted from movie recap videos. The dataset was created through a meticulous annotation process, ensuring high-quality and reliable event data. We also provide comprehensive baseline models offering detailed descriptions of their architecture and performance metrics. These models serve as benchmarks for future research, facilitating comparisons and improvements. Our analysis of VidEvent and the baseline models highlights the dataset's potential to advance video event understanding and encourages the exploration of innovative algorithms and models. The dataset and related resources are publicly available at www.videvent.top.

machine learning, natural language, relation, (13 more...)

arXiv.org Artificial Intelligence

2506.02448

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Leveraging Full Dependency Parsing Graph Information For Biomedical Event Extraction

Noravesh, Farshad, Haffari, Reza, Fang, Ong Huey, Soon, Layki, Rajalana, Sailaja, Pal, Arghya

arXiv.org Artificial Intelligence, Jan-2-2025

Many models have been proposed in the literature on biomedical event extraction (BEE). Some of them use shortest dependency path (SDP) information to represent the argument classification task. This representation is fragile, since missing even one word from the dependency parsing graph may totally change the final prediction. To address this, the full adjacency matrix of the dependency graph is used to embed individual tokens using a graph convolutional network (GCN). An ablation study is also done to show the effect of the dependency graph on the overall performance. The results show a significant improvement when dependency graph information is used. The proposed model slightly outperforms state-of-the-art models on BEE over different datasets.
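The idea of embedding tokens from the full dependency adjacency matrix can be illustrated with a minimal sketch. This is not the authors' model: it assumes the standard graph-convolution step ReLU(Â H W) with a symmetric, self-looped, degree-normalized adjacency Â, and the toy edges, feature matrix `H`, and weights `W` are made up for illustration.

```python
import math

def normalized_adjacency(num_tokens, edges):
    """Build a symmetric, self-looped, degree-normalized adjacency matrix."""
    a = [[0.0] * num_tokens for _ in range(num_tokens)]
    for head, dep in edges:
        a[head][dep] = 1.0
        a[dep][head] = 1.0  # assumption: dependency arcs treated as undirected
    for i in range(num_tokens):
        a[i][i] = 1.0  # self-loop so each token keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_tokens)]
            for i in range(num_tokens)]

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(xi * yj for xi, yj in zip(row, col)) for col in zip(*y)]
            for row in x]

def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(v, 0.0) for v in row] for row in z]

# Hypothetical 4-token sentence; (head, dependent) index pairs are illustrative.
EDGES = [(1, 0), (1, 3), (3, 2)]
A_HAT = normalized_adjacency(4, EDGES)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # toy token features
W = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]               # toy layer weights
OUT = gcn_layer(A_HAT, H, W)  # one 3-dimensional embedding per token
```

Because every arc of the parse contributes to Â, each token's embedding mixes in all of its syntactic neighbours rather than only those on a single shortest path.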

computational linguistic, dependency, extraction, (16 more...)

arXiv.org Artificial Intelligence

2501.01158

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Malaysia (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.35)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.75)

Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering

Wang, Zimu, Xia, Lei, Wang, Wei, Du, Xinya

arXiv.org Artificial Intelligence, Oct-7-2024

As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE has highlighted two critical challenges: the lack of document-level modeling and causal hallucinations. In this paper, we propose a Knowledge-guided binary Question Answering (KnowQA) method with event structures for ECRE, consisting of two stages: Event Structure Construction and Binary Question Answering. We conduct extensive experiments under both zero-shot and fine-tuning settings with large language models (LLMs) on the MECI and MAVEN-ERE datasets. Experimental results demonstrate the usefulness of event structures on document-level ECRE and the effectiveness of KnowQA, which achieves state-of-the-art results on the MECI dataset. We observe not only the effectiveness but also the high generalizability and low inconsistency of our method, particularly with complete event structures after fine-tuning the models.

causal relationship, event structure, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2410.04752

Country:

Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (0.46)
Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Revealing Vision-Language Integration in the Brain with Multimodal Networks

Subramaniam, Vighnesh, Conwell, Colin, Wang, Christopher, Kreiman, Gabriel, Katz, Boris, Cases, Ignacio, Barbu, Andrei

arXiv.org Artificial Intelligence, Jun-20-2024

We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, number of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.

dataset, electrode, integration, (13 more...)

arXiv.org Artificial Intelligence

2406.14481

Country:

Europe > Austria > Vienna (0.14)
Oceania > Australia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

A Survey of Video Datasets for Grounded Event Understanding

Sanders, Kate, Van Durme, Benjamin

arXiv.org Artificial Intelligence, Jun-13-2024

While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, video benchmark tasks have implicitly tested for this ability (e.g., video captioning, in which models describe visual events with natural language), but they do not consider video event understanding as a task in itself. Recent work has begun to explore video analogues to textual event extraction but consists of competing task definitions and datasets limited to highly specific event types. Therefore, while there is a rich domain of event-centric video research spanning the past 10+ years, it is unclear how video event understanding should be framed and what resources we have to study it. In this paper, we survey 105 video datasets that require event understanding capability, consider how they contribute to the study of robust event understanding in video, and assess proposed video event extraction tasks in the context of this body of research. We propose suggestions informed by this survey for dataset curation and task framing, with an emphasis on the uniquely temporal nature of video events and ambiguity in visual content.

dataset, proceedings, video, (12 more...)

arXiv.org Artificial Intelligence

2406.09646

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment (0.68)
Education (0.68)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Targeted Augmentation for Low-Resource Event Extraction

Wang, Sijia, Huang, Lifu

arXiv.org Artificial Intelligence, May-14-2024

Addressing the challenge of low-resource information extraction remains an ongoing issue due to the inherent information scarcity within limited training examples. Existing data augmentation methods, considered potential solutions, struggle to strike a balance between weak augmentation (e.g., synonym augmentation) and drastic augmentation (e.g., conditional generation without proper guidance). This paper introduces a novel paradigm that employs targeted augmentation and back validation to produce augmented examples with enhanced diversity, polarity, accuracy, and coherence. Extensive experimental results demonstrate the effectiveness of the proposed paradigm. Furthermore, identified limitations are discussed, shedding light on areas for future improvement.

computational linguistic, event structure, extraction, (14 more...)

arXiv.org Artificial Intelligence

2405.08729

Country:

North America > United States > Nevada (0.05)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre: Research Report > New Finding (0.48)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
(2 more...)

Programming Distributed Collective Processes in the eXchange Calculus

Audrito, Giorgio, Casadei, Roberto, Damiani, Ferruccio, Torta, Gianluca, Viroli, Mirko

arXiv.org Artificial Intelligence, Jan-20-2024

Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices that interact with neighbours and execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outgoing messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction are handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.
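The round-based execution model described in this abstract can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it is plain Python rather than XC or FCPP, the rounds are perfectly synchronous, and it does not model the `exchange` primitive or distributed collective processes. The function `run_rounds`, the topology `LINE`, and the hop-count program are hypothetical names and choices made for illustration.

```python
INF = float("inf")

def run_rounds(neighbours, source, rounds):
    """Run the same program on every device for a fixed number of rounds.

    Each device ends up holding its hop distance from `source`, or INF if
    the information has not reached it yet.
    """
    state = {device: INF for device in neighbours}
    for _ in range(rounds):
        # Interact: each device reads the values its neighbours held last round.
        inbox = {d: [state[n] for n in neighbours[d]] for d in neighbours}
        # Compute: one program maps local sensing (am I the source?) and
        # incoming messages to the value sent out in the next round.
        state = {d: 0 if d == source else min(inbox[d], default=INF) + 1
                 for d in neighbours}
    return state

# Hypothetical line topology a - b - c - d.
LINE = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
RESULT = run_rounds(LINE, "a", 4)
```

In this toy setup the distance estimate advances one hop per round, which is the basic mechanism behind multi-hop propagation over neighbour-to-neighbour messages.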

computation, dcp, neighbour, (12 more...)

arXiv.org Artificial Intelligence

2401.11212

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
(18 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(3 more...)
