AITopics

The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings including: closed and open training data; as well as supervised, self-supervised, and semi-supervised training for domain adaptation. The challenges also provided publicly available training and evaluation datasets for each task and setting, with new test sets released each year. In this paper, we provide a review of these challenges that covers: what they explored; the methods developed by the challenge participants and how these evolved; and also the current state of the field for speaker verification and diarisation. We chart the progress in performance over the five installments of the challenge on a common evaluation dataset and provide a detailed analysis of how each year's special focus affected participants' performance. This paper is aimed both at researchers who want an overview of the speaker recognition and diarisation field, and also at challenge organisers who want to benefit from the successes and avoid the mistakes of the VoxSRC challenges. We end with a discussion of the current strengths of the field and open challenges. Project page : https://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/workshop.html

artificial intelligence, machine learning, pattern recognition, (18 more...)

doi: 10.1109/TASLP.2024.3444456

2408.14886

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Pennsylvania (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (0.92)
Media (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)
(2 more...)

Toward Time-Continuous Data Inference in Sparse Urban CrowdSensing

Sun, Ziyu, Su, Haoyang, Sun, Hanqi, Wang, En, Liu, Wenbin

Mobile Crowd Sensing (MCS) is a promising paradigm that leverages mobile users and their smart portable devices to perform various real-world tasks. However, due to budget constraints and the inaccessibility of certain areas, Sparse MCS has emerged as a more practical alternative, collecting data from a limited number of target subareas and utilizing inference algorithms to complete the full sensing map. While existing approaches typically assume a time-discrete setting with data remaining constant within each sensing cycle, this simplification can introduce significant errors, especially when dealing with long cycles, as real-world sensing data often changes continuously. In this paper, we go from fine-grained completion, i.e., the subdivision of sensing cycles into minimal time units, towards a more accurate, time-continuous completion. We first introduce Deep Matrix Factorization (DMF) as a neural network-enabled framework and enhance it with a Recurrent Neural Network (RNN-DMF) to capture temporal correlations in these finer time slices. To further deal with the continuous data, we propose TIME-DMF, which captures temporal information across unequal intervals, enabling time-continuous completion. Additionally, we present the Query-Generate (Q-G) strategy within TIME-DMF to model the infinite states of continuous data. Extensive experiments across five types of sensing tasks demonstrate the effectiveness of our models and the advantages of time-continuous completion.

completion, information, temporal information, (17 more...)

2408.16027

Country:

Europe > United Kingdom > England (0.05)
Asia > China > Beijing > Beijing (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Telecommunications (0.66)
Information Technology > Networks (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Panoptic Perception for Autonomous Driving: A Survey

Li, Yunge, Xu, Lanyu

Traditional methodologies in vehicular perception typically compartmentalize tasks such as object detection, instance segmentation, and semantic segmentation, addressing each in isolation. While these modular approaches yield valuable insights, they fall short of providing an integrated, holistic understanding of the multifaceted driving environment. This limitation underscores the necessity for panoptic perception, an approach designed to unify disparate perception tasks within a comprehensive framework. The concept of panoptic perception was first introduced in the YOLOP[104] in 2021, marking a significant advancement in autonomous driving. Before this, and continuing presently, numerous multi-task networks have been developed with similar objectives, striving to enhance environmental perception in autonomous driving by unifying various perception tasks within a single, cohesive framework.

autonomous driving, detection, segmentation, (11 more...)

2408.15388

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > India (0.04)
(10 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Temporal Graph Neural Network-Powered Paper Recommendation on Dynamic Citation Networks

Shen, Junhao, Haqqani, Mohammad Ausaf Ali, Hu, Beichen, Huang, Cheng, Xie, Xihao, Lee, Tsengdar, Zhang, Jia

Due to the rapid growth of scientific publications, identifying all related reference articles in the literature has become increasingly challenging yet highly demanding. Existing methods primarily assess candidate publications from a static perspective, focusing on the content of articles and their structural information, such as citation relationships. There is a lack of research regarding how to account for the evolving impact among papers on their embeddings. Toward this goal, this paper introduces a temporal dimension to paper recommendation strategies. The core idea is to continuously update a paper's embedding when new citation relationships appear, enhancing its relevance for future recommendations. Whenever a citation relationship is added to the literature upon the publication of a paper, the embeddings of the two related papers are updated through a Temporal Graph Neural Network (TGN). A learnable memory update module based on a Recurrent Neural Network (RNN) is utilized to study the evolution of the embedding of a paper in order to predict its reference impact in a future timestamp. Such a TGN-based model learns a pattern of how people's views of the paper may evolve, aiming to guide paper recommendations more precisely. Extensive experiments on an open citation network dataset, including 313,278 articles from https://paperswithcode.com/about PaperWithCode, have demonstrated the effectiveness of the proposed approach.

citation network, graph, node, (15 more...)

2408.15371

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Overview (1.00)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

MiWaves Reinforcement Learning Algorithm

Ghosh, Susobhan, Guo, Yongyi, Hung, Pei-Yao, Coughlin, Lara, Bonar, Erin, Nahum-Shani, Inbal, Walton, Maureen, Murphy, Susan

The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforcement learning (RL) algorithm designed to optimize the delivery of personalized intervention prompts to reduce cannabis use among EAs. MiWaves leverages domain expertise and prior data to tailor the likelihood of delivery of intervention messages. This paper presents a comprehensive overview of the algorithm's design, including key decisions and experimental outcomes. The finalized MiWaves RL algorithm was deployed in a clinical trial from March to May 2024.

algorithm, cannabis, participant, (16 more...)

2408.15076

Country:

North America > United States > Michigan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Brain-inspired Artificial Intelligence: A Comprehensive Review

Ren, Jing, Xia, Feng

Current artificial intelligence (AI) models often focus on enhancing performance through meticulous parameter tuning and optimization techniques. However, the fundamental design principles behind these models receive comparatively less attention, which can limit our understanding of their potential and constraints. This comprehensive review explores the diverse design inspirations that have shaped modern AI models, i.e., brain-inspired artificial intelligence (BIAI). We present a classification framework that categorizes BIAI approaches into physical structure-inspired and human behavior-inspired models. We also examine the real-world applications where different BIAI models excel, highlighting their practical benefits and deployment challenges. By delving into these areas, we provide new insights and propose future research directions to drive innovation and address current gaps in the field. This review offers researchers and practitioners a comprehensive overview of the BIAI landscape, helping them harness its potential and expedite advancements in AI development.

brain, human brain, information, (13 more...)

2408.14811

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(7 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(13 more...)

Ali, Wazir, Pyysalo, Sampo

A Survey of Large Language Models for European Languages

Large Language Models (LLMs) have gained significant attention due to their high performance on a wide range of natural language tasks since the release of ChatGPT. The LLMs learn to understand and generate language by training billions of model parameters on vast volumes of text data. Despite being a relatively new field, LLM research is rapidly advancing in various directions. In this paper, we present an overview of LLM families, including LLaMA, PaLM, GPT, and MoE, and the methods developed to create and enhance LLMs for official European Union (EU) languages. We provide a comprehensive summary of common monolingual and multilingual datasets used for pretraining large language models.

dataset, language model, proceedings, (15 more...)

2408.1504

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
(40 more...)

Genre: Overview (1.00)

Industry:

Media > News (0.93)
Government > Regional Government > Europe Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Time Series Analysis for Education: Methods, Applications, and Future Directions

Mao, Shengzhong, Zhang, Chaoli, Song, Yichi, Wang, Jindong, Zeng, Xiao-Jun, Xu, Zenglin, Wen, Qingsong

Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comprehensive review of time series analysis techniques specifically within the educational context. We begin by exploring the landscape of educational data analytics, categorizing various data sources and types relevant to education. We then review four prominent time series methods-forecasting, classification, clustering, and anomaly detection-illustrating their specific application points in educational settings. Subsequently, we present a range of educational scenarios and applications, focusing on how these methods are employed to address diverse educational tasks, which highlights the practical integration of multiple time series methods to solve complex educational problems. Finally, we conclude with a discussion on future directions, including personalized learning analytics, multimodal data fusion, and the role of large language models (LLMs) in educational time series. The contributions of this paper include a detailed taxonomy of educational data, a synthesis of time series techniques with specific educational applications, and a forward-looking perspective on emerging trends and future research opportunities in educational analysis. The related papers and resources are available and regularly updated at the project page.

application, prediction, student, (16 more...)

2408.1396

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(16 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Online (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Causal Rule Forest: Toward Interpretable and Precise Treatment Effect Estimation

Hsu, Chan, Wu, Jun-Ting, Kang, Yihuang

Understanding and inferencing Heterogeneous Treatment Effects (HTE) and Conditional Average Treatment Effects (CATE) are vital for developing personalized treatment recommendations. Many state-of-the-art approaches achieve inspiring performance in estimating HTE on benchmark datasets or simulation studies. However, the indirect predicting manner and complex model architecture reduce the interpretability of these approaches. To mitigate the gap between predictive performance and heterogeneity interpretability, we introduce the Causal Rule Forest (CRF), a novel approach to learning hidden patterns from data and transforming the patterns into interpretable multi-level Boolean rules. By training the other interpretable causal inference models with data representation learned by CRF, we can reduce the predictive errors of these models in estimating HTE and CATE, while keeping their interpretability for identifying subgroups that a treatment is more effective. Our experiments underscore the potential of CRF to advance personalized interventions and policies, paving the way for future research to enhance its scalability and application across complex causal inference challenges.

estimation, representation, treatment effect, (12 more...)

2408.15055

Country:

North America > United States (0.14)
Asia > Taiwan > Takao Province > Kaohsiung (0.05)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.68)
Overview > Innovation (0.54)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Singh, Shruti, Singh, Muskaan, Kadyan, Virender

Speech Recognition Transformers: Topological-lingualism Perspective

Transformers have evolved with great success in various artificial intelligence tasks. Thanks to our recent prevalence of self-attention mechanisms, which capture long-term dependency, phenomenal outcomes in speech processing and recognition tasks have been produced. The paper presents a comprehensive survey of transformer techniques oriented in speech modality. The main contents of this survey include (1) background of traditional ASR, end-to-end transformer ecosystem, and speech transformers (2) foundational models in a speech via lingualism paradigm, i.e., monolingual, bilingual, multilingual, and cross-lingual (3) dataset and languages, acoustic features, architecture, decoding, and evaluation metric from a specific topological lingualism perspective (4) popular speech transformer toolkit for building end-to-end ASR systems. Finally, highlight the discussion of open challenges and potential research directions for the community to conduct further research in this domain.

arxiv preprint arxiv, recognition, speech recognition, (9 more...)

2408.14991

Country:

North America > United States > New York (0.04)
Asia > India > Uttarakhand > Dehradun (0.04)
Asia > East Asia (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Education (0.45)
Media (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)