AITopics

2408.1613

Country:

South America > Brazil > São Paulo (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Rhode Island (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Urban context and delivery performance: Modelling service time for cargo bikes and vans across diverse urban environments

Schrader, Maxwell, Kumar, Navish, Sørig, Esben, Yoon, Soonmyeong, Srivastava, Akash, Xu, Kai, Astefanoaei, Maria, Collignon, Nicolas

Light goods vehicles (LGVs), used extensively in the last mile of delivery, are one of the leading polluters in cities. Cargo-bike logistics and Light Electric Vehicles (LEVs) have been put forward as high-impact candidates for replacing LGVs. Studies have estimated that over half of urban van deliveries could be replaced by cargo-bikes, due to their faster speeds, shorter parking times and more efficient routes across cities. However, the logistics sector suffers from a lack of publicly available data, particularly pertaining to cargo-bike deliveries, thus limiting the understanding of their potential benefits. Specifically, service time (which includes cruising for parking and walking to the destination) is a major, but often overlooked, component of delivery time modelling. The aim of this study is to establish a framework for measuring the performance of delivery vehicles, with an initial focus on modelling service times of vans and cargo-bikes across diverse urban environments. We introduce two datasets that allow for in-depth analysis and modelling of service times of cargo bikes and use existing datasets to reason about differences in delivery performance across vehicle types. We introduce a modelling framework to predict the service times of deliveries based on urban context. We employ Uber's H3 index to divide cities into hexagonal cells and aggregate OpenStreetMap tags for each cell, providing a detailed assessment of urban context. Leveraging this spatial grid, we use GeoVex to represent micro-regions as points in a continuous vector space, which then serve as input for predicting vehicle service times. We show that geospatial embeddings can effectively capture urban contexts and facilitate generalizations to new contexts and cities. Our methodology addresses the challenge of limited comparative data available for different vehicle types within the same urban settings.
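
As a rough illustration of the per-cell aggregation step described above, the following sketch counts tags per grid cell and turns the counts into fixed-length vectors. It is not the authors' pipeline: the cell identifiers are placeholder strings standing in for H3 indexes, the function names `aggregate_tags_by_cell` and `to_count_vectors` are hypothetical, and the embedding step (GeoVex in the abstract) is not shown.

```python
from collections import Counter, defaultdict


def aggregate_tags_by_cell(features):
    """Count tag occurrences per cell.

    `features` is an iterable of (cell_id, tags) pairs, where `tags` is an
    iterable of "key=value" strings. The cell ids are assumed to have been
    computed elsewhere (e.g. by a hexagonal grid index); here they are
    opaque placeholder strings.
    """
    counts = defaultdict(Counter)
    for cell_id, tags in features:
        counts[cell_id].update(tags)
    return counts


def to_count_vectors(counts, vocabulary):
    """Turn per-cell counters into fixed-length count vectors.

    Tags missing from a cell get a count of 0 (Counter returns 0 for
    unknown keys).
    """
    return {
        cell: [counter[tag] for tag in vocabulary]
        for cell, counter in counts.items()
    }


# Hypothetical example data: two cells with a handful of tagged features.
example_features = [
    ("cell_a", ["amenity=cafe", "highway=residential"]),
    ("cell_a", ["amenity=cafe"]),
    ("cell_b", ["highway=residential", "building=retail"]),
]
example_vocabulary = ["amenity=cafe", "building=retail", "highway=residential"]
example_vectors = to_count_vectors(
    aggregate_tags_by_cell(example_features), example_vocabulary
)
```

Count vectors of this kind are one simple way to describe a cell's urban context; a learned embedding would replace or follow this step.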

dataset, delivery, service time, (17 more...)

2409.0673

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Greater London > London (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.14)
(17 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Schlegel, Udo, Keim, Daniel A.

Interactive dense pixel visualizations for time series and model attribution explanations

The field of Explainable Artificial Intelligence (XAI) for Deep Neural Network models has developed significantly, offering numerous techniques to extract explanations from models. However, evaluating explanations is often not trivial, and differences in applied metrics can be subtle, especially with non-intelligible data. Thus, there is a need for visualizations tailored to explore explanations for domains with such data, e.g., time series. We propose DAVOTS, an interactive visual analytics approach to explore raw time series data, activations of neural networks, and attributions in a dense-pixel visualization to gain insights into the data, models' decisions, and explanations. To further support users in exploring large datasets, we apply clustering approaches to the visualized data domains to highlight groups and present ordering strategies for individual and combined data exploration to facilitate finding patterns. We visualize a CNN trained on the FordA dataset to demonstrate the approach.

attribution, time series, visualization, (16 more...)

2408.15073

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Fast and Modular Autonomy Software for Autonomous Racing Vehicles

Saba, Andrew, Adetunji, Aderotimi, Johnson, Adam, Kothari, Aadi, Sivaprakasam, Matthew, Spisak, Joshua, Bharatia, Prem, Chauhan, Arjun, Duff, Brendan Jr., Gasparro, Noah, King, Charles, Larkin, Ryan, Mao, Brian, Nye, Micah, Parashar, Anjali, Attias, Joseph, Balciunas, Aurimas, Brown, Austin, Chang, Chris, Gao, Ming, Heredia, Cindy, Keats, Andrew, Lavariega, Jose, Muckelroy, William III, Slavescu, Andre, Stathas, Nickolas, Suvarna, Nayana, Zhang, Chuan Tian, Scherer, Sebastian, Ramanan, Deva

Autonomous motorsports aim to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their handling limits in multi-agent scenarios at extremely high ($\geq 150$ mph) speeds. This Operational Design Domain (ODD) presents unique challenges across the autonomy stack. The Indy Autonomous Challenge (IAC) is an international competition aiming to advance autonomous vehicle development through ARV competitions. While far from challenging what a human racecar driver can do, the IAC is pushing the state of the art by facilitating full-sized ARV competitions. This paper details the MIT-Pitt-RW Team's approach to autonomous racing in the IAC. In this work, we present our modular and fast approach to agent detection, motion planning and controls to create an autonomy stack. We also provide analysis of the performance of the software stack in single and multi-agent scenarios for rapid deployment in a fast-paced competition environment. We also cover what did and did not work when deployed on a physical system, the Dallara AV-21 platform, and potential improvements to address these shortcomings. Finally, we convey lessons learned and discuss limitations and future directions for improvement.

controller, detection, vehicle, (16 more...)

doi: 10.55417/fr.2024001

2408.15425

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.05)
North America > United States > Texas (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Motorsports (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.86)
(2 more...)

Time Series Analysis for Education: Methods, Applications, and Future Directions

Mao, Shengzhong, Zhang, Chaoli, Song, Yichi, Wang, Jindong, Zeng, Xiao-Jun, Xu, Zenglin, Wen, Qingsong

Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comprehensive review of time series analysis techniques specifically within the educational context. We begin by exploring the landscape of educational data analytics, categorizing various data sources and types relevant to education. We then review four prominent time series methods (forecasting, classification, clustering, and anomaly detection), illustrating their specific application points in educational settings. Subsequently, we present a range of educational scenarios and applications, focusing on how these methods are employed to address diverse educational tasks, which highlights the practical integration of multiple time series methods to solve complex educational problems. Finally, we conclude with a discussion on future directions, including personalized learning analytics, multimodal data fusion, and the role of large language models (LLMs) in educational time series. The contributions of this paper include a detailed taxonomy of educational data, a synthesis of time series techniques with specific educational applications, and a forward-looking perspective on emerging trends and future research opportunities in educational analysis. The related papers and resources are available and regularly updated at the project page.

application, prediction, student, (16 more...)

2408.1396

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(16 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Online (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

arXiv.org Artificial Intelligence Aug-26-2024

Provable Imbalanced Point Clustering

Denisov, David, Feldman, Dan, Dolev, Shlomi, Segal, Michael

We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize \emph{coresets}, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1\pm\varepsilon$. We provide [Section 3 and Section E in the appendix] experiments that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which by combining clustering algorithms yields better performance than each one separately.
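
The coreset property described in words above can be written as a single inequality. The notation here is assumed for illustration and is not taken from the paper: $P$ is the input point set, $Q$ the given set of models, $(C, w)$ the weighted coreset, and $\mathrm{loss}$ the fitting loss (with $\mathrm{loss}_w$ its weighted counterpart).

```latex
% Assumed notation: P = input points, Q = set of candidate models,
% (C, w) = weighted coreset, loss = fitting loss, loss_w = weighted fitting loss.
\[
  (1-\varepsilon)\,\mathrm{loss}(P, q)
  \;\le\; \mathrm{loss}_{w}(C, q)
  \;\le\; (1+\varepsilon)\,\mathrm{loss}(P, q)
  \qquad \text{for every } q \in Q .
\]
```

Because the bound holds for every model in $Q$ at once, a model that fits the (smaller) coreset well should also fit the full point set well, up to the $1\pm\varepsilon$ factor.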

approx-on-coreset, probability, quantization, (15 more...)

2408.14225

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Artificial Intelligence Aug-26-2024

Contrastive Learning Subspace for Text Clustering

Yong, Qian, Chen, Chen, Zhou, Xiabing

Contrastive learning has been frequently investigated to learn effective representations for text clustering tasks. While existing contrastive learning-based text clustering methods only focus on modeling instance-wise semantic similarity relationships, they ignore contextual information and underlying relationships among all instances that need to be clustered. In this paper, we propose a novel text clustering approach called Subspace Contrastive Learning (SCL) which models cluster-wise relationships among instances. Specifically, the proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts. Experimental results show that the proposed SCL method not only achieves superior results on multiple task clustering datasets but also has less complexity in positive sample construction.

contrastive learning, representation, subspace, (13 more...)

2408.14119

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > China > Beijing > Beijing (0.04)
(17 more...)

Genre:

Research Report (0.84)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

José-García, Adán, Jacques, Julie, Chauvet, Clément, Sobanski, Vincent, Dhaenens, Clarisse

HBIC: A Biclustering Algorithm for Heterogeneous Datasets

arXiv.org Artificial Intelligence Aug-23-2024

Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data, including numeric, binary, and categorical data. The approach comprises two stages: bicluster generation and bicluster model selection. In the initial stage, several candidate biclusters are generated iteratively by adding and removing rows and columns based on the frequency of values in the original matrix. In the second stage, we introduce two approaches for selecting the most suitable biclusters by considering their size and homogeneity. Through a series of experiments, we investigated the suitability of our approach on a synthetic benchmark and in a biomedical application involving clinical data of systemic sclerosis patients. The evaluation comparing our method to existing approaches demonstrates its ability to discover high-quality biclusters from heterogeneous data. Our biclustering approach is a starting point for heterogeneous bicluster discovery, leading to a better understanding of complex underlying data structures.

algorithm, bicluster, dataset, (14 more...)

2408.13217

Country:

Europe > France > Hauts-de-France > Nord > Lille (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Upadhya, Nakul, Cohen, Eldan

NeurCAM: Interpretable Neural Clustering via Additive Models

arXiv.org Artificial Intelligence Aug-23-2024

Interpretable clustering algorithms aim to group similar data points while explaining the obtained groups to support knowledge discovery and pattern recognition tasks. While most approaches to interpretable clustering construct clusters using decision trees, the interpretability of trees often deteriorates on complex problems where large trees are required. In this work, we introduce the Neural Clustering Additive Model (NeurCAM), a novel approach to the interpretable clustering problem that leverages neural generalized additive models to provide fuzzy cluster membership with additive explanations of the obtained clusters. To promote sparsity in our model's explanations, we introduce selection gates that explicitly limit the number of features and pairwise interactions leveraged. Additionally, we demonstrate the capacity of our model to perform text clustering that considers the contextual representation of the texts while providing explanations for the obtained clusters based on uni- or bi-word terms. Extensive experiments show that NeurCAM achieves performance comparable to black-box methods on tabular datasets while remaining interpretable. Additionally, our approach significantly outperforms other interpretable clustering approaches when clustering on text data.

additive model, interpretable neural clustering, neurcam

2408.13361

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Corominas, Guillem Rodríguez, Blesa, Maria J., Blum, Christian

Accelerating the k-means++ Algorithm by Using Geometric Information

arXiv.org Artificial Intelligence Aug-23-2024

The k-means clustering is a widely used method in data clustering and unsupervised machine learning, aiming to divide a given dataset into k distinct, non-overlapping clusters. This division seeks to minimize the within-cluster variance. The k-means clustering problem becomes NP-hard when extended beyond a single dimension [3]. Despite this complexity, there are algorithms designed to find sufficiently good solutions within a reasonable amount of time. Among these, Lloyd's algorithm, also referred to as the standard algorithm or batch k-means, is the most renowned [42]. The k-means algorithm is one of the most popular algorithms in data mining [58, 32], mainly due to its simplicity, scalability, and guaranteed termination. However, its performance is highly sensitive to the initial placement of the centers [5]. In fact, there is no general approximation expectation for Lloyd's algorithm that applies to all scenarios, i.e., an arbitrary initialization may lead to an arbitrarily bad clustering. Therefore, it is crucial to employ effective initialization methods [24].
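
For context on the initialization problem discussed above, here is a minimal sketch of standard k-means++ seeding, which draws each new center with probability proportional to its squared distance from the nearest center chosen so far. This is the plain textbook procedure, not the accelerated variant the paper proposes; the function names and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k initial centers using standard k-means++ (D^2) sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = points[rng.randrange(len(points))]
        else:
            # Sample a point with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]
    return centers


# Hypothetical toy data: two well-separated groups of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_centers = kmeans_pp_seed(toy_points, 2, random.Random(0))
```

Each sampling round recomputes distances from every point to the newest center, and this is the per-round cost that acceleration techniques aim to reduce.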

algorithm, calculation, norm filter, (14 more...)

2408.13189

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)