AITopics

2306.16291

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

arXiv.org Artificial IntelligenceJun-20-2023

A Survey on Safety-Critical Driving Scenario Generation -- A Methodological Perspective

Ding, Wenhao, Xu, Chejian, Arief, Mansur, Lin, Haohong, Li, Bo, Zhao, Ding

Autonomous driving systems have witnessed a significant development during the past years thanks to the advance in machine learning-enabled sensing and decision-making algorithms. One critical challenge for their massive deployment in the real world is their safety evaluation. Most existing driving systems are still trained and evaluated on naturalistic scenarios collected from daily life or heuristically-generated adversarial ones. However, the large population of cars, in general, leads to an extremely low collision rate, indicating that the safety-critical scenarios are rare in the collected real-world data. Thus, methods to artificially generate scenarios become crucial to measure the risk and reduce the cost. In this survey, we focus on the algorithms of safety-critical scenario generation in autonomous driving. We first provide a comprehensive taxonomy of existing algorithms by dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation. Then, we discuss useful tools for scenario generation, including simulation platforms and packages. Finally, we extend our discussion to five main challenges of current works -- fidelity, efficiency, diversity, transferability, controllability -- and research opportunities lighted up by these challenges.

artificial intelligence, deep learning, machine learning, (17 more...)

2202.02215

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.14)
(5 more...)

Genre: Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-19-2023

Unsupervised Framework for Evaluating and Explaining Structural Node Embeddings of Graphs

Dehghan, Ashkan, Siuta, Kinga, Skorupka, Agata, Betlen, Andrei, Miller, David, Kaminski, Bogumil, Pralat, Pawel

An embedding is a mapping from a set of nodes of a network into a real vector space. Embeddings can have various aims like capturing the underlying graph topology and structure, node-to-node relationship, or other relevant information about the graph, its subgraphs or nodes themselves. A practical challenge with using embeddings is that there are many available variants to choose from. Selecting a small set of most promising embeddings from the long list of possible options for a given task is challenging and often requires domain expertise. Embeddings can be categorized into two main types: classical embeddings and structural embeddings. Classical embeddings focus on learning both local and global proximity of nodes, while structural embeddings learn information specifically about the local structure of nodes' neighbourhood. For classical node embeddings there exists a framework which helps data scientists to identify (in an unsupervised way) a few embeddings that are worth further investigation. Unfortunately, no such framework exists for structural embeddings. In this paper we propose a framework for unsupervised ranking of structural graph embeddings. The proposed framework, apart from assigning an aggregate quality score for a structural embedding, additionally gives a data scientist insights into properties of this embedding. It produces information which predefined node features the embedding learns, how well it learns them, and which dimensions in the embedded space represent the predefined node features. Using this information the user gets a level of explainability to an otherwise complex black-box embedding algorithm.

artificial intelligence, data mining, machine learning, (19 more...)

2306.1077

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Sun, Xiaotong, Chen, Xi, Stasinakis, Charalampos, Sermpinis, Georgios

Voter Coalitions and democracy in Decentralized Finance: Evidence from MakerDAO

arXiv.org Artificial IntelligenceJun-19-2023

Decentralized Autonomous Organization (DAO) provides a decentralized governance solution through blockchain, where decision-making process relies on on-chain voting and follows majority rule. This paper focuses on MakerDAO, and we find three voter coalitions after applying clustering algorithm to voting history. The emergence of a dominant voter coalition is a signal of governance centralization in DAO, and voter coalitions have complicated influence on Maker protocol, which is governed by MakerDAO. This paper presents empirical evidence of multicoalition democracy in DAO and further contributes to the contemporary debate on whether decentralized governance is possible.

artificial intelligence, machine learning, social media, (16 more...)

2210.11203

Country:

Europe > United Kingdom (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Ariu, Kaito, Proutiere, Alexandre, Yun, Se-Young

Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

Community detection or clustering refers to the task of gathering similar items into a few groups from the data that, most often, correspond to observations of pair-wise interactions between items Newman and Girvan [2004]. A benchmark commonly used to assess the performance of clustering algorithms is the celebrated Stochastic Block Model (SBM) Holland et al. [1983], where pair-wise interactions are represented by a random graph. In this graph, the vertices correspond to items, and the presence of an edge between two items indicates their interaction. The SBM has been extensively studied over the last two decades; for a recent survey, see Abbe [2018]. However, it provides a relatively simplistic view of how items may interact. In real applications, interactions can be of different types (e.g., represented by ratings in recommender systems or a level of proximity between users in a social network). To capture this richer information about item interactions, the Labeled Stochastic Block Model (LSBM), proposed and analyzed in Heimlicher et al. [2012], Lelarge et al. [2013], Yun and Proutiere [2016], describes interactions by labels drawn from an arbitrary collection. The objective of this paper is to devise a clustering algorithm that, based on the observation of these labels, reconstructs the clusters of items while minimizing the expected number of misclassified items. In the following, we formally introduce LSBMs and outline our results.

artificial intelligence, data mining, machine learning, (18 more...)

2306.12968

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews

Lakatos, Robert, Bogacsovics, Gergo, Harangi, Balazs, Lakatos, Istvan, Tiba, Attila, Toth, Janos, Szabo, Marianna, Hajdu, Andras

Abstract-- The efficiency of natural language processing the text may be considered noise, depending on how we view has improved dramatically with the advent of the data and what we think is relevant. This in turn makes machine learning models, particularly neural networkbased it difficult to solve this task: it is not enough to find some solutions. However, some tasks are still challenging, specific information in texts, but we also have to decide what especially when considering specific domains. In this information we need based on the texts. Our application is an end-to-end methods integrated into a pipeline. For topic modeling, system that runs on a cloud-based infrastructure and can be our composite model uses transformer-based neural networks used as a service by small and medium-sized businesses. Furthermore, our some aspects, outperforms currently available approaches. Our environment, we needed a model that could identify topics approach is validated and compared with other stateof-the-art in texts and provide a way to determine which topics are relevant, methods using publicly available datasets for given our analysis goals. We focused on these tools Users of social platforms, forums, and online stores generate because we found them to be the most appropriate for our a significant amount of textual data. One of the most goals based on the available literature. It was also important useful applications of machine learning-based text processing to us that these methods and complex models have a stable is to find words and phrases that describe the content of implementation, are verifiably executable in our cloud-based these texts.

artificial intelligence, machine learning, natural language, (21 more...)

2306.07786

Country:

Europe > Hungary (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Education (0.93)
Information Technology > Services (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Feldstein, Jonathan, Phillips, Dominic, Tsamoura, Efthymia

Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models

Structure learning is a core problem in AI central to the fields of neuro-symbolic AI and statistical relational learning. It consists in automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models, which uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm, which depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step where we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is to introduce an O(n ln n) (in the size of the entities in the data) algorithm for clustering structurally-related data. We evaluate our approach using standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.

artificial intelligence, machine learning, node, (17 more...)

2302.04599

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
North America > United States > Oregon (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry: Education > Curriculum (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Vielhaben, Johanna, Blücher, Stefan, Strodthoff, Nils

Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees

For the trustworthy application of XAI, in particular for high-stake decisions, a more global model understanding is required. To this end, concept-based methods have been proposed, which are however not guaranteed to be bound to the actual model reasoning. To circumvent this problem, we propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. Our method starts from general linear subspaces as concepts and does neither require reinforcing concept interpretability nor re-training of model parts. We propose sparse subspace clustering to discover improved concepts and fully leverage the potential of multi-dimensional subspaces. MCD offers two complementary analysis tools for concepts in input space: (1) concept activation maps, that show where a concept is expressed within a sample, allowing for concept characterization through prototypical samples, and (2) concept relevance heatmaps, that decompose the model decision into concept contributions. Both tools together enable a detailed global understanding of the model reasoning, which is guaranteed to relate to the model via a completeness relation. Thus, MCD paves the way towards more trustworthy concept-based XAI. We empirically demonstrate the superiority of MCD against more constrained concept definitions.

artificial intelligence, data mining, machine learning, (18 more...)

2301.11911

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Nuclear Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Huang, Tianyi, Cheng, Shenghui, Li, Stan Z., Zhang, Zhengjun

High-dimensional Clustering onto Hamiltonian Cycle

arXiv.org Artificial IntelligenceJun-17-2023

Clustering aims to group unlabelled samples based on their similarities. It has become a significant tool for the analysis of high-dimensional data. However, most of the clustering methods merely generate pseudo labels and thus are unable to simultaneously present the similarities between different clusters and outliers. This paper proposes a new framework called High-dimensional Clustering onto Hamiltonian Cycle (HCHC) to solve the above problems. First, HCHC combines global structure with local structure in one objective function for deep clustering, improving the labels as relative probabilities, to mine the similarities between different clusters while keeping the local structure in each cluster. Then, the anchors of different clusters are sorted on the optimal Hamiltonian cycle generated by the cluster similarities and mapped on the circumference of a circle. Finally, a sample with a higher probability of a cluster will be mapped closer to the corresponding anchor. In this way, our framework allows us to appreciate three aspects visually and simultaneously - clusters (formed by samples with high probabilities), cluster similarities (represented as circular distances), and outliers (recognized as dots far away from all clusters). The experiments illustrate the superiority of HCHC.

artificial intelligence, data mining, machine learning, (16 more...)

2304.14531

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > Wisconsin (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Jana, Soham, Yang, Kun, Kulkarni, Sanjeev

Adversarially robust clustering with optimality guarantees

arXiv.org Artificial IntelligenceJun-16-2023

We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast, clustering methods seemingly robust to adversarial perturbations are not known to satisfy the optimal statistical guarantees. We propose a simple algorithm that obtains the optimal mislabeling rate even when we allow adversarial outliers to be present. Our algorithm achieves the optimal error rate in constant iterations when a weak initialization condition is satisfied. In the absence of outliers, in fixed dimensions, our theoretical guarantees are similar to that of the Lloyd algorithm. Extensive experiments on various simulated data sets are conducted to support the theoretical guarantees of our method.

artificial intelligence, data mining, machine learning, (19 more...)

2306.09977

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(6 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)