AITopics

2101.10472

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Ontario > Middlesex County > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Sinhababu, Nilanjan, Saxena, Rahul, Sarma, Monalisa, Samanta, Debasis

Medical Information Retrieval and Interpretation: A Question-Answer based Interaction Model

arXiv.org Artificial Intelligence Jan-24-2021

The Internet has become a very powerful platform where diverse medical information is expressed daily. Recently, there has been huge growth in searches for symptoms, diseases, medicines, and many other health-related queries around the globe. Search engines typically populate their results from the single query provided by the user, so reaching the final result may require a lot of manual filtering on the user's end. Current search engines and recommendation systems still lack the real-time interaction that could produce more precise results. This paper proposes an intelligent, interactive system tied to the vast medical big-data repository on the web and illustrates its potential for finding medical information.

algorithm, evaluation, information

2101.09662

Country: Asia > India > West Bengal > Kharagpur (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Sinnathamby, Karthigan, Hou, Chang-Yu, Venkataramanan, Lalitha, Gkortsas, Vasileios-Marios, Fleuret, François

Unsupervised clustering of series using dynamic programming

arXiv.org Machine Learning Jan-23-2021

Unsupervised clustering is a branch of machine learning that aims to categorize data based on self-similarity. In other words, data points in the same group (called a cluster) are more similar to each other than to those in other groups. This task can be achieved by various algorithms (the well-known K-means or spectral clustering, but also hierarchical clustering [1] or density-based clustering [2]) that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. In many cases, there exist models or functions, governed by a finite set of parameters, that provide either physics-based or phenomenological correlations between input data. In principle, these models can be used to characterize clusters (cluster characterization), because one can define a loss function that measures how well a point belongs to a given cluster (cluster affiliation).
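The cluster-affiliation idea above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cluster is described by a straight line y = slope·x + intercept and that the affiliation loss is the squared residual; points are assigned to their lowest-loss model and each model is then refit by least squares, in the spirit of K-means. This is not the authors' dynamic-programming method, and all function names are hypothetical.

```python
def line_loss(point, model):
    """Affiliation loss: squared residual of point (x, y) under a line (slope, intercept)."""
    x, y = point
    slope, intercept = model
    return (y - (slope * x + intercept)) ** 2


def fit_line(points):
    """Ordinary least-squares line through points; None if the fit is undefined."""
    n = len(points)
    if n < 2:
        return None
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return (slope, my - slope * mx)


def model_based_clustering(points, models, iters=20):
    """Alternate: assign each point to its lowest-loss model, then refit every model."""
    labels = []
    for _ in range(iters):
        new_labels = [min(range(len(models)), key=lambda k: line_loss(p, models[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        refit = []
        for k, model in enumerate(models):
            fitted = fit_line([p for p, l in zip(points, labels) if l == k])
            refit.append(fitted if fitted is not None else model)  # keep old model if refit fails
        models = refit
    return labels, models


points = [(1, 1.1), (2, 8.2), (3, 2.9), (4, 6.0)]
labels, models = model_based_clustering(points, [(1.0, 0.0), (-1.0, 10.0)])
```

Swapping the line model and `line_loss` for another parametric model and loss keeps the same alternating scheme; as with K-means, the outcome depends on the initial models.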

artificial intelligence, assignment, upstream oil & gas

2101.09512

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Malhotra, Akanksha, Kamle, Sudhir

ARTH: Algorithm For Reading Text Handily -- An AI Aid for People having Word Processing Issues

arXiv.org Artificial Intelligence Jan-23-2021

The objective of this project is to solve one of the major problems faced by people with word-processing issues, such as those resulting from trauma or mild mental disability. "ARTH" is short for Algorithm for Reading Text Handily. ARTH is a self-learning set of algorithms that offers an intelligent way of fulfilling the need for "reading and understanding the text effortlessly" and adjusts to the needs of each user. The research project proceeds in two steps. In the first step, the algorithm identifies the difficult words in the text based on two features, the number of syllables and usage frequency, using a clustering algorithm. After analyzing the clusters, the algorithm labels them according to their difficulty level. In the second step, the algorithm interacts with the user: it tests the user's comprehension of the text and his/her vocabulary level through an automatically generated quiz. Based on the analysis of the results, the algorithm identifies the clusters that are difficult for the user, and the meaning of each word perceived as difficult is displayed next to it. ARTH aims to revive the joy of reading among people who have a poor vocabulary or any word-processing issues.
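The first ARTH step (clustering words by syllable count and usage frequency, then labelling the clusters by difficulty) might look roughly like the sketch below. The abstract does not name the clustering algorithm, so plain K-means with k = 2 is an assumption, as are the vowel-group syllable heuristic, the log-frequency feature, the difficulty score used to label clusters, and the made-up per-million frequencies in the usage example.

```python
import math
import re


def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels (y counted as a vowel)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def kmeans(points, centers, iters=20):
    """Plain Lloyd's K-means on 2-D tuples, starting from the given centers."""
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
                  for p in points]
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(c)  # leave an empty cluster's center where it was
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def difficulty(feature):
    """Hypothetical score: more syllables and rarer usage mean a harder word."""
    syllables, log_freq = feature
    return syllables - log_freq


def difficult_words(freq_per_million):
    """Cluster words on (syllables, log10 frequency) and return the 'difficult' cluster."""
    words = list(freq_per_million)
    feats = [(count_syllables(w), math.log10(freq_per_million[w])) for w in words]
    # Seed the two centers with the easiest and the hardest word.
    init = [min(feats, key=difficulty), max(feats, key=difficulty)]
    labels, centers = kmeans(feats, init)
    hard = max(range(len(centers)), key=lambda k: difficulty(centers[k]))
    return [w for w, l in zip(words, labels) if l == hard]


freqs = {"the": 50000, "cat": 50, "house": 300,
         "ubiquitous": 2, "ephemeral": 1, "quintessential": 1}
print(difficult_words(freqs))
```

The second step (the quiz) would then decide, per user, which of these labelled clusters actually need glosses.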

algorithm, arth, syllable

2101.09464

Country: North America > United States > California (0.04)

Genre:

Workflow (0.49)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Honysz, Philipp-Jan, Buschjäger, Sebastian, Morik, Katharina

GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar Clustering

arXiv.org Artificial Intelligence Jan-21-2021

The optimization of submodular functions offers a viable way to perform clustering. Strong approximation guarantees and feasible optimization on streaming data make this clustering approach attractive. Technically, submodular functions map subsets of data to real values, which indicate how "representative" a specific subset is. Optimal sets might then be used to partition the data space and to infer clusters. Exemplar-based clustering is one such submodular function, but it suffers from high computational complexity. However, for practical applications, the actual real-time or wall-clock run-time is decisive. In this work, we present a novel way to evaluate this particular function on GPUs, which keeps the needs of optimizers in mind and reduces wall-clock run-time. To evaluate our GPU algorithm, we investigated both the impact of different run-time-critical problem properties, such as data dimensionality and the number of data points in a subset, and the influence of the required floating-point precision. In reproducible experiments, our GPU algorithm achieved competitive speedups of up to 72x, depending on whether multi-threaded CPU computation was used for comparison and on the floating-point precision required. Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computation.
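For reference, the exemplar-based clustering function is commonly defined as f(S) = L({e0}) − L(S ∪ {e0}), where L(S) is the mean squared distance from each data point to its nearest exemplar in S and e0 is an auxiliary "phantom" exemplar. The sketch below assumes that formulation, with e0 at the origin, and pairs it with the standard greedy optimizer. It is a plain single-threaded CPU reference, not the authors' GPU algorithm, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exemplar_loss(data, exemplars):
    """L(S): mean squared distance from each data point to its closest exemplar."""
    return sum(min(sq_dist(v, e) for e in exemplars) for v in data) / len(data)


def exemplar_utility(data, subset, phantom):
    """f(S) = L({e0}) - L(S + {e0}); the phantom keeps L defined when S is empty."""
    return exemplar_loss(data, [phantom]) - exemplar_loss(data, list(subset) + [phantom])


def greedy_exemplars(data, k):
    """Standard greedy maximization: repeatedly add the point with the largest f value."""
    phantom = tuple(0.0 for _ in data[0])  # assumed phantom exemplar at the origin
    chosen = []
    for _ in range(k):
        candidates = [v for v in data if v not in chosen]
        best = max(candidates, key=lambda v: exemplar_utility(data, chosen + [v], phantom))
        chosen.append(best)
    return chosen


data = [(10, 10), (10, 11), (11, 10), (-10, -10), (-10, -11), (-11, -10)]
print(greedy_exemplars(data, 2))
```

At every greedy step f is evaluated for each remaining candidate, and each evaluation touches every data point, which is exactly the kind of repeated, data-parallel workload that motivates evaluating f on GPUs.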

algorithm, speedup, submodular function

2101.08763

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States > Nevada > Washoe County > Sparks (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Antonelli, Diego, Cascella, Roberta, Perrone, Gaetano, Romano, Simon Pietro, Schiano, Antonio

Leveraging AI to optimize website structure discovery during Penetration Testing

arXiv.org Artificial Intelligence Jan-18-2021

Dirbusting is a technique used to brute-force directory and file names on web servers while monitoring HTTP responses, in order to enumerate server contents. It uses lists of common words to discover the hidden structure of the target website and typically relies on response codes as discovery conditions for finding new pages. It is widely used in web application penetration testing, an activity that allows companies to detect website vulnerabilities. Dirbusting is both time- and resource-consuming, and innovative approaches have not yet been explored in this field. We therefore propose an advanced technique that optimizes the dirbusting process by leveraging Artificial Intelligence. More specifically, we use semantic clustering techniques to organize wordlist items into groups according to their semantic meaning. The resulting clusters are used in a custom-built, intelligent next-word strategy. This paper demonstrates that using clustering techniques outperforms the commonly used brute-force methods. Performance is evaluated by testing eight different web applications. Results show a performance increase of up to 50% in each of the conducted experiments.
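A cluster-aware next-word strategy of this kind could be sketched as follows. The sketch assumes the wordlist has already been grouped into semantic clusters (how the grouping is computed is not shown), replaces real HTTP requests with a caller-supplied `exists` predicate, and uses one simple, hypothetical rule: after a hit, try another word from the same cluster; after a miss, rotate to the next cluster. The paper's actual strategy may differ.

```python
from collections import deque


def guided_scan(clusters, exists, budget):
    """Probe wordlist entries cluster by cluster, staying on a cluster while it keeps hitting.

    clusters: dict of cluster id -> list of candidate path words (hypothetical grouping).
    exists:   callable(word) -> bool, standing in for an HTTP request whose
              response code signals that the path exists.
    budget:   maximum number of requests to send.
    """
    queues = {cid: deque(words) for cid, words in clusters.items() if words}
    order = deque(queues)  # round-robin order of cluster ids
    found, requests = [], 0
    while requests < budget and order:
        cid = order.popleft()
        queue = queues[cid]
        word = queue.popleft()
        requests += 1
        hit = exists(word)
        if hit:
            found.append(word)
        if queue:
            if hit:
                order.appendleft(cid)  # hit: try a semantically related word next
            else:
                order.append(cid)      # miss: rotate to the next cluster
    return found, requests


clusters = {
    "admin": ["admin", "login", "dashboard", "panel"],
    "assets": ["images", "img", "css", "js"],
    "backup": ["backup", "old", "bak", "tmp"],
}
live = {"admin", "login", "dashboard", "images"}  # simulated server contents
found, sent = guided_scan(clusters, lambda w: w in live, budget=6)
```

Under a request budget the ordering is what matters: related paths such as `login` and `dashboard` are tried right after `admin` succeeds instead of waiting for their turn in a flat wordlist.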

application, experiment, wordlist

2101.07223

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Jiangsu Province > Changzhou (0.04)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)

Huo, Guangyu, Zhang, Yong, Gao, Junbin, Wang, Boyue, Hu, Yongli, Yin, Baocai

CaEGCN: Cross-Attention Fusion based Enhanced Graph Convolutional Network for Clustering

arXiv.org Artificial Intelligence Jan-18-2021

With the powerful learning ability of deep convolutional networks, deep clustering methods can extract the most discriminative information from individual data and produce more satisfactory clustering results. However, existing deep clustering methods usually ignore the relationships between data points. Fortunately, graph convolutional networks can model such relationships, opening up a new research direction for deep clustering. In this paper, we propose a cross-attention based deep clustering framework, named Cross-Attention Fusion based Enhanced Graph Convolutional Network (CaEGCN), which contains four main modules: the Content Auto-encoder module (CAE), relating to the individual data; the Graph Convolutional Auto-encoder module (GAE), relating to the relationships between the data; the cross-attention fusion module, which innovatively concatenates CAE and GAE in a layer-by-layer manner; and the self-supervised module, which highlights the discriminative information for clustering tasks. While the cross-attention fusion module fuses the two kinds of heterogeneous representations, the CAE module supplements the content information for the GAE module, which avoids the over-smoothing problem of GCNs. In the GAE module, two novel loss functions are proposed that reconstruct the content and the relationships between the data, respectively. Finally, the self-supervised module constrains the distributions of the middle-layer representations of CAE and GAE to be consistent. Experimental results on different types of datasets demonstrate the superiority and robustness of the proposed CaEGCN.
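The abstract does not spell out the fusion equations, so the following is only a generic sketch of cross-attention between two representations of the same nodes: content features (as from the CAE) act as queries over graph features (as from the GAE), using scaled dot-product attention plus a residual connection. Learned projection matrices, multiple heads and the layer-by-layer stacking are omitted, and `fuse` is a hypothetical name.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(content, graph):
    """Hypothetical fusion step: content features query graph features, plus a residual."""
    attended = cross_attention(content, graph, graph)
    return [[c + a for c, a in zip(c_row, a_row)]
            for c_row, a_row in zip(content, attended)]
```

When all keys are identical the attention weights are uniform, so each output row is simply the mean of the value rows; learned projections are what let real models break that symmetry.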

information, module, representation

2101.06883

Country:

North America > United States (0.29)
Asia > China > Beijing > Beijing (0.06)
Asia > China > Liaoning Province > Dalian (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.68)
Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Hurley, Catherine B., O'Connell, Mark, Domijan, Katarina

Interactive slice visualization for exploring machine learning models

arXiv.org Machine Learning Jan-18-2021

Machine learning models fit complex algorithms to arbitrarily large datasets. These algorithms are well known to be high in performance and low in interpretability. We use interactive visualization of slices of predictor space to address this interpretability deficit, in effect opening up the black box of machine learning algorithms for the purpose of interrogating, explaining, validating and comparing model fits. Slices are specified directly through interaction, or by various touring algorithms designed to visit high-occupancy sections or regions where the model fits have interesting properties. The methods presented here are implemented in the R package condvis2.
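A slice of predictor space can be sketched in a few lines: fix every predictor at a chosen section point, vary one predictor over a grid, and record the model's predictions; observations close to the section can be shown alongside the curve. This Python sketch is not the condvis2 API (condvis2 is an R package), and the max-norm closeness rule and `sigma` threshold are assumptions made for illustration.

```python
def slice_curve(model, section, var, grid):
    """Evaluate model along predictor `var`, holding every other predictor at `section`."""
    values = []
    for g in grid:
        point = dict(section)
        point[var] = g
        values.append(model(point))
    return values


def near_section(rows, section, var, sigma):
    """Keep observations whose other predictors lie within sigma (max-norm) of the section."""
    return [r for r in rows
            if all(abs(r[k] - section[k]) <= sigma for k in section if k != var)]


def toy_model(x):
    """Stand-in for a fitted model with an interaction between a and b."""
    return 2 * x["a"] + x["b"] * x["a"]


section = {"a": 0, "b": 3}
curve = slice_curve(toy_model, section, "a", [0, 1, 2])
nearby = near_section([{"a": 1, "b": 3.2}, {"a": 5, "b": 9}], section, "a", sigma=0.5)
```

Moving the section (here, changing `b`) changes the slope of the slice, which is how interacting with sections exposes interactions that a single global summary would hide.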

algorithm, predictor, visualization

2101.06986

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Rodosthenous, Theodoulos, Shahrezaei, Vahid, Evangelou, Marina

Multi-view Data Visualisation via Manifold Learning

arXiv.org Machine Learning Jan-17-2021

Manifold learning approaches, such as Stochastic Neighbour Embedding (SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP), have been proposed for performing non-linear dimensionality reduction. These methods aim to produce two- or three-dimensional latent embeddings in order to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP that allow dimensionality reduction and subsequent visualisation of multi-view data. Nowadays, it is very common to have multiple data-views on the same samples. Each data-view contains a set of features describing different aspects of the samples. For example, in biomedical studies it is possible to generate multiple OMICS data sets for the same individuals, such as transcriptomics, genomics and epigenomics, enabling a better understanding of the relationships between different biological processes. The visualisation performance of the proposed methods is illustrated through the analysis of real and simulated datasets. Data visualisations have often been used to identify potential clusters in data sets. We show that by incorporating the low-dimensional embeddings obtained via the multi-view manifold learning approaches into the K-means algorithm, clusters of the samples are accurately identified. Our proposed multi-SNE method outperforms the corresponding proposed multi-ISOMAP and multi-LLE methods. Interestingly, multi-SNE is found to have performance comparable to methods proposed in the literature for multi-view clustering.
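One natural way to extend t-SNE to several views, and the one assumed here, is to keep a separate input-affinity matrix P^(m) for each view and fit a single shared embedding Y by minimizing a weighted sum of the divergences KL(P^(m) ‖ Q). The sketch below only evaluates that objective (the gradient-descent fit is omitted), and whether it matches the authors' exact multi-SNE weighting has not been checked. A fitted Y would then be passed to K-means, as the abstract describes.

```python
import math


def tsne_q(Y):
    """Student-t (one degree of freedom) joint similarities q_ij for an embedding Y."""
    n = len(Y)
    num = [[0.0 if i == j else
            1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
            for j in range(n)] for i in range(n)]
    z = sum(map(sum, num))
    return [[v / z for v in row] for row in num]


def multiview_cost(P_views, Y, weights=None):
    """Weighted sum over views of KL(P^(m) || Q), with one embedding Y shared by all views."""
    Q = tsne_q(Y)
    weights = weights or [1.0 / len(P_views)] * len(P_views)  # assumed: equal view weights
    cost = 0.0
    for w, P in zip(weights, P_views):
        cost += w * sum(p * math.log(p / q)
                        for p_row, q_row in zip(P, Q)
                        for p, q in zip(p_row, q_row) if p > 0)
    return cost


Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Q = tsne_q(Y)
P_uniform = [[0.0 if i == j else 1.0 / 6 for j in range(3)] for i in range(3)]
```

Because every view contributes a KL term against the same Q, an embedding that fits only one view well is penalized by the others, which is what pushes the shared Y toward structure common to all views.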

algorithm, manifold, visualisation

2101.06763

Country:

Europe > United Kingdom (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.82)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Chen, Xin, Zhang, Anderson Y.

Optimal Clustering in Anisotropic Gaussian Mixture Models

arXiv.org Machine Learning Jan-17-2021

We study the clustering task under anisotropic Gaussian Mixture Models, where the covariance matrices of the different clusters are unknown and are not necessarily the identity matrix. We characterize the dependence of signal-to-noise ratios on the cluster centers and covariance matrices and obtain the minimax lower bound for the clustering problem. In addition, we propose a computationally feasible procedure and prove that it achieves the optimal rate within a few iterations. The proposed procedure is a hard EM-type algorithm, and it can also be seen as a variant of Lloyd's algorithm adjusted to anisotropic covariance matrices.
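A hard-EM procedure of the kind described can be sketched for two-dimensional data: estimate a mean and covariance per cluster from the current labels, then reassign each point to the cluster minimizing (x − μ)ᵀΣ⁻¹(x − μ) + log det Σ, which is twice the Gaussian negative log-likelihood up to constants. This cost and the small ridge term are assumptions, not necessarily the paper's exact update. With every Σ fixed to the identity, the cost reduces to squared Euclidean distance and the loop becomes Lloyd's algorithm.

```python
import math


def mean2(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def cov2(pts, mu, ridge=1e-6):
    """Entries (sxx, sxy, syy) of the 2x2 ML covariance; a small ridge keeps it invertible."""
    n = len(pts)
    sxx = sum((p[0] - mu[0]) ** 2 for p in pts) / n + ridge
    syy = sum((p[1] - mu[1]) ** 2 for p in pts) / n + ridge
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in pts) / n
    return (sxx, sxy, syy)


def cost(x, mu, cov):
    """(x - mu)^T cov^{-1} (x - mu) + log det(cov), via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return maha + math.log(det)


def hard_em(points, labels, k, iters=50):
    """Alternate per-cluster (mean, covariance) estimates and hard reassignments."""
    for _ in range(iters):
        params = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:
                raise ValueError(f"cluster {c} became empty")
            mu = mean2(members)
            params.append((mu, cov2(members, mu)))
        new_labels = [min(range(k), key=lambda c: cost(p, *params[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


# One cluster stretched along x, one stretched along y.
points = [(-2, 0), (2, 0), (0, 1), (0, -1), (10, 8), (10, 12), (9, 10), (11, 10)]
labels = hard_em(points, [0, 0, 0, 0, 1, 1, 1, 1], k=2)
```

Like Lloyd's algorithm, this loop is sensitive to its starting labels: a single mislabeled point can stretch a cluster's covariance toward itself and then stay in that cluster, so in practice it would be started from a reasonable initial clustering.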

covariance matrix, probability, snr null

2101.05402

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)