AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.98)

#artificialintelligence, Nov-25-2021, 00:30:41 GMT

Learn How To Do K-Means Clustering On An Image

If you have read anything about data science, machine learning, or data mining, you have very likely come across clustering. Clustering is the process of grouping data into clusters based on how similar the data points are. There are many clustering algorithms, and one of the best known is K-means, which partitions the data into a predetermined number of clusters.
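As a rough illustration of the idea, the sketch below runs plain K-means (Lloyd's algorithm) on a handful of RGB pixels and replaces each pixel with its cluster's mean color. It is not taken from the linked article: the function names `nearest` and `kmeans_colors`, the random initialization, and the tiny pixel list are all assumptions made for this example.

```python
import random

def nearest(pixel, centroids):
    """Index of the centroid closest to pixel (squared Euclidean distance in RGB)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(pixel, centroids[j])),
    )

def kmeans_colors(pixels, k, iterations=20, seed=0):
    """Cluster RGB tuples into k colors; needs at least k distinct pixels."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct pixels picked at random.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [nearest(pixel, centroids) for pixel in pixels]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [px for px, label in zip(pixels, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    labels = [nearest(pixel, centroids) for pixel in pixels]
    return centroids, labels

# Hypothetical "image": three reddish and three bluish pixels.
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 245),
          (5, 20, 250), (255, 0, 0), (0, 0, 255)]
centroids, labels = kmeans_colors(pixels, k=2)
# Color quantization: every pixel is replaced by its cluster's mean color.
quantized = [tuple(round(c) for c in centroids[label]) for label in labels]
```

On a real image the pixel list would come from the decoded image data, and a library implementation would normally be preferable for speed.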

algorithm, color quantization, quantization, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence, Nov-23-2021, 12:34:17 GMT

Introduction to Components with Knime Analytics - Analytics Vidhya

This article was published as a part of the Data Science Blogathon. In the previous article, "A Friendly Introduction to KNIME Analytics Platform", I gave a brief overview of the open-source KNIME Analytics Platform and what it can do. Using a customer segmentation example, I showed the platform's general functions. This article takes up a topic mentioned briefly at the end of that article: components. I'll explain in depth what components are, what functionality they offer, and why they are useful.

configuration window, knime analytic platform, node, (11 more...)

Technology:

Information Technology > Data Science > Data Mining (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

arXiv.org Artificial Intelligence, Nov-23-2021

Structural clustering of volatility regimes for dynamic trading strategies

Prakash, Arjun, James, Nick, Menzies, Max, Francis, Gilad

We develop a new method to find the number of volatility regimes in a nonstationary financial time series by applying unsupervised learning to its volatility structure. We use change point detection to partition a time series into locally stationary segments and then compute a distance matrix between segment distributions. The segments are clustered into a learned number of discrete volatility regimes via an optimization routine. Using this framework, we determine a volatility clustering structure for financial indices, large-cap equities, exchange-traded funds and currency pairs. Our method overcomes the rigid assumptions necessary to implement many parametric regime-switching models, while effectively distilling a time series into several characteristic behaviours. Our results provide significant simplification of these time series and a strong descriptive analysis of prior behaviours of volatility. Finally, we create and validate a dynamic trading strategy that learns the optimal match between the current distribution of a time series and its past regimes, thereby making online risk-avoidance decisions in the present.
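The middle step of this pipeline, a distance matrix between segment distributions, can be sketched as follows. This is only an illustration under stated assumptions: the change points are taken as already known, the distance is an approximate one-dimensional Wasserstein distance computed from empirical quantiles (the abstract does not say which distance the authors use), and the names `wasserstein_1d` and `distance_matrix` as well as the synthetic returns are hypothetical.

```python
import random

def wasserstein_1d(a, b, quantiles=100):
    """Approximate 1-Wasserstein distance by comparing empirical quantiles."""
    sa, sb = sorted(a), sorted(b)
    total = 0.0
    for q in range(quantiles):
        t = (q + 0.5) / quantiles  # midpoint of the q-th quantile bin
        total += abs(sa[int(t * len(sa))] - sb[int(t * len(sb))])
    return total / quantiles

def distance_matrix(segments):
    """Pairwise distances between the empirical distributions of the segments."""
    n = len(segments)
    return [[wasserstein_1d(segments[i], segments[j]) for j in range(n)]
            for i in range(n)]

# Synthetic returns: calm, turbulent, calm (standard deviations are assumptions).
rng = random.Random(0)
segments = [
    [rng.gauss(0.0, 1.0) for _ in range(300)],
    [rng.gauss(0.0, 3.0) for _ in range(300)],
    [rng.gauss(0.0, 1.0) for _ in range(300)],
]
D = distance_matrix(segments)
```

In this toy setup the two calm segments end up much closer to each other than either is to the turbulent one, which is the structure a clustering routine applied to `D` would then pick up as two regimes.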

data mining, machine learning, regime, (15 more...)

doi: 10.1080/1350486X.2021.2007146

2004.09963

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.92)

Raj, Chahat, Meel, Priyanka

Is Dynamic Rumor Detection on social media Viable? An Unsupervised Perspective

arXiv.org Artificial Intelligence, Nov-23-2021

With the growing popularity and ease of access to the internet, the problem of online rumors is escalating. People rely on social media for ready access to information but fall prey to false information. There is a lack of credibility assessment techniques for online posts to identify rumors as soon as they arrive. Existing studies have formulated several mechanisms to combat online rumors by developing machine learning and deep learning algorithms. The literature so far provides supervised frameworks for rumor classification that rely on huge training datasets. However, in the online scenario, where supervised learning is demanding, dynamic rumor identification becomes difficult. Early detection of online rumors is a challenging task, and studies relating to it are relatively few. Identifying rumors as soon as they appear online is an urgent need. This work proposes a novel framework for unsupervised rumor detection that relies on an online post's content and social features using state-of-the-art clustering techniques. The proposed architecture outperforms several existing baselines and performs better than several supervised techniques. The proposed method, being lightweight, simple, and robust, is suitable for adoption as a tool for online rumor identification.

algorithm, dataset, detection, (16 more...)

2111.11982

Country:

Asia > Russia (0.14)
North America > Canada > Ontario > Toronto (0.05)
North America > United States > Missouri (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Media > News (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Leisure & Entertainment (0.91)
Information Technology > Services (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kim, Doyeon, Lee, Jeonghwan, Chung, Hye Won

A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits

arXiv.org Machine Learning, Nov-19-2021

Crowdsourcing has emerged as an effective way to label data at relatively low cost by using non-expert workers. However, inferring correct labels from multiple noisy answers has been a challenging problem, since the quality of answers varies widely across tasks and workers. Many previous works have assumed a simple model where the order of workers in terms of their reliabilities is fixed across tasks, and focused on estimating the worker reliabilities to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of a given task, where the number $d$ of types can scale with the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer labels with any given recovery accuracy, and propose an inference algorithm achieving the order-wise optimal bound. We conduct experiments on both synthetic and real-world datasets, and show that our algorithm outperforms existing algorithms developed under strict model assumptions.
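To make the idea of type-dependent reliability concrete, here is a minimal sketch of weighted voting in which each worker's weight depends on the task type. It is not the paper's inference algorithm: it assumes binary labels in {+1, -1}, assumes the task type and every worker's per-type accuracy are already known, and uses the standard log-odds weighting. The function name `aggregate_label` and the example numbers are hypothetical.

```python
import math

def aggregate_label(answers, task_type, reliability):
    """Weighted majority vote for one binary task.

    answers:     dict worker -> answer in {+1, -1}
    reliability: dict worker -> dict task type -> probability of a correct answer,
                 strictly between 0 and 1 (assumed known here)
    """
    score = 0.0
    for worker, answer in answers.items():
        p = reliability[worker][task_type]
        # Log-odds weight: 0 for a coin-flip worker, large for a reliable one.
        score += answer * math.log(p / (1.0 - p))
    return 1 if score >= 0 else -1

# Hypothetical workers: w1 specializes in type "A", w2 in type "B".
reliability = {
    "w1": {"A": 0.9, "B": 0.5},
    "w2": {"A": 0.5, "B": 0.9},
    "w3": {"A": 0.6, "B": 0.6},
}
answers = {"w1": 1, "w2": -1, "w3": -1}

label_if_type_a = aggregate_label(answers, "A", reliability)
label_if_type_b = aggregate_label(answers, "B", reliability)
```

With the same three answers, the aggregated label is +1 when the task is of type "A" (the specialist w1 outweighs the other two) and -1 when it is of type "B", whereas an unweighted majority vote would return -1 in both cases.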

algorithm, algorithm 1, subset-selection scheme, (15 more...)

2111.1255

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > South Korea > Daejeon > Daejeon (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Padhee, Swati, Swygert, Kimberly, Micir, Ian

Exploring Language Patterns in a Medical Licensure Exam Item Bank

arXiv.org Artificial Intelligence, Nov-19-2021

This study examines the use of natural language processing (NLP) models to evaluate whether language patterns used by item writers in a medical licensure exam might contain evidence of biased or stereotypical language. This type of bias in item language choices can be particularly impactful for items in a medical licensure assessment, as it could pose a threat to content validity and defensibility of test score validity evidence. To the best of our knowledge, this is the first attempt using machine learning (ML) and NLP to explore language bias on a large item bank. Using a prediction algorithm trained on clusters of similar item stems, we demonstrate that our approach can be used to review large item banks for potential biased language or stereotypical patient characteristics in clinical science vignettes. The findings may guide the development of methods to address stereotypical language patterns found in test items and enable an efficient updating of those items, if needed, to reflect contemporary norms, thereby improving the evidence to support the validity of the test scores.

accuracy, item stem, patient characteristic, (16 more...)

2111.10501

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.66)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (0.93)
Education (0.87)
Health & Medicine > Pharmaceuticals & Biotechnology (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

arXiv.org Artificial Intelligence, Nov-18-2021

CLMB: deep contrastive learning for robust metagenomic binning

Zhang, Pengfei, Jiang, Zhengyuan, Wang, Yixuan, Li, Yu

The reconstruction of microbial genomes from large metagenomic datasets is a critical procedure for finding uncultivated microbial populations and defining their microbial functional roles. To achieve that, we need to perform metagenomic binning, clustering the assembled contigs into draft genomes. Most existing computational tools neglect one important property of metagenomic data: noise. To further improve the metagenomic binning step and reconstruct better metagenomes, we propose a deep Contrastive Learning framework for Metagenome Binning (CLMB), which can efficiently eliminate the disturbance of noise and produce more stable and robust results. Essentially, instead of denoising the data explicitly, we add simulated noise to the training data and force the deep learning model to produce similar and stable representations for both the noise-free data and the distorted data. Consequently, the trained model will be robust to noise and handle it implicitly during usage. CLMB outperforms the previous state-of-the-art binning methods significantly, recovering the most near-complete genomes on almost all the benchmarking datasets (up to 17% more reconstructed genomes compared to the second-best method). It also improves the performance of bin refinement, reconstructing 8-22 more high-quality genomes and 15-32 more middle-quality genomes than the second-best result. Impressively, in addition to being compatible with the binning refiner, CLMB alone recovers on average 15 more HQ genomes than the refiner of VAMB and Maxbin on the benchmarking datasets. CLMB is open-source and available at https://github.com/zpf0117b/CLMB/.
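The training idea described here, pulling the representation of a sample and of its noise-distorted copy together, can be illustrated with a common contrastive objective. The sketch below computes an InfoNCE-style loss in plain Python; the abstract does not state which loss CLMB uses, so the loss choice, the temperature value, the function names `cosine` and `info_nce`, and the toy embeddings are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(clean, noisy, temperature=0.5):
    """Mean InfoNCE loss: clean[i] should match noisy[i] better than any other noisy[j]."""
    total = 0.0
    for i, z in enumerate(clean):
        logits = [cosine(z, w) / temperature for w in noisy]
        log_partition = math.log(sum(math.exp(x) for x in logits))
        total += log_partition - logits[i]  # negative log-softmax of the positive pair
    return total / len(clean)

# Toy embeddings of three samples and of their noise-distorted copies.
clean = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
noisy_aligned = [(0.9, 0.1), (0.1, 0.9), (-0.9, 0.1)]
# Same vectors, but paired with the wrong samples.
noisy_shuffled = noisy_aligned[1:] + noisy_aligned[:1]

loss_aligned = info_nce(clean, noisy_aligned)
loss_shuffled = info_nce(clean, noisy_shuffled)
```

The loss is low when each distorted copy stays close to its own clean sample and high when the pairs are mismatched, so minimizing it pushes an encoder toward representations that change little under the simulated noise.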

contig, dataset, genome, (15 more...)

2111.09656

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

#artificialintelligence, Nov-17-2021, 16:26:55 GMT

How to perform topic modeling with Top2Vec

Topic modeling is a problem in natural language processing that has many real-world applications. Being able to discover topics within large sections of text helps us understand text data in greater detail. For many years, Latent Dirichlet Allocation (LDA) has been the most commonly used algorithm for topic modeling. The algorithm was first introduced in 2003 and treats topics as probability distributions for the occurrence of different words. If you want to see an example of LDA in action, you should check out my article below where I performed LDA on a fake news classification dataset.

algorithm, top2vec, vector, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

#artificialintelligence, Nov-16-2021, 01:18:42 GMT

Designing a Promotional Strategy for Alcoholic Drinks in Russia

Originally published on Towards AI. Alcohol consumption in Russia remains among the highest in the world.

alcohol consumption, consumption, dataset, (16 more...)

Country:

Asia > Russia (0.62)
Europe > Russia > Northwestern Federal District > Vologda Oblast > Vologda (0.05)

Industry:

Health & Medicine > Therapeutic Area (0.38)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)