AITopics

doi: 10.1109/TMI.2021.3125459

2112.03694

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.90)

Madanapalli, Sharat Chandra, Mathai, Alex, Gharakheili, Hassan Habibi, Sivaraman, Vijay

Modeling Live Video Streaming: Real-Time Classification, QoE Inference, and Field Evaluation

arXiv.org Artificial Intelligence Dec-5-2021

Social media, professional sports, and video games are driving rapid growth in live video streaming, on platforms such as Twitch and YouTube Live. Live streaming experience is very susceptible to short-time-scale network congestion since client playback buffers are often no more than a few seconds. Unfortunately, identifying such streams and measuring their QoE for network management is challenging, since content providers largely use the same delivery infrastructure for live and video-on-demand (VoD) streaming, and packet inspection techniques (including SNI/DNS query monitoring) cannot always distinguish between the two. In this paper, we design, build, and deploy ReCLive: a machine learning method for live video detection and QoE measurement based on network-level behavioral characteristics. Our contributions are four-fold: (1) We analyze about 23,000 video streams from Twitch and YouTube, and identify key features in their traffic profile that differentiate live and on-demand streaming. We release our traffic traces as open data to the public; (2) We develop an LSTM-based binary classifier model that distinguishes live from on-demand streams in real-time with over 95% accuracy across providers; (3) We develop a method that estimates QoE metrics of live streaming flows in terms of resolution and buffer stall events with overall accuracies of 93% and 90%, respectively; and (4) Finally, we prototype our solution, train it in the lab, and deploy it in a live ISP network serving more than 7,000 subscribers. Our method provides ISPs with fine-grained visibility into live video streams, enabling them to measure and improve user experience.

resolution, video, video stream, (16 more...)

2112.02637

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(13 more...)

Genre: Research Report (0.81)

Industry:

Telecommunications > Networks (1.00)
Information Technology > Networks (1.00)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

#artificialintelligence Dec-3-2021, 06:25:21 GMT

What is the Performance Measure, learning algorithm?

To evaluate the abilities of a machine learning algorithm, we must design a quantitative measure of its performance. Usually, this performance measure P is specific to the task T being carried out by the system. Accuracy is simply the proportion of examples for which the model produces the correct output. We can obtain equivalent information by measuring the error rate, the proportion of examples for which the model produces an incorrect output. The 0–1 loss on a particular example is 0 if it is correctly classified and 1 if it is not, so the error rate is the average 0–1 loss over the examples.
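The relationship between the 0–1 loss, the error rate, and accuracy can be made concrete with a minimal sketch. The function names and the toy labels below are illustrative assumptions, not taken from the article.

```python
from typing import Hashable, Sequence


def zero_one_loss(true_label: Hashable, predicted_label: Hashable) -> int:
    """Return 0 for a correct prediction and 1 for an incorrect one."""
    return 0 if true_label == predicted_label else 1


def error_rate(labels: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Average 0-1 loss: the proportion of examples predicted incorrectly."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equally long")
    total_loss = sum(zero_one_loss(t, p) for t, p in zip(labels, predictions))
    return total_loss / len(labels)


def accuracy(labels: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Proportion of examples predicted correctly, i.e. 1 - error rate."""
    return 1.0 - error_rate(labels, predictions)


if __name__ == "__main__":
    # Hypothetical toy data: one of four predictions is wrong.
    labels = ["cat", "dog", "dog", "cat"]
    predictions = ["cat", "dog", "cat", "cat"]
    print(error_rate(labels, predictions))  # 0.25
    print(accuracy(labels, predictions))    # 0.75
```

Because accuracy and error rate always sum to 1, reporting either one carries the same information.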

criterion, objective, performance measure, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.39)

Cao, Longbing, Zhu, Chengzhang

Table2Vec: Automated Universal Representation Learning to Encode All-round Data DNA for Benchmarkable and Explainable Enterprise Data Science

arXiv.org Artificial Intelligence Dec-3-2021

Enterprise data typically involves multiple heterogeneous data sources and external data that respectively record business activities, transactions, customer demographics, status, behaviors, interactions and communications with the enterprise, and the consumption and feedback of its products, services, production, marketing, operations, and management, etc. A critical challenge in enterprise data science is to enable an effective whole-of-enterprise data understanding and data-driven discovery and decision-making on all-round enterprise DNA. We introduce a neural encoder Table2Vec for automated universal representation learning of entities such as customers from all-round enterprise DNA with automated data characteristics analysis and data quality augmentation. The learned universal representations serve as representative and benchmarkable enterprise data genomes and can be used for enterprise-wide and domain-specific learning tasks. Table2Vec integrates automated universal representation learning on low-quality enterprise data and downstream learning tasks. We illustrate Table2Vec in characterizing all-round customer data DNA in an enterprise on complex heterogeneous multi-relational big tables to build universal customer vector representations. The learned universal representation of each customer is all-round, representative and benchmarkable to support both enterprise-wide and domain-specific learning goals and tasks in enterprise data science. Table2Vec significantly outperforms the existing shallow, boosting and deep learning methods typically used for enterprise analytics. We further discuss the research opportunities, directions and applications of automated universal enterprise representation and learning and the learned enterprise data DNA for automated, all-purpose, whole-of-enterprise and ethical machine learning and data science.

customer, enterprise data science, representation, (12 more...)

2112.0183

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (1.00)

Industry:

Banking & Finance (1.00)
Information Technology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Probabilistic Approach for Road-Users Detection

Melotti, G., Lu, W., Zhao, D., Asvadi, A., Gonçalves, N., Premebida, C.

Object detection in autonomous driving applications implies the detection and tracking of semantic objects that are commonly native to urban driving environments, such as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning based object detection is false positives that occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks in testing. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer, which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on the 2D-KITTI object detection benchmark using YOLOV4 and SECOND (a Lidar-based detector). The proposed approach enables interpretable probabilistic predictions without the requirement of re-training the network and is therefore very practical.

artificial intelligence, machine learning, prediction, (19 more...)

2112.0136

Country:

Europe > Portugal > Coimbra > Coimbra (0.04)
South America > Brazil (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.55)
Transportation > Ground > Road (0.54)
Automobiles & Trucks (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Magán-Carrión, Roberto, Urda, Daniel, Díaz-Cano, Ignacio, Dorronsoro, Bernabé

Improving the Reliability of Network Intrusion Detection Systems through Dataset Integration

This work presents Reliable-NIDS (R-NIDS), a novel methodology for Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) that allows ML models to work on integrated datasets, empowering the learning process with diverse information from different datasets. R-NIDS therefore targets the design of more robust models that generalize better than traditional approaches. We also propose a new dataset, called UNK21. It is built from three of the most well-known network datasets (UGR'16, UNSW-NB15 and NSL-KDD), each one gathered from its own network environment, with different features and classes, by using a data aggregation approach present in R-NIDS. Following R-NIDS, in this work we propose to build two well-known ML models (a linear and a non-linear one) based on the information of three of the most common datasets in the literature for NIDS evaluation, those integrated in UNK21. The results show that these two ML models, trained as a NIDS solution, benefit from this approach and generalize better when trained on the newly proposed UNK21 dataset. Furthermore, these results are carefully analyzed with statistical tools that provide high confidence in our conclusions.

dataset, ml model, network dataset, (15 more...)

doi: 10.1109/TETC.2022.3178283

2112.0208

Country:

Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Europe > Spain > Castile and León > Burgos Province > Burgos (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Li, Zekun, Chiang, Yao-Yi, Tavakkol, Sasan, Shbita, Basel, Uhl, Johannes H., Leyk, Stefan, Knoblock, Craig A.

Historical maps contain detailed geographic information that is difficult to find elsewhere and covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.

information, location phrase, text region, (15 more...)

2112.01671

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Computing Class Hierarchies from Classifiers

Kang, Kai, Lin, Fangzhen

A class or taxonomic hierarchy is often manually constructed and forms part of our knowledge about the world. In this paper, we propose a novel algorithm for automatically acquiring a class hierarchy from a classifier, which is often a large neural network these days. The information that we need from a classifier is its confusion matrix, which contains, for each pair of base classes, the number of errors the classifier makes by mistaking one for another. Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models trained on the CIFAR-10 dataset, a neural network model for predicting the native language of a non-native English speaker, a neural network model for detecting the language of a written text, and a classifier for identifying music genres. In the literature, such class hierarchies have been used to provide interpretability to the neural networks. We also discuss some other potential uses of the acquired hierarchies.
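To illustrate how a confusion matrix can carry hierarchy information, here is a minimal sketch that greedily merges the groups of classes a classifier confuses most often. This is a generic agglomerative procedure chosen for illustration, not the algorithm proposed in the paper; the function name, the average-confusion merge criterion, and the toy matrix are all assumptions.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def hierarchy_from_confusion(class_names: Sequence[str],
                             confusion: Sequence[Sequence[int]]) -> Tree:
    """Build a binary tree by repeatedly merging the two most-confused groups.

    confusion[i][j] is the number of class-i examples predicted as class j.
    """
    # Each cluster is (subtree, list of member class indices).
    clusters: List[Tuple[Tree, List[int]]] = [
        (name, [i]) for i, name in enumerate(class_names)
    ]

    def mutual_confusion(a: List[int], b: List[int]) -> float:
        # Average number of errors, in both directions, per pair of classes.
        total = sum(confusion[i][j] + confusion[j][i] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = mutual_confusion(clusters[x][1], clusters[y][1])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    # Hypothetical confusion matrix: animals are confused with animals,
    # vehicles with vehicles.
    names = ["cat", "dog", "car", "truck"]
    matrix = [
        [90, 9, 1, 0],
        [8, 91, 0, 1],
        [0, 1, 88, 11],
        [1, 0, 12, 87],
    ]
    print(hierarchy_from_confusion(names, matrix))
    # (('car', 'truck'), ('cat', 'dog'))
```

On this toy input the off-diagonal counts alone are enough to separate the animal classes from the vehicle classes, which is the intuition behind reading a hierarchy off a confusion matrix.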

algorithm, class hierarchy, hierarchy, (16 more...)

2112.01187

Country:

Asia > China > Hong Kong (0.04)
Asia > India (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Media > Music (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Machine Learning Dec-1-2021

Asymmetric error control under imperfect supervision: a label-noise-adjusted Neyman-Pearson umbrella algorithm

Yao, Shunan, Rava, Bradley, Tong, Xin, James, Gareth

Label noise in data has long been an important problem in supervised learning applications as it affects the effectiveness of many widely used classification methods. Recently, important real-world applications, such as medical diagnosis and cybersecurity, have generated renewed interest in the Neyman-Pearson (NP) classification paradigm, which constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). However, there has been little research on the NP paradigm under label noise. It is somewhat surprising that even when common NP classifiers ignore the label noise in the training stage, they are still able to control the type I error with high probability. However, the price they pay is excessive conservativeness of the type I error and hence a significant drop in power (i.e., $1 - \text{type II error}$). Assuming that domain experts provide lower bounds on the corruption severity, we propose the first theory-backed algorithm that adapts most state-of-the-art classification methods to the training label noise under the NP paradigm. The resulting classifiers not only control the type I error with high probability under the desired level but also improve power.
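The idea of controlling the type I error "with high probability" can be sketched with an order-statistic threshold rule of the kind used by noise-free NP umbrella-style procedures. The sketch below assumes clean labels and does not implement the label-noise adjustment proposed in this paper; the function names, the binomial tail bound as the selection rule, and the parameters `alpha` (type I error level) and `delta` (tolerated violation probability) are assumptions for illustration.

```python
from math import comb
from typing import Sequence


def violation_bound(k: int, n: int, alpha: float) -> float:
    """Binomial tail bound on P(type I error > alpha) when the k-th smallest
    of n held-out class-0 scores is used as the threshold."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
               for j in range(k, n + 1))


def np_order_index(n: int, alpha: float, delta: float) -> int:
    """Smallest rank k whose violation bound is at most delta."""
    for k in range(1, n + 1):
        if violation_bound(k, n, alpha) <= delta:
            return k
    raise ValueError("too few class-0 scores for the requested alpha and delta")


def np_threshold(class0_scores: Sequence[float], alpha: float, delta: float) -> float:
    """Score threshold; predict class 1 only when a score exceeds it."""
    ordered = sorted(class0_scores)
    k = np_order_index(len(ordered), alpha, delta)
    return ordered[k - 1]


def empirical_power(class1_scores: Sequence[float], threshold: float) -> float:
    """Power = 1 - type II error: fraction of class-1 scores above threshold."""
    return sum(s > threshold for s in class1_scores) / len(class1_scores)


if __name__ == "__main__":
    # Hypothetical held-out class-0 scores 0, 1, ..., 58.
    scores = [float(i) for i in range(59)]
    print(np_threshold(scores, alpha=0.05, delta=0.05))
```

A smaller rank gives a lower threshold and therefore more power, which is why the rule picks the smallest rank that still satisfies the bound. With `alpha = delta = 0.05`, fewer than 59 class-0 scores cannot satisfy the bound at all, so the function raises an error in that case.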

algorithm, algorithm 1, type ii error, (14 more...)

2112.00314

Country:

North America > United States > California (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

arXiv.org Artificial Intelligence Dec-1-2021

Analysis of an adaptive lead weighted ResNet for multiclass classification of 12-lead ECGs

Zhao, Zhibin, Murphy, Darcy, Gifford, Hugh, Williams, Stefan, Darlington, Annie, Relton, Samuel D., Fang, Hui, Wong, David C.

Background: Twelve-lead ECGs are a core diagnostic tool for cardiovascular diseases. Here, we describe and analyse an ensemble deep neural network architecture to classify 24 cardiac abnormalities from 12-lead ECGs. Method: We proposed a squeeze and excite ResNet to automatically learn deep features from 12-lead ECGs, in order to identify 24 cardiac conditions. The deep features were augmented with age and gender features in the final fully connected layers. Output thresholds for each class were set using a constrained grid search. To determine why the model made incorrect predictions, two expert clinicians independently interpreted a random set of 100 misclassified ECGs concerning Left Axis Deviation. Results: Using the bespoke weighted accuracy metric, we achieved a 5-fold cross-validation score of 0.684, and sensitivity and specificity of 0.758 and 0.969, respectively. We scored 0.520 on the full test data, and ranked 2nd out of 41 in the official challenge rankings. On a random set of misclassified ECGs, agreement between each of the two clinicians and the training labels was poor (clinician 1: kappa = -0.057, clinician 2: kappa = -0.159). In contrast, agreement between the clinicians was very high (kappa = 0.92). Discussion: The proposed prediction model performed well on the validation and hidden test data in comparison to models trained on the same data. We also discovered considerable inconsistency in training labels, which is likely to hinder development of more accurate models.
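The kappa values quoted above measure agreement beyond what chance alone would produce. The abstract does not say which kappa statistic was used; the sketch below assumes Cohen's kappa for two raters, and the function name and toy ratings are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement rate; p_e is the agreement expected by
    chance given each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and equally long")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(freq_a) | set(freq_b))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical binary ratings (1 = abnormality present, 0 = absent).
    print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0, perfect agreement
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0, chance-level agreement
    print(cohen_kappa([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0, systematic disagreement
```

Under this definition a kappa near 0 means agreement no better than chance, and a negative value means the raters agree less often than chance would predict, which is how the small negative clinician-versus-label values in the abstract should be read.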

adaptive lead weighted resnet, classification, multiclass classification, (14 more...)

2112.01496

Country:

Europe > United Kingdom > England > Leicestershire > Loughborough (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)