A guide to machine learning in search: Key terms, concepts and algorithms
When it comes to machine learning, there are some broad concepts and terms that everyone in search should know. We should all know where machine learning is used, and the different types of machine learning that exist. Read on to gain a better grasp of how machine learning impacts search, what the search engines are doing and how to recognize machine learning at work. Let's start with a few definitions. Then we'll get into machine learning algorithms and models.
Predicting Time-to-conversion for Dementia of Alzheimer's Type using Multi-modal Deep Survival Analysis
Mirabnahrazam, Ghazal, Ma, Da, Beaulac, Cédric, Lee, Sieun, Popuri, Karteek, Lee, Hyunwoo, Cao, Jiguo, Galvin, James E, Wang, Lei, Beg, Mirza Faisal, Initiative, the Alzheimer's Disease Neuroimaging
Dementia of Alzheimer's Type (DAT) is a complex disorder influenced by numerous factors, but it is unclear how each factor contributes to disease progression. An in-depth examination of these factors may yield an accurate estimate of time-to-conversion to DAT for patients at various disease stages. We used 401 subjects with 63 features from MRI, genetic, and CDC (Cognitive tests, Demographic, and CSF) data modalities in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We used a deep learning-based survival analysis model that extends the classic Cox regression model to predict time-to-conversion to DAT. Our findings showed that genetic features contributed the least to survival analysis, while CDC features contributed the most. Combining MRI and genetic features improved survival prediction over using either modality alone, but adding CDC to any combination of features only worked as well as using only CDC features. Consequently, our study demonstrated that using the current clinical procedure, which includes gathering cognitive test results, can outperform survival analysis results produced using costly genetic or CSF data.
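As a rough illustration of the objective that Cox-style survival models minimize, the sketch below computes the negative log partial likelihood in NumPy. It is not the authors' model: the function name and inputs are assumptions made for illustration, ties are handled naively, and in a deep extension the risk scores would come from a neural network rather than a linear predictor.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of a Cox-style model (hypothetical helper).

    risk_scores: one score h(x_i) per subject; higher means earlier conversion
    times:       observed time per subject (conversion or censoring)
    events:      1 if conversion was observed, 0 if the subject was censored
    """
    risk_scores = np.asarray(risk_scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # at_risk[i, j] is True when subject j has not yet left the study at time t_i
    at_risk = times[None, :] >= times[:, None]

    # Numerically stable log-sum-exp of the scores over each subject's risk set
    masked = np.where(at_risk, risk_scores[None, :], -np.inf)
    row_max = masked.max(axis=1)
    log_risk_sum = row_max + np.log(np.exp(masked - row_max[:, None]).sum(axis=1))

    # Only subjects with an observed event contribute a term
    return float(-np.sum((risk_scores - log_risk_sum)[events]))
```

With all scores equal, each observed event contributes the log of its risk-set size, which makes a convenient sanity check.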
AI-Assisted Authentication: State of the Art, Taxonomy and Future Roadmap
Zhu, Guangyi, Al-Qaraghuli, Yasir
Artificial Intelligence (AI) has found applications in a variety of environments ranging from data science to cybersecurity. AI helps break through the limitations of traditional algorithms and provides more efficient and flexible methods for solving problems. In this paper, we focus on the applications of artificial intelligence in authentication, which is used in a wide range of scenarios, including facial recognition to access buildings and keystroke dynamics to unlock smartphones. With AI-assisted authentication schemes emerging, our comprehensive survey provides a high-level overall understanding, which paves the way for future research in this area. In contrast to other relevant surveys, our research is the first of its kind to focus on the roles of AI in authentication. The traditional password-based authentication method has slowly faded out due to its inadequate ... Learning and neural networks are two main mechanisms used in AI. Learning is the process of ...
KL Divergence Estimation with Multi-group Attribution
Gopalan, Parikshit, Narodytska, Nina, Reingold, Omer, Sharan, Vatsal, Wieder, Udi
Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates when conditioned on sub-populations than other popular algorithms.
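For readers who want the basic quantities in concrete form, here is a minimal sketch of a discrete KL divergence, a smoothed plug-in estimate from samples, and the divergence conditioned on one sub-population. It is not the multi-group attribution method of the paper: the function names, the additive smoothing, and the finite integer domain are all assumptions chosen to keep the example small.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def plugin_kl_from_samples(xs, ys, n_values, smoothing=1.0):
    """Plug-in estimate from integer samples in {0, ..., n_values - 1}.

    Additive smoothing (an assumption of this sketch) keeps every estimated
    probability strictly positive.
    """
    p_counts = np.bincount(xs, minlength=n_values) + smoothing
    q_counts = np.bincount(ys, minlength=n_values) + smoothing
    return kl_divergence(p_counts / p_counts.sum(), q_counts / q_counts.sum())

def conditional_kl(p, q, subset):
    """KL between p and q after conditioning both on a sub-population.

    subset is a boolean mask over the domain marking the sub-population.
    """
    subset = np.asarray(subset, dtype=bool)
    p_c = np.asarray(p, dtype=float)[subset]
    q_c = np.asarray(q, dtype=float)[subset]
    return kl_divergence(p_c / p_c.sum(), q_c / q_c.sum())
```

Comparing `conditional_kl` across several overlapping masks gives a feel for how the contribution to the divergence can differ from one sub-population to another.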
The 100 Most Disruptive Companies to Watch In 2021
Disruptive technology is technology that affects the normal operation of a market or an industry. Digital disruption entails established companies and start-ups alike enlisting new technologies in the fight to dislodge incumbents, protect entrenched positions, or re-invent entire industries and business activities. To remain disruptive in the market, it is important to keep innovating. This is crucial because innovations occur now and then in every industry; to be truly disruptive, however, an innovation must entirely transform a product or solution that historically was so complicated that only a few could access it. At a minimum, digital transformation enables an organization to address the needs of its customers more simply and directly. Through disruptive innovation, however, companies can offer users a far better way of doing things, one that current incumbents simply cannot compete with. Artificial intelligence (AI), e-commerce, cloud, social networking, the Internet of Things, 5G, blockchain and other emerging technologies are being leveraged to blur the lines between industries, creating new business models and converging sectors. A company that disrupts its market is in a great position to take advantage of new opportunities. Sometimes offering something different can change the whole market for the better. Most of the top disruptive companies earn this label by offering highly innovative products and services, and 100 such companies are listed below. 403Tech provides innovative, managed cloud services to help its customers succeed. With best-in-class service and technology, 403Tech protects companies against cybercrimes while enabling greater efficiency and productivity. Some of its popular services include desktop support, server support, wired and wireless networking, virus removal, data recovery, and backup and hosted cloud services. Aegeus Technologies aims to design and develop robotic technologies and solutions.
UN-AVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring
Yousef, Waleed A., Traore, Issa, Briguglio, William
The visualization and detection of anomalies (outliers) are of crucial importance to many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge, none of them has fulfilled both objectives, simultaneously or cooperatively, in one coherent framework. The visualization methods of these approaches were introduced for explaining the output of a detection algorithm, not for data exploration that facilitates standalone visual detection. This is our point of departure: UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, that assigns invariant anomaly scores (normalized to $[0,1]$) rather than a hard binary decision. The main novelty of UN-AVOIDS is that it transforms data into a new space, introduced in this paper as the neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieved a high area under the ROC curve (AUC). We assessed UN-AVOIDS on simulated datasets and on two recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner. The article concludes by providing a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is designing visualization-aided anomaly detection (VAAD), a type of software that aids analysts by providing UN-AVOIDS' detection algorithm (running in a back engine) and the NCDF visualization space (rendered to plots), along with other conventional methods of visualization in the original feature space, all of which are linked in one interactive environment.
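Because the comparison above is reported in terms of AUC, a small sketch of how AUC can be computed from continuous anomaly scores may help. It uses the pairwise-ranking definition and is independent of UN-AVOIDS itself; the function name and the convention that label 1 marks an outlier are assumptions.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from anomaly scores (hypothetical helper).

    Equals the probability that a randomly chosen outlier (label 1) receives a
    higher score than a randomly chosen inlier (label 0); ties count as 0.5.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    outlier_scores = scores[labels]
    inlier_scores = scores[~labels]

    # Compare every outlier score with every inlier score
    diff = outlier_scores[:, None] - inlier_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (outlier_scores.size * inlier_scores.size))
```

The pairwise form needs memory proportional to the number of outliers times the number of inliers, so a rank-based formula is a better fit for large datasets.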
Image-Guided Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography Using a Shadow-aware Dual-Agent Framework
Li, Keyu, Xu, Yangxin, Wang, Jian, Ni, Dong, Liu, Li, Meng, Max Q. -H.
Ultrasound (US) imaging is commonly used to assist in the diagnosis and interventions of spine diseases, but standardized US acquisitions performed by manually operating the probe require substantial experience and training of sonographers. In this work, we propose a novel dual-agent framework that integrates a reinforcement learning (RL) agent and a deep learning (DL) agent to jointly determine the movement of the US probe based on real-time US images, mimicking the decision-making process of an expert sonographer to achieve autonomous standard view acquisitions in spinal sonography. Moreover, inspired by the nature of US propagation and the characteristics of the spinal anatomy, we introduce a view-specific acoustic shadow reward that uses shadow information to implicitly guide the navigation of the probe toward different standard views of the spine. Our method is validated in both quantitative and qualitative experiments in a simulation environment built with US data acquired from 17 volunteers. The average navigation accuracy toward different standard views reaches 5.18mm/5.25deg and 12.87mm/17.49deg in the intra- and inter-subject settings, respectively. The results demonstrate that our method can effectively interpret the US images and navigate the probe to acquire multiple standard views of the spine.
Modelling and Optimisation of Resource Usage in an IoT Enabled Smart Campus
University campuses are essentially a microcosm of a city. They comprise diverse facilities such as residences, sport centres, lecture theatres, parking spaces, and public transport stops. Universities are under constant pressure to improve efficiencies while offering a better experience to various stakeholders including students, staff, and visitors. Nonetheless, anecdotal evidence indicates that campus assets are not being utilised efficiently, often due to the lack of data collection and analysis, thereby limiting the ability to make informed decisions on the allocation and management of resources. Advances in the Internet of Things (IoT) technologies that can sense and communicate data from the physical world, coupled with data analytics and Artificial intelligence (AI) that can predict usage patterns, have opened up new opportunities for organisations to lower cost and improve user experience. This thesis explores this opportunity via theory and experimentation using UNSW Sydney as a living laboratory.
SMProbLog: Stable Model Semantics in ProbLog and its Applications in Argumentation
Totis, Pietro, Kimmig, Angelika, De Raedt, Luc
We introduce SMProbLog, a generalization of the probabilistic logic programming language ProbLog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is given by the success probability of a query, which corresponds to the probability that the query succeeds in a randomly sampled program. It is well-defined when each random sample uniquely determines the truth values of all logical atoms. Argumentation problems, however, represent an interesting practical application where this is not always the case. SMProbLog generalizes the semantics of ProbLog to the setting where multiple truth assignments are possible for a randomly sampled program, and implements the corresponding algorithms for both inference and learning tasks. We then show how this novel framework can be used to reason about probabilistic argumentation problems. The key contributions of this paper are therefore: a more general semantics for ProbLog programs, its implementation in a probabilistic programming framework for both inference and parameter learning, and a novel approach to probabilistic argumentation problems based on this framework.
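The success probability described above can be made concrete with a brute-force sketch that enumerates every sampled program and sums the probabilities of those in which the query holds. This covers only the standard case where each sample fixes all truth values; it does not implement SMProbLog's stable model semantics and does not use the ProbLog library. The fact names, probabilities, and the `success_probability` helper are hypothetical.

```python
from itertools import product

def success_probability(fact_probs, query):
    """Sum the probabilities of all possible worlds in which the query holds.

    fact_probs: maps each independent probabilistic fact to its probability
    query:      function from a world (dict of fact name to bool) to bool
    """
    names = list(fact_probs)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Facts are independent, so a world's probability is a simple product
        weight = 1.0
        for name in names:
            p = fact_probs[name]
            weight *= p if world[name] else 1.0 - p
        if query(world):
            total += weight
    return total

# Hypothetical program: alarm :- burglary.  alarm :- earthquake.
facts = {"burglary": 0.1, "earthquake": 0.2}
alarm = lambda world: world["burglary"] or world["earthquake"]
print(success_probability(facts, alarm))
```

For this toy program the result is about 0.28, matching 1 - 0.9 * 0.8. Enumeration is exponential in the number of facts, so it only serves to illustrate the definition.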
A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data
Cao, Xiaowen, Xing, Li, Majd, Elham, He, Hua, Gu, Junhua, Zhang, Xuekui
Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about complex tissue cellular composition. In the analysis of single-cell RNA sequencing, the annotations of cell subtypes are often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, no existing study systematically investigates the performance of those supervised algorithms on scRNA-seq data sets of various sizes. Methods and Results: This study evaluates 13 popular supervised machine learning algorithms to classify cell phenotypes, using published real and simulated data sets with diverse cell sizes. The benchmark contained two parts. In the first part, we used real data sets to assess the popular supervised algorithms' computing speed and cell phenotype classification performance. The classification performances were evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of real genes. Conclusion: The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets. NB was another appropriate method for medium data sets. In large data sets, XGB performed excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant in small data sets.
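Several of the evaluation metrics named above follow directly from confusion-matrix counts. The sketch below shows one way to compute precision, recall, F1-score, and false-positive rate for a binary phenotype label; the function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example, not details from the study.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and false-positive rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Fall back to 0.0 when a denominator is zero (a convention of this sketch)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```

For example, `binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` has two true positives, one false positive, one false negative, and one true negative, so precision and recall are both 2/3 and the false-positive rate is 0.5.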