Oceania
Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores
Sholokhov, Alexey, Kinnunen, Tomi, Vestman, Ville, Lee, Kong Aik
How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it to find another person who gets falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, such an aim is mirrored in the standardized ASV evaluation benchmarks. However, the number of speakers in such evaluation benchmarks represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected randomly. A potentially more meaningful definition of an impostor - at least in the context of security-driven ASV applications - would be the closest (most confusable) other speaker to a given target. We put forward a novel performance assessment framework that addresses both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by assessing ASV security against closest impostors on arbitrarily large datasets. The framework allows one to predict the safety of a given ASV technology, in its current state, for an arbitrarily large speaker database consisting of virtual (sampled) speakers. As a proof of concept, we analyze the performance of two state-of-the-art ASV systems, based on i-vector and x-vector speaker embeddings (as implemented in the popular Kaldi toolkit), on the recent VoxCeleb 1 & 2 corpora. We found that neither the i-vector nor the x-vector system is immune to an increased false alarm rate at increased impostor database size.
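The growth of the false alarm rate with database size can be illustrated with a deliberately simplified sketch. It assumes that every impostor is accepted independently with the same per-trial false alarm probability, which is not the hierarchical Bayesian model the abstract describes; the function name `closest_impostor_far` and its parameters are hypothetical.

```python
def closest_impostor_far(per_trial_far: float, num_impostors: int) -> float:
    """Probability that at least one of `num_impostors` impostors is accepted.

    Simplifying assumption (not the paper's model): impostor trials are
    independent and share the same per-trial false alarm probability.
    """
    if not 0.0 <= per_trial_far <= 1.0:
        raise ValueError("per_trial_far must lie in [0, 1]")
    if num_impostors < 0:
        raise ValueError("num_impostors must be non-negative")
    # Complement of "every impostor is rejected".
    return 1.0 - (1.0 - per_trial_far) ** num_impostors


if __name__ == "__main__":
    # The closest impostor is accepted whenever any impostor is accepted,
    # so the rate can only grow as the impostor population grows.
    for n in (1, 100, 10_000, 1_000_000):
        print(n, closest_impostor_far(1e-4, n))
```

Under this independence assumption, any nonzero per-trial rate drives the closest-impostor rate towards 1 as the population grows, which is the qualitative effect the abstract reports.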
A Crowdsourcing Framework for On-Device Federated Learning
Pandey, Shashi Raj, Tran, Nguyen H., Bennis, Mehdi, Tun, Yan Kyaw, Manzoor, Aunas, Hong, Choong Seon
Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that considers the communication efficiency during parameter exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulation results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.
A preliminary version of this paper has been accepted at IEEE GLOBECOM [1]. Nguyen H. Tran is with the School of Computer Science, The University of Sydney, NSW 2006, Australia, email: nguyen.tran@sydney.edu.au. Mehdi Bennis is with the Center for Wireless Communications, University of Oulu, 90014 Oulu, Finland, email: mehdi.bennis@oulu.fi.
I. INTRODUCTION
A. Background and motivation
Recent years have witnessed tremendous growth in the use of Machine Learning (ML) techniques and their applications in mobile devices.
On the one hand, according to International Data Corporation, the shipments of smartphones reached 3 billion in 2018 [2], which implies a large crowd of mobile users generating personalized data via the interaction with mobile applications, or with the use of inbuilt sensors (e.g., cameras, microphones and GPS) exploited efficiently by the mobile crowdsensing paradigm (e.g., for indoor localization, traffic monitoring, navigation [3], [4], [5], [6]). On the other hand, mobile devices are being extensively empowered with specialized hardware architectures and computing engines such as the CPU, GPU and DSP (e.g., energy-efficient Qualcomm Hexagon Vector eXtensions on Snapdragon 835 [7]) for solving diverse machine learning problems. Gartner predicts that 80 percent of smartphones will have on-device AI capabilities by 2022.
Assessing Social and Intersectional Biases in Contextualized Word Representations
Tan, Yi Chern, Celis, L. Elisa
Social bias in machine learning has drawn significant attention, with work ranging from demonstrating bias in a multitude of applications and curating definitions of fairness for different contexts to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks. These word representations are conditioned on their context within a sentence, and can also be used to encode the entire sentence. In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities. To this end, we propose assessing bias at the contextual word level. This novel approach captures the contextual effects of bias missing in context-free word embeddings, yet avoids confounding effects that underestimate bias at the sentence encoding level. We demonstrate evidence of bias at the corpus level, find varying evidence of bias in embedding association tests, show in particular that racial bias is strongly encoded in contextual word models, and observe that bias effects for intersectional minorities are exacerbated beyond their constituent minority identities. Further, evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for our novel approach.
Artificial Intelligence in Healthcare Application Market Industry Analysis and Forecast (2019-2027)
A recent market intelligence report published by Data Insights Partner on the global Artificial Intelligence in Healthcare Application market offers an in-depth analysis of segments and sub-segments in the regional and international market. The research also emphasizes the impact of restraints, drivers, and macro indicators on the regional and global market over both the short and the long term. It presents a detailed forecast, trends, and dollar values for the global Artificial Intelligence in Healthcare Application market. According to the report, the global Artificial Intelligence in Healthcare Application market is projected to expand at a CAGR of 35% over the forecast period. One of the greatest challenges in healthcare is the sheer volume of healthcare data, which is impossible for medical personnel to handle in full. This is where artificial intelligence (AI) finds its opportunity in the healthcare field.
Food Waste Is a Serious Problem. AI Is Trying to Solve It
You're probably familiar with the oft-quoted statistics from the Food and Agriculture Organization (FAO) of the United Nations by now: Globally, about one-third of food is lost or wasted each year from the farm to the refrigerator, representing about 1.3 billion tons. The economic price tag is estimated at nearly $1 trillion annually. The refrain from the FAO goes even further: If we could reverse this trend, we would have enough food to feed the world's undernourished population, as well as help meet the nutritional needs of a planet estimated to reach nearly 10 billion people by 2050. Technology has long been helping to hack world hunger. These days most conversations about tech's impact on any sector of the economy inevitably involve artificial intelligence--sophisticated software that allows machines to make decisions and even predictions in ways similar to humans.
'Smart pods' blaze a trail for autonomous public transport
The NEXT Future Transportation module is a far cry from the sleek visions of self-driving cars designed by Tesla or Mercedes. With an average cruising speed of 20 kilometers per hour, the electric pods are unlikely to set pulses racing. But perhaps the most crucial distinction is that this self-driving vehicle is passenger ready. Following trials in 2018, several NEXT units are expected to be in action at the Expo 2020 site in Dubai, providing short-distance rides for some of the estimated 25 million visitors attending the six-month world fair. The Ruler of Dubai, Sheikh Mohammed bin Rashid Al Maktoum, has set the demanding target of having 25% of journeys in the city made by driverless transport by 2030 -- going beyond the existing driverless metro and monorail systems.
A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification
Liu, Shigang, Zhang, Jun, Xiang, Yang, Zhou, Wanlei, Xiang, Dongxi
Biomedical data are widely used in developing prediction models for tumor identification, drug discovery, and classification of human cancers. However, previous studies have usually focused on different classifiers and overlooked the class imbalance problem in real-world biomedical datasets. There is a lack of studies evaluating data pre-processing techniques, such as resampling and feature selection, for learning from imbalanced biomedical data. The relationship between data pre-processing techniques and data distributions has never been analysed in previous studies. This article mainly focuses on reviewing and evaluating some popular and recently developed resampling and feature selection methods for class imbalance learning. We analyse the effectiveness of each technique from a data distribution perspective. Extensive experiments have been carried out with five classifiers, four performance measures, and eight learning techniques across twenty real-world datasets. Experimental results show that: (1) resampling and feature selection techniques exhibit better performance with the support vector machine (SVM) classifier, but perform poorly with the C4.5 decision tree and linear discriminant analysis classifiers; (2) for datasets with different distributions, techniques such as random undersampling and feature selection perform better than other data pre-processing methods on the T Location-Scale distribution when using SVM and KNN (K-nearest neighbours) classifiers, while random oversampling outperforms other methods on the Negative Binomial distribution when using the Random Forest classifier at a lower imbalance ratio; (3) feature selection outperforms other data pre-processing methods in most cases; thus, feature selection with the SVM classifier is the best choice for imbalanced biomedical data learning.
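As an illustration of the two resampling techniques named in the abstract, the following is a minimal sketch of random undersampling and random oversampling using only the Python standard library. The function `random_resample` and its parameters are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
import random
from collections import defaultdict


def random_resample(samples, labels, mode="under", seed=0):
    """Balance classes by random undersampling or random oversampling.

    mode="under": every class is cut down to the smallest class size.
    mode="over":  every class is padded, by drawing with replacement,
                  up to the largest class size.
    """
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if mode == "under":
            chosen = rng.sample(xs, target)  # without replacement
        else:
            extra = [rng.choice(xs) for _ in range(target - len(xs))]
            chosen = xs + extra  # originals plus random duplicates
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y


if __name__ == "__main__":
    x = list(range(10))
    y = [0] * 8 + [1] * 2  # imbalance ratio 4:1
    print(random_resample(x, y, mode="under"))
    print(random_resample(x, y, mode="over"))
```

Undersampling discards majority-class samples, while oversampling duplicates minority-class samples; which of the two helps more depends on the classifier and data distribution, as the abstract's findings indicate.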
Novel semi-metrics for multivariate change point analysis and anomaly detection
James, Nick, Menzies, Max, Azizi, Lamiae, Chan, Jennifer
This paper proposes a new method for determining similarity and anomalies between time series, most practically effective in large collections of (likely related) time series, with a particular focus on measuring distances between structural breaks within such a collection. We consolidate and generalise a class of semi-metric distance measures, which we term MJ distances. Experiments on simulated data demonstrate that our proposed family of distances uncovers similarity within collections of time series more effectively than measures such as the Hausdorff and Wasserstein metrics. Although our class of distances does not necessarily satisfy the triangle inequality requirement of a metric, we analyse the transitivity properties of the respective distance matrices in various contextual scenarios. There, we demonstrate a trade-off between robust performance in the presence of outliers and the triangle inequality property. We show in experiments using real data that the contrived scenarios that severely violate the transitivity property rarely arise in real data; instead, our family of measures satisfies all the properties of a metric most of the time. We illustrate three ways of analysing the distance and similarity matrices: eigenvalue analysis, hierarchical clustering, and spectral clustering. The results from our hierarchical and spectral clustering experiments on simulated data demonstrate that the Hausdorff and Wasserstein metrics may lead to erroneous inference as to which time series are most similar with respect to their structural breaks, while our semi-metrics provide an improvement.
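The sensitivity to outliers that the abstract attributes to the Hausdorff metric can be seen in a small sketch that computes the standard Hausdorff distance between two finite sets of change points. This is the baseline metric only, not the authors' MJ distances, whose definition is not given here; the function name and the example change points are illustrative assumptions.

```python
def hausdorff(a, b):
    """Hausdorff distance between two non-empty finite sets of real numbers."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")

    def directed(s, t):
        # Largest distance from a point of s to its nearest point in t.
        return max(min(abs(x - y) for y in t) for x in s)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    # Hypothetical change points (time indices) of two series.
    breaks_a = [10, 50, 90]
    breaks_b = [12, 48, 300]  # two near matches and one outlying break
    print(hausdorff(breaks_a, breaks_b))  # 210, set by the outlier at 300
```

Two of the three breaks agree to within two time steps, yet the distance is determined entirely by the single outlying break, which is the kind of behaviour that motivates more robust alternatives.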
Precision Medicine Informatics: Principles, Prospects, and Challenges
Afzal, Muhammad, Islam, S. M. Riazul, Hussain, Maqbool, Lee, Sungyoung
Precision Medicine (PM) is an emerging approach that promises to change the existing paradigm of medical practice. Recent advances in technological innovations and genetics, and the growing availability of health data, have set a new pace for research and impose a set of new requirements on different stakeholders. To date, some studies are available that discuss different aspects of PM. Nevertheless, a holistic representation of those aspects from the technological perspective, in relation to applications and challenges, is mostly ignored. In this context, this paper surveys advances in PM from an informatics viewpoint and reviews the enabling tools and techniques in a categorized manner. In addition, the study discusses how other technological paradigms, including big data, artificial intelligence, and the internet of things, can be exploited to advance the potential of PM. Furthermore, the paper provides some guidelines for future research for seamless implementation and wide-scale deployment of PM based on identified open issues and associated challenges. To this end, the paper proposes an integrated holistic framework for PM, motivating informatics researchers to design their relevant research works in an appropriate context.
AI tech founder urges business leaders, innovators to consider ethical responsibility
It is incumbent upon business leaders and Australian organisations to put diversity and the ethical implications of artificial intelligence (AI) at the heart of innovation if we're to ensure the world's third major disruptive force is harnessed for human good. That was the big call-out made by Dr Catriona Wallace, founder and executive director of the ASX-listed machine learning tech innovator, Flamingo AI, during this week's CeBIT conference in Sydney. Speaking on the rise of AI and the relationship between humans and machines, the entrepreneur highlighted several facts and figures on the extent of AI impact and innovation over the short- and longer-term horizon, as well as the positive and negative potential human consequences that come with it. As outlined by Dr Wallace, disruptive technologies, such as AI, are predicted to be the third of three major problems the world is facing that could detrimentally affect humanity. The other two are climate change and nuclear war.