Big Data: Overviews


Convergence of AI, IoT, Big Data and Blockchain: A Review.

#artificialintelligence

Data is the lifeblood of any business. Today, big data has applications in just about every industry: retail, healthcare, financial services, government, agriculture, and customer service, among others. Any organization that can assimilate data to answer nagging questions about its operations can benefit from big data. Those who work to understand their customers' businesses and problems will be able to proactively identify big data solutions appropriate to their needs, and thus gain an advantage over their competitors. Job demand for people with big data skills is also on the rise, especially in professional, scientific and technical services; information technology; manufacturing; finance and insurance; and retail.


Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

arXiv.org Machine Learning

In this paper, we study Censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner is assumed to have a fixed amount of resources, which it allocates to the arms at each time step. The loss observed from an arm is random and depends on the amount of resource allocated to it. More specifically, the loss equals zero if the allocation for the arm exceeds a constant (but unknown) threshold that can be dependent on the arm. Our goal is to learn a feasible allocation that minimizes the expected loss. The problem is challenging because the loss distribution and threshold value of each arm are unknown. We study this novel setting by establishing its 'equivalence' to Multiple-Play Multi-Armed Bandits (MP-MAB) and Combinatorial Semi-Bandits. Exploiting these equivalences, we derive optimal algorithms for our setting using existing algorithms for MP-MAB and Combinatorial Semi-Bandits. Experiments on synthetically generated data validate performance guarantees of the proposed algorithms.
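
The feedback model described in this abstract can be sketched in a few lines. This is only an illustration, not the paper's algorithm: the 0/1 (Bernoulli) losses, the function names `expected_loss` and `observe_losses`, and the example numbers are all assumptions made here for concreteness.

```python
import random

def expected_loss(allocation, mean_losses, thresholds):
    """Expected total loss for one round.

    An arm contributes its mean loss unless its allocation exceeds
    its threshold, in which case its loss is zero.
    """
    return sum(
        mu
        for a, mu, theta in zip(allocation, mean_losses, thresholds)
        if not a > theta
    )

def observe_losses(allocation, mean_losses, thresholds, rng):
    """Draw one round of per-arm feedback.

    Assumption: losses are 0/1 with the given means. The learner sees
    only these values, never the thresholds themselves.
    """
    losses = []
    for a, mu, theta in zip(allocation, mean_losses, thresholds):
        if a > theta:
            losses.append(0)  # allocation exceeds threshold: no loss
        else:
            losses.append(1 if rng.random() < mu else 0)
    return losses

# Hypothetical instance: three arms, one unit of resource to split.
allocation = [0.6, 0.1, 0.3]
mean_losses = [0.5, 0.2, 0.9]
thresholds = [0.5, 0.3, 0.2]
print(expected_loss(allocation, mean_losses, thresholds))
print(observe_losses(allocation, mean_losses, thresholds, random.Random(0)))
```

In this sketch a zero in the feedback is ambiguous: it may mean the allocation exceeded the threshold, or that the random loss simply came out zero. That ambiguity is what makes the feedback censored.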


How data can predict which employees are about to quit

#artificialintelligence

Rather than relying on exit interviews and their comparisons to occasional employee surveys to determine engagement, organizations can turn instead to big data and advanced analytics to identify those workers at greatest risk of quitting. A new Harvard Business Review article outlines how applying machine learning algorithms to turnover data and employee information can provide a much more accurate picture of workplace satisfaction. This measure of "turnover propensity" comprised two main indicators: turnover shocks, which are organizational and personal events that cause workers to reconsider their jobs, and job embeddedness, which describes an employee's social ties in their workplace and interest in the work they do. Though achieving this kind of "proactive anticipation" will require a sizable investment of time and effort to develop the necessary data and algorithms, the payoff will likely be worth it: "Leaders can proactively engage valued employees at risk of leaving through interviews, to better understand how the firm can increase the odds that they stay," per HBR.


Sequential estimation of quantiles with applications to A/B-testing and best-arm identification

arXiv.org Machine Learning

Consider the problem of sequentially estimating quantiles of any distribution over a complete, fully-ordered set, based on a stream of i.i.d. observations. We propose new, theoretically sound and practically tight confidence sequences for quantiles, that is, sequences of confidence intervals which are valid uniformly over time. We give two methods for tracking a fixed quantile and two methods for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible $\sqrt{t^{-1} \log\log t}$ rate, as determined by the law of the iterated logarithm (LIL). As a byproduct, we give a non-asymptotic concentration inequality for the empirical distribution function which holds uniformly over time with the LIL rate, thus strengthening Smirnov's asymptotic empirical process LIL, and extending the famed Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to hold uniformly over all sample sizes while only being about twice as wide in practice. This inequality directly yields sequential analogues of the one- and two-sample Kolmogorov-Smirnov tests, and a test of stochastic dominance. We apply our results to the problem of selecting an arm with an approximately best quantile in a multi-armed bandit framework, proving a state-of-the-art sample complexity bound for a novel allocation strategy. Simulations demonstrate that our method stops with fewer samples than existing methods by a factor of five to fifty. Finally, we show how to compute confidence sequences for the difference between quantiles of two arms in an A/B test, along with corresponding always-valid $p$-values.
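
For context, the classical fixed-sample DKW inequality that this abstract extends states that $P(\sup_x |F_n(x) - F(x)| > \epsilon) \le 2e^{-2n\epsilon^2}$. The sketch below turns that bound into a confidence interval for a single quantile at one fixed sample size $n$. It is not the paper's time-uniform confidence sequence, and the function names are assumptions introduced here.

```python
import math

def dkw_epsilon(n, alpha):
    """Half-width of the fixed-n DKW band at confidence level 1 - alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def dkw_quantile_interval(samples, p, alpha):
    """Confidence interval for the p-quantile from a fixed-size sample.

    Valid at level 1 - alpha for this one sample size only; it is NOT
    valid uniformly over time like a confidence sequence.
    """
    xs = sorted(samples)
    n = len(xs)
    eps = dkw_epsilon(n, alpha)
    # Order statistics at empirical levels p - eps and p + eps (0-indexed).
    lo_idx = math.ceil(n * (p - eps)) - 1
    hi_idx = math.ceil(n * (p + eps)) - 1
    lower = xs[lo_idx] if lo_idx >= 0 else -math.inf
    upper = xs[hi_idx] if hi_idx < n else math.inf
    return lower, upper

# Deterministic example: an evenly spaced grid on (0, 1].
grid = [i / 1000 for i in range(1, 1001)]
print(dkw_quantile_interval(grid, p=0.5, alpha=0.05))
```

Recomputing this interval after every new observation would not keep the stated coverage, which is the gap that time-uniform confidence sequences are meant to close.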


Artificial Intelligence In The Workplace: How AI Is Transforming Your Employee Experience - 7wData

#artificialintelligence

Soon, even those of us who don't happen to work for technology companies (although as every company moves towards becoming a tech company, that will be increasingly few of us) will find AI-enabled machines increasingly present as we go about our day-to-day activities. From how we are recruited and on-boarded to how we go about on-the-job training, personal development and eventually passing on our skills and experience to those who follow in our footsteps, AI technology will play an increasingly prominent role. Here's an overview of some of the recent advances made in businesses that are currently on the cutting-edge of the AI revolution, and are likely to be increasingly adopted by others seeking to capitalize on the arrival of smart machines. Before we even set foot in a new workplace, it could soon be a fact that AI-enabled machines have played their part in ensuring we're the right person for the job. AI pre-screening of candidates before inviting the most suitable in for interviews is an increasingly common practice at large companies that make thousands of hires each year, and sometimes attract millions of applicants.


Global Big Data Conference

#artificialintelligence

Artificial intelligence (AI) is quickly changing just about every aspect of how we live our lives, and our working lives certainly aren't exempt from this. Soon, even those of us who don't happen to work for technology companies (although as every company moves towards becoming a tech company, that will be increasingly few of us) will find AI-enabled machines increasingly present as we go about our day-to-day activities. From how we are recruited and on-boarded to how we go about on-the-job training, personal development and eventually passing on our skills and experience to those who follow in our footsteps, AI technology will play an increasingly prominent role. Here's an overview of some of the recent advances made in businesses that are currently on the cutting-edge of the AI revolution, and are likely to be increasingly adopted by others seeking to capitalize on the arrival of smart machines. Before we even set foot in a new workplace, it could soon be a fact that AI-enabled machines have played their part in ensuring we're the right person for the job.


Adaptive Model Selection Framework: An Application to Airline Pricing

arXiv.org Machine Learning

Multiple machine learning and prediction models are often used for the same prediction or recommendation task. In our recent work, where we develop and deploy airline ancillary pricing models in an online setting, we found that among multiple pricing models developed, no one model clearly dominates other models for all incoming customer requests. Thus, as algorithm designers, we face an exploration-exploitation dilemma. In this work, we introduce an adaptive meta-decision framework that uses Thompson sampling, a popular multi-armed bandit solution method, to route customer requests to various pricing models based on their online performance. We show that this adaptive approach outperforms a uniformly random selection policy by improving the expected revenue per offer by 43% and conversion score by 58% in an offline simulation.
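
A minimal sketch of Thompson sampling used as a request router, under simplifying assumptions that are not taken from the paper: each request yields a binary outcome (converted or not), each model gets a uniform Beta(1, 1) prior, and the conversion rates in the example are invented. The function names `thompson_route` and `simulate` are hypothetical.

```python
import random

def thompson_route(successes, failures, rng):
    """Choose a model index by sampling each model's Beta posterior."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior plus observed counts
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)

def simulate(true_rates, n_requests, seed=0):
    """Route requests among models with the given (hidden) conversion rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_requests):
        arm = thompson_route(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical conversion rates for three pricing models.
succ, fail = simulate([0.05, 0.10, 0.30], n_requests=5000, seed=1)
print([s + f for s, f in zip(succ, fail)])  # requests routed to each model
```

Sampling from the posterior, rather than always picking the model with the best observed rate, keeps sending occasional traffic to models whose performance is still uncertain, which is one way of handling the exploration-exploitation trade-off the abstract mentions.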


How AI and Big Data are Improving Research Results - Qualtrics

#artificialintelligence

Market research is a $44.5 billion market and growing. Online research is among the fastest-growing parts of the market, thanks to the pervasiveness of the web and the ease with which we can now collect data. However, as the world conducts more and more survey research, the issues that we see elsewhere with big data are now affecting the survey research industry as well, specifically the issue of data quality. Thanks to the growth in online survey research, billions of survey responses are collected every year. But one-quarter of those responses are of poor quality[1].


Taking the pulse of machine learning adoption - ZDNet

#artificialintelligence

A few months back, we gave our take on a survey from the O'Reilly folks regarding interest in deep learning. The survey reported that interest was more than latent, but there's little question that the bulk of the action today is in the (relatively) better understood confines of machine learning (ML). So on this go round, O'Reilly jumped into the shallower side of the pond to survey the people who subscribe to its publications and go to its big data-related Strata and AI conferences regarding ML. Before diving in, let's put some perspective on this cohort: it's likely a group that on average is ahead of the curve by virtue of its attendance at these big data events or consumption of O'Reilly learning services that are skewing increasingly toward the AI domain. Nonetheless, it provides a useful counterpoint to their earlier work exploring interest in deep learning.


Big data in GIS environment - Geospatial World

#artificialintelligence

GIS is a virtual world, one represented by points, polygons, lines and graphs. Processing these datasets has been a challenge since the day GIS was established as a field. Processing huge volumes of data has been a long-standing problem not only in traditional Information Technology (IT) sectors but also in the geospatial domain. However, recent developments in both hardware and software infrastructure have enabled the processing of huge data sets. This has given a big push and a new direction to those industries that were hampered by slow data processing capabilities.