# symposium

### Artificial intelligence is changing the workplace. Finding the right employees is more important than ever.

Whether it is being used to guide financial advisors at investment companies or track employee productivity at restaurant chains, artificial intelligence is becoming increasingly common in the workplace. And human resources departments need to make sure employees are ready for the change. "In the age of digital transformation, organizations need to adjust and innovate to stay competitive," said Uwe Hohgrawe, the faculty director of analytics and enterprise intelligence in Northeastern's College of Professional Studies. "And people need to develop new skills to work with AI." Hohgrawe and Carl Zangerl, who directs Northeastern's human resources management program, have organized a symposium to help companies figure out how to handle the human side of incorporating artificial intelligence into the workplace. The Symposium on the Intersection of AI and Talent Strategy will be held on Tuesday, February 12 in Northeastern's Interdisciplinary Science and Engineering Complex.

### Building softer, friendlier robots

Oussama Khatib, a professor of computer science at Stanford University, encountered a pivotal moment during the first outing of his deep-sea robot, Ocean One, off the coast of France. The robot was trapped, far too deep for human retrieval, between the cannons of a sunken ship. Weather was threatening to force the robotics crew to return to shore, but Khatib and his team resisted. "No way, I'm not leaving the robot," Khatib said before moving to the haptic controls, which simulate a sense of touch and allow for remote operation. Able to control the robot's arms, Khatib pushed.

### Who's Who in AI Today

Now that the Consumer Electronics Show is over, let's shift gears and move our attention from shiny gadgets to grownup stuff, like artificial intelligence technology. Before I headed to Las Vegas with my sidekick Brian Santo, editor-in-chief of EDN, both of us fully expected to hear lots of AI talk and see loads of AI-integrated devices at CES. Surprisingly, we encountered less AI buzz than we'd anticipated. Mostly, it was what we already knew, such as AI for voice. Evidently, in its initial commercial deployment, AI is focused on providing a convenient, easy-to-use voice UI for consumer products. Consequently, voice AI backed by Amazon, Google, and Microsoft is popping up everywhere as yet another product gimmick for CE vendors.

### AISIS2019

The symposium will take place on 21–25 October 2019 at the Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico. The conference is organized by the Instituto de Ciencias Nucleares, the Coordinación de la Investigación Científica (UNAM), and CERN openlab. The policy aspects will be organized by the OECD. Recent progress in artificial intelligence and machine learning has provided new ways to process large data sets. The new techniques are particularly powerful when dealing with unstructured data or data with complex, non-linear relationships, which are hard to model and analyze with traditional statistical tools.

### Fully Understanding The Hashing Trick

Feature hashing, also known as *the hashing trick*, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling up machine learning algorithms. Loosely speaking, feature hashing uses a random sparse projection matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ (where $m \ll n$) to reduce the dimension of the data from $n$ to $m$ while approximately preserving the Euclidean norm. Every column of $A$ contains exactly one non-zero entry, equal to either $-1$ or $1$. Weinberger et al. showed tail bounds on $\|Ax\|_2^2$. Specifically, they showed that for every $\varepsilon, \delta$, if $\|x\|_{\infty} / \|x\|_2$ is sufficiently small and $m$ is sufficiently large, then \begin{equation*}\Pr[ \; | \;\|Ax\|_2^2 - \|x\|_2^2\; | < \varepsilon \|x\|_2^2 \;] \ge 1 - \delta \;.\end{equation*} These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017); however, the true performance of this key technique, and specifically the correct tradeoff between the pivotal parameters $\|x\|_{\infty} / \|x\|_2$, $m$, $\varepsilon$, and $\delta$, remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between these central parameters, thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data, which shows that the constants "hiding" in the asymptotic notation are in fact very close to $1$, further illustrating the tightness of the presented bounds in practice.
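The construction is simple enough to sketch in a few lines. Below is a minimal, stdlib-only illustration of the one-nonzero-per-column sign projection described above; production implementations hash feature indices on the fly instead of materializing the index and sign tables, and the dimensions here are chosen arbitrarily for the demo:

```python
import random

def feature_hash(x, m, seed=0):
    # One random nonzero entry (+1 or -1) per input coordinate:
    # h[i] is the target bucket, s[i] the sign, as in the sparse
    # sign matrix A described above.
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(len(x))]
    s = [rng.choice((-1, 1)) for _ in range(len(x))]
    y = [0.0] * m
    for i, v in enumerate(x):
        y[h[i]] += s[i] * v
    return y

n, m = 10000, 512
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(n)]  # spread-out x: small ratio ||x||_inf / ||x||_2
nx = sum(v * v for v in x)
ny = sum(v * v for v in feature_hash(x, m, seed=2))
print(abs(ny - nx) / nx)  # relative squared-norm distortion, small in this regime
```

The distortion shrinks as $m$ grows and as the mass of $x$ spreads across coordinates, which is exactly the $\|x\|_{\infty} / \|x\|_2$ condition in the tail bound.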

### Distributed $k$-Clustering for Data with Heavy Noise

In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have communication costs that depend linearly on $z$, the number of outliers. Recently, Guha et al. [10] overcame this dependence by considering bi-criteria approximation algorithms that output solutions with $2z$ outliers. When $z$ is large, however, the extra $z$ outliers discarded by these algorithms might be too many, considering that the data gathering process can be costly. In this paper, we improve the number of outliers to the best possible $(1+\epsilon)z$, while maintaining the $O(1)$-approximation ratio and keeping the communication cost independent of $z$. The problems we consider include the $(k, z)$-center problem and the $(k, z)$-median/means problems in Euclidean metrics. Our implementation of the algorithm for $(k, z)$-center shows that it outperforms many previous algorithms, both in communication cost and in the quality of the output solution.
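To make the $(k, z)$-center objective concrete, here is a tiny single-machine baseline, not the paper's distributed algorithm: Gonzalez's greedy 2-approximation for plain $k$-center, followed by discarding the $z$ farthest points as outliers. The data and parameter values are illustrative only:

```python
import random

def dist(p, q):
    # Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def greedy_k_center(points, k):
    # Gonzalez's farthest-point heuristic: a classical 2-approximation
    # for k-center without outliers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def k_center_with_outliers(points, k, z):
    # Illustrative baseline (NOT the paper's algorithm): cluster all
    # points, then treat the z farthest ones as outliers.
    centers = greedy_k_center(points, k)
    d = sorted(min(dist(p, c) for c in centers) for p in points)
    return centers, d[len(points) - z - 1]  # radius after dropping z points

rng = random.Random(0)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
pts += [(100.0, 100.0), (-100.0, 50.0)]  # two planted outliers
_, r_all = k_center_with_outliers(pts, k=2, z=0)
_, r_z = k_center_with_outliers(pts, k=2, z=2)
print(r_z < r_all)  # dropping the planted outliers shrinks the radius
```

The distributed versions studied in the paper must agree on which points to discard using only limited communication, which is where the dependence on $z$ becomes the central difficulty.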

### Robust Learning of Fixed-Structure Bayesian Networks

We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.

### On Coresets for Logistic Regression

Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear-sized coresets exist for logistic regression. To deal with intractable worst-case instances, we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\epsilon)$-coreset. We illustrate the performance of our method by comparing it to uniform sampling as well as to state-of-the-art methods in the area. The experiments are conducted on real-world benchmark data for logistic regression.
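The sampling-and-reweighting mechanics behind a coreset can be sketched as follows. The importance scores below are a crude norm-based stand-in, not the paper's sensitivity scheme; they only show how sampled points are reweighted by $1/(m p_i)$ so that the weighted loss on the sample is an unbiased estimate of the full logistic loss:

```python
import math, random

def logistic_loss(w, X, y, weights=None):
    # Total (optionally weighted) logistic loss sum_i w_i * log(1 + exp(-y_i <w, x_i>)).
    if weights is None:
        weights = [1.0] * len(X)
    total = 0.0
    for wi, xi, yi in zip(weights, X, y):
        margin = yi * sum(a * b for a, b in zip(w, xi))
        total += wi * math.log(1.0 + math.exp(-margin))
    return total

rng = random.Random(0)
n, d, m = 2000, 5, 200
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [rng.choice((-1, 1)) for _ in range(n)]

# Crude importance scores: mix uniform sampling with norm-proportional
# sampling. This is NOT the paper's sensitivity score, only a stand-in.
norms = [sum(v * v for v in x) ** 0.5 for x in X]
total_norm = sum(norms)
p = [0.5 / n + 0.5 * norms[i] / total_norm for i in range(n)]

idx = rng.choices(range(n), weights=p, k=m)
coreset_X = [X[i] for i in idx]
coreset_y = [y[i] for i in idx]
coreset_w = [1.0 / (m * p[i]) for i in idx]  # unbiased reweighting

w = [0.3] * d  # an arbitrary query point in parameter space
full = logistic_loss(w, X, y)
approx = logistic_loss(w, coreset_X, coreset_y, coreset_w)
print(abs(approx - full) / full)  # small relative error at this query point
```

A true $(1\pm\epsilon)$-coreset must control this error uniformly over all parameter vectors $w$, which is what the sensitivity scores and the $\mu(X)$ condition are for.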

### Learning Optimal Reserve Price against Non-myopic Bidders

We consider the problem of learning the optimal reserve price in repeated auctions against non-myopic bidders, who may bid strategically in order to gain in future rounds even if the single-round auctions are truthful. Previous algorithms, e.g., empirical pricing, do not in general provide non-trivial regret bounds in this setting. We introduce algorithms that obtain small regret against non-myopic bidders either when the market is large, i.e., no bidder appears in a constant fraction of the rounds, or when the bidders are impatient, i.e., they discount future utility by some factor mildly bounded away from one. Our approach carefully controls what information is revealed to each bidder, and builds on techniques from differentially private online learning as well as the recent line of work on jointly differentially private algorithms.
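For context, the "empirical pricing" baseline mentioned above can be sketched as follows: against myopic bidders one simply picks, among observed bids, the reserve that would have maximized revenue (treated here, for simplicity, as a posted price; the example bids are made up). The abstract's point is that this baseline breaks down against strategic, forward-looking bidders who shade their bids to lower future reserves:

```python
def empirical_reserve(bids):
    # Revenue-maximizing reserve over the empirical bid distribution:
    # choose the candidate price r maximizing r * #{bids >= r}.
    best_r, best_rev = 0.0, 0.0
    for r in sorted(set(bids)):
        rev = r * sum(1 for b in bids if b >= r)
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r

bids = [0.1, 0.4, 0.5, 0.9, 1.0]
print(empirical_reserve(bids))  # -> 0.9 (revenue 0.9 * 2 = 1.8)
```

A non-myopic bidder who knows this rule is in use has an incentive to underbid early, precisely the manipulation that the differential-privacy-based techniques in the paper are designed to blunt.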

### Differentially Private k-Means with Constant Multiplicative Error

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly improved error guarantees over the previous state of the art. In addition, in the local model, our algorithm significantly reduces the number of interaction rounds. Although the problem has been widely studied in the context of differential privacy, all existing constructions achieve only super-constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error. Furthermore, we show how to modify our algorithms to compute private coresets for k-means clustering in both models.
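As a point of reference for the centralized model, a textbook Laplace-mechanism release of a single cluster centroid (the building block of DP-Lloyd-style iterations, not the paper's construction) looks like this; the clipping bound $[-1, 1]^d$ and the even budget split are illustrative assumptions:

```python
import math, random

def laplace(scale, rng):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_centroid(points, eps, rng):
    # eps-DP centroid via the Laplace mechanism: with coordinates
    # clipped to [-1, 1], each coordinate sum and the count both have
    # sensitivity 1. Half the budget goes to the noisy count, half is
    # split across the d noisy coordinate sums.
    d = len(points[0])
    noisy_n = max(1.0, len(points) + laplace(2.0 / eps, rng))
    noisy_sums = [sum(p[j] for p in points) + laplace(2.0 * d / eps, rng)
                  for j in range(d)]
    return [s / noisy_n for s in noisy_sums]

rng = random.Random(0)
cluster = [(0.5 + rng.gauss(0, 0.05), -0.5 + rng.gauss(0, 0.05))
           for _ in range(500)]
centroid = private_centroid(cluster, eps=1.0, rng=rng)
print(centroid)  # close to (0.5, -0.5); noise washes out as n grows
```

Iterating such noisy updates is what yields the super-constant approximation factors the abstract refers to; the paper's contribution is a different route to constant multiplicative error.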