Data Science


60 of the best Harvard University courses you can take online for free

Mashable

The only catch is that these free courses don't include a certificate of completion. Students can still enroll at any time and start learning at their own pace, so what's stopping you from trying out these online courses for yourself? Find the best free online courses from Harvard University with edX.


The Artificial State

The New Yorker

"Jacob Javits of New York is the first United States senator to become fully automated," the Chicago Tribune announced in 1962 from the Republican state convention in Buffalo, where an electronic Javits spat out slips of paper with answers to questions about everything from Cuba's missiles ("a serious threat") to the Cubs' prospects (dim). Javits "also harbors thoughts on medical care for the elderly, Berlin, the communist menace," and more than a hundred other subjects, the Tribune reported after an interview with the machine. Javits may have been the first automated American politician, but he wasn't the last. Since the nineteen-sixties, much of American public life has become automated, driven by computers and predictive algorithms that can do the political work of rallying support, running campaigns, communicating with constituents, and even crafting policy. In that same stretch of time, the proportion of Americans who say that they trust the U.S. government to do what is right ...


Why data is the Achilles Heel of AI (and every other business plan)

ZDNet

Excitement is over the top about all the marvels of today's technology -- artificial intelligence, real-time analytics, virtual reality, and connected enterprises, to name a few. However, without the right data, these initiatives are dead in the water. Two new surveys warn that companies still need to put their data houses in order, and as a result, aren't ready to move forward with initiatives such as generative AI (gen AI). The challenge is that data remains too much of a risk, rather than an asset in data-driven or AI-based initiatives.


Want to make your resume stand out? Learn in-demand data science skills in 30 course bundle.

Mashable

TL;DR: Get lifetime access to this data science course bundle for only $29.99. Whether you're aiming for a promotion or a new career in tech, this comprehensive data science bundle covers in-demand skills. With lifetime access for $29.99, you can dive into a range of tools and techniques at your own pace, building skills that can set you apart in the workplace. This bundle covers some of the most important tools and skills in data science, like advanced Power BI visualizations, deep learning with TensorFlow, and big data analysis using PySpark. It's packed with practical projects, so you're not just learning concepts -- you're actually applying them.


Learning Retrospective Knowledge with Reverse Reinforcement Learning

Neural Information Processing Systems

We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge. General Value Functions (GVFs) have enjoyed great success in representing predictive knowledge, i.e., answering questions about possible future outcomes such as "how much fuel will be consumed in expectation if we drive from A to B?". GVFs, however, cannot answer questions like "how much fuel do we expect a car to have given it is at B at time t?". To answer this question, we need to know when that car had a full tank and how that car came to B. Since such questions emphasize the influence of possible past events on the present, we refer to their answers as retrospective knowledge. In this paper, we show how to represent retrospective knowledge with Reverse GVFs, which are trained via Reverse RL. We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
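The reverse-direction bootstrapping described in the abstract can be illustrated with a tiny tabular sketch (the setup and names below are illustrative, not the paper's code): a car drives along a deterministic chain, burning one unit of fuel per step, and a reverse-TD update learns how much fuel has been consumed on arrival at each state.

```python
# Tabular reverse-TD sketch (hypothetical minimal example, not the paper's
# exact algorithm). A car drives along a deterministic chain 0 -> 1 -> 2 -> 3,
# burning 1 unit of fuel per step. The reverse value v[s'] estimates the
# cumulative fuel consumed on arrival at state s'.

def reverse_td(sweeps, num_states=4, alpha=0.1, gamma=1.0):
    v = [0.0] * num_states
    for _ in range(sweeps):
        for s in range(num_states - 1):
            s_next, cumulant = s + 1, 1.0
            # Reverse TD: bootstrap from the *predecessor* state s,
            # not the successor, since the question is about the past.
            v[s_next] += alpha * (cumulant + gamma * v[s] - v[s_next])
    return v

v = reverse_td(5000)
# v[s] converges to s: the fuel consumed to reach state s from the start.
```

An ordinary (forward) GVF with the same cumulant would instead learn the fuel still to be consumed from each state onward; reversing the bootstrap direction is what makes the quantity retrospective.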


LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference

Neural Information Processing Systems

The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle these challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE-based GCN inference.
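The obstacle the abstract alludes to is that HE schemes evaluate only additions and multiplications, so non-polynomial activations such as ReLU must be replaced and multiplication depth kept low. A common simplification in PPML work, sketched here in plaintext NumPy (an illustration of the general idea, not LinGCN's actual operators), swaps ReLU for a degree-2 polynomial activation:

```python
import numpy as np

# HE-friendly inference uses only additions and multiplications, so a
# low-degree polynomial such as x**2 often stands in for ReLU. This is a
# toy plaintext sketch; a real system would evaluate the same arithmetic
# under a CKKS-style HE scheme.

def square_act(x):
    return x * x  # degree-2 polynomial: one ciphertext-ciphertext multiply under HE

def gcn_layer(A_hat, H, W, act=square_act):
    # A_hat: normalized adjacency, H: node features, W: layer weights
    return act(A_hat @ H @ W)

rng = np.random.default_rng(0)
A_hat = np.eye(3)          # trivial graph (self-loops only) for the sketch
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
out = gcn_layer(A_hat, H, W)
```

Each such layer costs a fixed number of ciphertext multiplications, which is why frameworks like LinGCN focus on where nonlinearities can be removed or linearized to shrink the total multiplication depth.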


Recovering Unbalanced Communities in the Stochastic Block Model with Application to Clustering with a Faulty Oracle

Neural Information Processing Systems

The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade, and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of the SBM with unbalanced communities (arguably, more relevant in practice) is still limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes. We improve upon a result of Ailon, Chen and Xu [ICML 2013; JMLR 2015] by removing the assumption that there is a large interval in which no cluster sizes fall, and we also remove the dependency of the size of the recoverable clusters on the number of underlying clusters. We further complement our theoretical improvements with experimental comparisons. Under the planted clique conjecture, the size of the clusters that can be recovered by our algorithm is nearly optimal (up to poly-logarithmic factors) when the probability parameters are constant. As a byproduct, we obtain an efficient clustering algorithm with sublinear query complexity in a faulty oracle model, which is capable of detecting all clusters larger than \tilde{\Omega}(\sqrt{n}), even in the presence of \Omega(n) small clusters in the graph.
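The SVD-based recovery idea can be sketched in a few lines. This is a minimal illustration on a balanced two-block instance; the paper's algorithm adds further machinery to handle unbalanced cluster sizes.

```python
import numpy as np

# Minimal SVD-based SBM clustering sketch (illustrative, not the paper's
# full algorithm). Sample a 2-block SBM with in-block probability p and
# cross-block probability q, take the top singular vectors of the adjacency
# matrix, and cluster rows by the sign of the second singular vector.

rng = np.random.default_rng(1)
n1, n2, p, q = 30, 30, 0.8, 0.05
n = n1 + n2
labels = np.array([0] * n1 + [1] * n2)
probs = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                    # symmetric adjacency, no self-loops

U, S, _ = np.linalg.svd(A)
# The leading singular vector is roughly constant; the second one
# separates the two communities, so its sign recovers the labels.
pred = (U[:, 1] > 0).astype(int)
acc = max((pred == labels).mean(), ((1 - pred) == labels).mean())
```

With a wide gap between p and q the second singular value stands well above the noise level, which is why the sign split succeeds; the hard regimes are exactly those with small clusters or weak separation.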


Doubly Robust Thompson Sampling with Linear Payoffs

Neural Information Processing Systems

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing. The dependence of the arm choice on the past context and reward pairs compounds the complexity of regret analysis. We propose a novel multi-armed contextual bandit algorithm called Doubly Robust Thompson Sampling (DRTS), which applies the doubly robust estimator used in the missing-data literature to Thompson Sampling with contexts (\texttt{LinTS}). Different from previous works relying on missing-data techniques (Dimakopoulou et al. [2019], Kim and Paik [2019]), the proposed algorithm is designed to allow a novel additive regret decomposition leading to an improved regret bound of order \tilde{O}(\phi^{-2}\sqrt{T}), where \phi^2 is the minimum eigenvalue of the covariance matrix of contexts. This is the first regret bound for \texttt{LinTS} using \phi^2 without d, where d is the dimension of the context. Applying the relationship between \phi^2 and d, the regret bound of the proposed algorithm is \tilde{O}(d\sqrt{T}) in many practical scenarios, improving the bound of \texttt{LinTS} by a factor of \sqrt{d}. A benefit of the proposed method is that it uses all the context data, chosen or not chosen, allowing it to circumvent the technical definition of unsaturated arms used in the theoretical analysis of \texttt{LinTS}. Empirical studies show the advantage of the proposed algorithm over \texttt{LinTS}.
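The doubly robust construction can be illustrated for a single round (the function and variable names below are assumptions for illustration, not the paper's notation): unchosen arms receive model-imputed rewards, and the chosen arm's imputation is corrected by an inverse-propensity term.

```python
import numpy as np

# Doubly robust pseudo-rewards for one contextual bandit round
# (illustrative sketch; names and the imputation model are assumptions).
# Every arm gets a value, so no reward is "missing" downstream.

def dr_pseudo_rewards(f_hat, chosen, observed_reward, propensity):
    # f_hat: imputed mean reward per arm; chosen: index of the pulled arm;
    # propensity: probability the policy assigned to pulling that arm.
    r = np.array(f_hat, dtype=float)
    r[chosen] += (observed_reward - f_hat[chosen]) / propensity
    return r

r = dr_pseudo_rewards(f_hat=[0.5, 0.2, 0.7], chosen=2,
                      observed_reward=1.0, propensity=0.5)
# Unchosen arms keep their imputed values; the chosen arm is corrected:
# 0.7 + (1.0 - 0.7) / 0.5 = 1.3
```

If the propensities are correct, the correction term makes the pseudo-rewards unbiased in expectation even when the imputation model is wrong, which is the sense in which the estimator is doubly robust, and it is what lets such methods use the context of every arm rather than only the chosen one.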


Online Learning in Contextual Bandits using Gated Linear Networks

Neural Information Processing Systems

We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. Leveraging the data-dependent gating properties of the GLN, we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically evaluate GLCB against 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains the best mean rank despite being the only fully online method, and we further support these results with a theoretical study of its convergence properties.
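The data-dependent gating that GLNs rely on can be sketched as half-space gating (a minimal illustration with assumed parameter shapes, not the authors' GLN implementation): fixed random hyperplanes hash each input to one of several weight vectors per neuron, so different inputs are handled by different linear models.

```python
import numpy as np

# Half-space gating sketch, the core GLN ingredient (illustrative only).
# Each neuron holds 2**k weight vectors and selects one per input by
# hashing the input through k fixed random hyperplanes.

def gate_index(x, hyperplanes, biases):
    bits = (hyperplanes @ x > biases).astype(int)
    # Interpret the k sign bits as a binary code selecting a weight vector.
    return int(bits @ (1 << np.arange(len(biases))))

rng = np.random.default_rng(2)
H = rng.normal(size=(3, 4))        # 3 gating hyperplanes over 4-dim inputs
b = np.zeros(3)
weights = rng.normal(size=(8, 4))  # 2**3 weight vectors, one per gating region

x = rng.normal(size=4)
w = weights[gate_index(x, H, b)]   # the weight vector active for this input
```

Because the gates are fixed and only the selected linear weights are updated, learning stays convex per region, which is one reason the architecture suits fully online use.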