A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko
Recently, powerful Large Language Models (LLMs) have become easily accessible to hundreds of millions of users worldwide. However, their strong capabilities and vast world knowledge do not come without associated privacy risks. In this work, we focus on an emerging privacy threat LLMs pose: the ability to accurately infer personal information from online texts. Despite the growing importance of LLM-based author profiling, research in this area has been hampered by a lack of suitable public datasets, largely due to the ethical and privacy concerns associated with real personal data. We take two steps to address this problem: (i) we construct a simulation framework for the popular social media platform Reddit using LLM agents seeded with synthetic personal profiles; (ii) using this framework, we generate SynthPAI, a diverse synthetic dataset of over 7,800 comments manually labeled for personal attributes. We validate our dataset with a human study showing that humans barely outperform random guessing on the task of distinguishing our synthetic comments from real ones. Further, we verify that our dataset enables meaningful personal attribute inference research by showing, across 18 state-of-the-art LLMs, that our synthetic comments allow us to draw the same conclusions as real-world data. Combined, our experimental results, dataset, and pipeline form a strong basis for future privacy-preserving research geared towards understanding and mitigating the inference-based privacy threats that LLMs pose.
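The profile-seeding step in (i) can be pictured with a small sketch. This is not the authors' pipeline: the profile fields, the prompt wording, and the `build_agent_prompt` helper are all hypothetical illustrations of conditioning an LLM agent on a synthetic persona before it writes comments.

```python
# Minimal sketch (not the SynthPAI pipeline): compose a system prompt that
# seeds a commenting agent with a synthetic personal profile.

def build_agent_prompt(profile: dict, thread_topic: str) -> str:
    """Condition an LLM agent on a synthetic persona for one thread."""
    traits = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        f"You are a Reddit user with this background ({traits}). "
        f"Write a short comment in a thread about {thread_topic}, "
        "staying in character without stating your attributes explicitly."
    )

# Fully synthetic profile -- no real personal data involved.
profile = {"age": 34, "occupation": "nurse", "city": "Melbourne"}
prompt = build_agent_prompt(profile, "night-shift work")
```

The returned string would be passed as the system prompt of whatever chat model drives the agent; attribute labels for the resulting comments then come from the seed profile itself.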
I didn't know I needed a smart floor lamp until this $50 eufy deal showed up
SAVE $50: As of March 27, the eufy E10 RGBWW Smart Floor Lamp is available for $49.99 during Amazon's Spring Sale. I'm not usually the type to get excited over a floor lamp, but this eufy E10 is doing things I didn't know lamps could do -- and it's only $49.99 during Amazon's Spring Sale. For something that's part mood-setter, part light show, and part futuristic room accessory, that's a ridiculous value. I'm talking smooth transitions between 16 million colors, preset lighting modes for holidays like Christmas or Halloween, and even AI-powered "themes" that adapt to your mood. If your lamp can't do that, it's officially basic.
What is vibe coding, should you be doing it, and does it matter?
Getting an AI to write software for you? Want to write software, but haven't got the first clue where to start? Enter "vibe coding", a term that has swept the internet to describe the use of AI tools, including large language models (LLMs) like ChatGPT, to generate computer code even if you can't program. "Vibe coding basically refers to using generative AI not just to assist with coding, but to generate the entire code for an app," says Noah Giansiracusa at Bentley University in Waltham, Massachusetts. Users ask, or prompt, LLM-based models such as ChatGPT, Claude or Copilot to produce the code for an app or service, and the AI system does all the work.
Dimension-free Private Mean Estimation for Anisotropic Distributions
Existing private mean estimators over R^d typically require a number of samples growing with the square root of the dimension d. This rate is unavoidable when the distribution is isotropic, namely, when the covariance is a multiple of the identity matrix. Yet real-world data is often highly anisotropic, with signals concentrated on a small number of principal components. We develop estimators that are appropriate for such signals: our estimators are (ε, δ)-differentially private and have sample complexity that is dimension-independent for anisotropic subgaussian distributions.
Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization
Nonsmooth nonconvex optimization problems arise broadly in machine learning and business decision making, yet two core challenges impede the development of efficient solution methods with finite-time convergence guarantees: the lack of a computationally tractable optimality criterion and the lack of computationally powerful oracles. The contributions of this paper are two-fold. First, we establish the relationship between the celebrated Goldstein subdifferential [46] and uniform smoothing, thereby providing the basis and intuition for the design of gradient-free methods that guarantee finite-time convergence to a set of Goldstein stationary points.
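The uniform-smoothing idea can be illustrated with a hedged sketch (step sizes, constants, and the test function below are illustrative, not the paper's): the smoothed surrogate f_delta(x) = E_u[f(x + delta*u)] is differentiable even when f is not, and a two-point finite difference along a random unit direction gives an unbiased estimate of its gradient.

```python
# Illustrative sketch of a two-point gradient-free method based on uniform
# smoothing; parameters are ad hoc, not the paper's schedule.
import numpy as np

def smoothed_grad_estimate(f, x, delta, rng):
    """Unbiased two-point estimator of grad f_delta at x."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                    # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def gradient_free_descent(f, x0, steps=3000, lr=0.02, delta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * smoothed_grad_estimate(f, x, delta, rng)
    return x

f = lambda z: np.abs(z).sum()                 # nonsmooth test function, minimized at 0
x = gradient_free_descent(f, [2.0, -1.5])
```

Only function values of f are queried, which is the "zeroth-order oracle" setting; the iterates approach (approximate) stationary points of the smoothed surrogate, which the Goldstein-subdifferential connection relates back to stationarity of f itself.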
A Further Related Work on Nonsmooth Nonconvex Optimization
To appreciate the difficulty and the broad scope of the research agenda in nonsmooth nonconvex optimization, we start by describing the existing relevant literature. First, the existing work is mostly devoted to establishing the asymptotic convergence properties of various optimization algorithms, including gradient sampling (GS) methods [16-18, 57, 19], bundle methods [56, 40] and subgradient methods [8, 65, 30, 28, 12]. More specifically, Burke et al. [16] provided a systematic investigation of approximating the Clarke subdifferential through random sampling and proposed a gradient bundle method [17]--the precursor of GS methods--for optimizing a nonconvex, nonsmooth and non-Lipschitz function. Later, Burke et al. [18] and Kiwiel [57] proposed the GS methods by incorporating key modifications into the algorithmic scheme in Burke et al. [17] and proved that every cluster point of the iterates generated by GS methods is a Clarke stationary point. For an overview of GS methods, we refer to Burke et al. [19].
Dylan J. Foster
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach to IL, behavior cloning (BC), is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner's access to the expert. We revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on the realizable/well-specified setting with general policy classes up to and including deep neural networks. Through a new analysis of behavior cloning with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. Specializing our results to deterministic, stationary policies, we show that the gap between offline and online IL is smaller than previously thought: (i) it is possible to achieve linear dependence on horizon in offline IL under dense rewards (matching what was previously only known to be achievable in online IL); and (ii) without further assumptions on the policy class, online IL cannot improve over offline IL with the logarithmic loss, even in benign MDPs. We complement our theoretical results with experiments on standard RL tasks and autoregressive language generation to validate the practical relevance of our findings.
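Behavior cloning with the logarithmic loss is simply maximum likelihood over expert state-action pairs. A toy tabular sketch (the softmax policy, learning rate, and two-state expert below are illustrative, not from the paper):

```python
# Toy sketch of BC with the log loss: minimize -log pi(a|s) over expert
# demonstrations for a tabular softmax policy. All parameters are illustrative.
import numpy as np

def bc_log_loss_fit(states, actions, n_states, n_actions, lr=0.5, epochs=200):
    """Fit logits[s, a] by gradient descent on the negative log-likelihood."""
    logits = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a in zip(states, actions):
            p = np.exp(logits[s] - logits[s].max())
            p /= p.sum()
            grad = p.copy()
            grad[a] -= 1.0            # gradient of -log softmax(logits[s])[a]
            logits[s] -= lr * grad
    return logits

# Toy deterministic expert: in state s, always take action s % 2.
states = np.array([0, 1, 0, 1, 0, 1])
actions = states % 2
logits = bc_log_loss_fit(states, actions, n_states=2, n_actions=2)
policy = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

The learned policy concentrates on the expert's actions; the abstract's point is that, with the range of cumulative payoffs and the supervised-learning complexity of the policy class controlled, this purely offline procedure already achieves horizon-independent sample complexity.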