
Collaborating Authors

 jung


Online multiclass boosting

Neural Information Processing Systems

Recent work has extended the theoretical analysis of boosting algorithms to multiclass problems and to online settings. However, the multiclass extension is in the batch setting and the online extensions only consider binary classification. We fill this gap in the literature by defining, and justifying, a weak learning condition for online multiclass boosting. This condition leads to an optimal boosting algorithm that requires the minimal number of weak learners to achieve a certain accuracy. Additionally, we propose an adaptive algorithm which is near optimal and, thanks to its adaptivity, performs very well on real data.



This is now the most valuable piece of Star Wars memorabilia

Popular Science

Artist Tom Jung's 1977 painting introduced the world to the look and feel of George Lucas' blockbuster adventure. Darth Vader's reign has ended. For a brief time, he owned the mantle of "Most Expensive Piece of Star Wars Memorabilia," but before you could say "more wealth than you can imagine" he fell once again, with a new challenger rising to take his place. It was only this past September that a verified screen-used lightsaber hilt wielded by the Dark Lord of the Sith in and set a sales record by fetching $3.65 million.



WildSpoof Challenge Evaluation Plan

Wu, Yihan, Jung, Jee-weon, Shim, Hye-jin, Cheng, Xin, Wang, Xin

arXiv.org Artificial Intelligence

The WildSpoof Challenge aims to advance the use of in-the-wild data in two intertwined speech processing tasks. It consists of two parallel tracks: (1) Text-to-Speech (TTS) synthesis for generating spoofed speech, and (2) Spoofing-robust Automatic Speaker Verification (SASV) for detecting spoofed speech. While the organizers coordinate both tracks and define the data protocols, participants treat them as separate and independent tasks. The primary objectives of the challenge are: (i) to promote the use of in-the-wild data for both TTS and SASV, moving beyond conventional clean and controlled datasets and considering real-world scenarios; and (ii) to encourage interdisciplinary collaboration between the spoofing generation (TTS) and spoofing detection (SASV) communities, thereby fostering the development of more integrated, robust, and realistic systems.


Trillion 7B Technical Report

Han, Sungjun, Suk, Juyoung, An, Suyeong, Kim, Hyungguk, Kim, Kyuseok, Yang, Wonsuk, Choi, Seungtaek, Shin, Jamin

arXiv.org Artificial Intelligence

We introduce Trillion-7B, the most token-efficient Korean-centric multilingual LLM available. Our novel Cross-lingual Document Attention (XLDA) mechanism enables highly efficient and effective knowledge transfer from English to target languages like Korean and Japanese. Combined with optimized data mixtures, language-specific filtering, and tailored tokenizer construction, Trillion-7B achieves competitive performance while dedicating only 10% of its 2T training tokens to multilingual data and requiring just 59.4K H100 GPU hours ($148K) for full training. Comprehensive evaluations across 27 benchmarks in four languages demonstrate Trillion-7B's robust multilingual performance and exceptional cross-lingual consistency.
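The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "2T" means 2 × 10^12 tokens and that the $148K figure covers exactly the 59.4K H100 GPU hours quoted; the implied hourly rate is a derived number, not one stated in the report.

```python
# Figures quoted in the abstract (assumption: "2T" = 2e12 tokens).
TOTAL_TOKENS = 2e12
MULTILINGUAL_SHARE = 0.10
GPU_HOURS = 59.4e3
COST_USD = 148e3

# Tokens devoted to multilingual data.
multilingual_tokens = TOTAL_TOKENS * MULTILINGUAL_SHARE

# Implied price per H100 GPU hour (derived here, not reported).
cost_per_gpu_hour = COST_USD / GPU_HOURS

print(f"multilingual tokens: {multilingual_tokens:.3g}")
print(f"implied cost per GPU hour: ${cost_per_gpu_hour:.2f}")
```

Under these assumptions the multilingual share is 200B tokens, and the budget works out to roughly $2.49 per H100 GPU hour.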


Towards the mathematical foundation of the minimum enclosing ball and related problems

Vrahatis, Michael N.

arXiv.org Artificial Intelligence

Theoretical background is provided towards the mathematical foundation of the minimum enclosing ball problem. This problem concerns the determination of the unique spherical surface of smallest radius enclosing a given bounded set in d-dimensional Euclidean space. The study of several problems that are similar or related to the minimum enclosing ball problem has received considerable impetus from their many applications in various fields of science and technology. The proposed theoretical framework is based on several enclosing (covering) and partitioning (clustering) theorems and provides, among other things, bounds and relations between the circumradius, inradius, diameter and width of a set. These enclosing and partitioning theorems are considered cornerstones of the field, strongly influencing developments and generalizations to other spaces and non-Euclidean geometries.
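To make the relation between circumradius and diameter concrete, the sketch below pairs a classical bound with a classical approximation algorithm; neither is claimed to be the framework of this paper. Jung's theorem states that a bounded set of diameter D in d-dimensional Euclidean space has circumradius R ≤ D·sqrt(d / (2(d+1))), with equality for the regular simplex, and any enclosing ball must have radius at least D/2. The enclosing ball itself is approximated with the Bădoiu–Clarkson iteration (repeatedly step toward the farthest point with step size 1/(k+1)), which after k steps is within a factor of about 1 + 1/sqrt(k) of the optimal radius. The function names are illustrative.

```python
import math
import random


def diameter(points):
    """Largest pairwise Euclidean distance (O(n^2), fine for small sets)."""
    return max(math.dist(p, q) for p in points for q in points)


def jung_bound(diam, d):
    """Jung's theorem: circumradius <= diam * sqrt(d / (2 * (d + 1)))."""
    return diam * math.sqrt(d / (2 * (d + 1)))


def approx_meb(points, iters=1000):
    """Badoiu-Clarkson iteration for an approximate minimum enclosing ball.

    Start at an input point; at step k move the center a fraction 1/(k+1)
    of the way toward the currently farthest point. Returns (center, radius),
    where radius is the largest distance from that center, so the returned
    ball always encloses every point.
    """
    center = list(points[0])
    for k in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (k + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


if __name__ == "__main__":
    random.seed(0)
    d = 3
    pts = [[random.random() for _ in range(d)] for _ in range(50)]
    D = diameter(pts)
    _, r = approx_meb(pts)
    print(f"D/2 = {D / 2:.4f} <= r = {r:.4f}, Jung bound = {jung_bound(D, d):.4f}")
```

For an equilateral triangle with unit sides, the Jung bound equals the true circumradius 1/sqrt(3), which is a quick way to see that the bound is tight.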


Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

Lee, Hoil, Ayed, Fadhel, Jung, Paul, Lee, Juho, Yang, Hongseok, Caron, François

arXiv.org Machine Learning

This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and some nodes have asymptotically non-negligible contributions, therefore representing important hidden features. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.
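The contrast between the GP and MoGP regimes can already be seen numerically in a single hidden layer. The sketch below is a toy simulation under assumptions made here, not the authors' code: one tanh hidden layer evaluated at a fixed scalar input, standard Gaussian weights, and per-node variances λ_j that are either all 1/n (iid weights, the GP regime) or n^(-1/α) times Pareto(α) draws with α in (0, 1). This hypothetical choice makes the per-layer sum of the λ_j converge to a positive α-stable variable. Conditional on the λ_j, the output is Gaussian, so its marginal is a scale mixture of Gaussians; sample kurtosis (3 for a Gaussian) serves as a crude heavy-tail indicator.

```python
import numpy as np


def sample_outputs(n_hidden, n_draws, alpha=None, seed=0):
    """Scalar outputs of n_draws random one-hidden-layer tanh networks at input x = 1.

    alpha=None: every per-node variance is 1/n (iid Gaussian weights, GP regime).
    alpha in (0, 1): per-node variances are n**(-1/alpha) * Pareto(alpha) draws
    (hypothetical choice), so their sum converges to a positive alpha-stable law.
    """
    rng = np.random.default_rng(seed)
    x = 1.0  # fixed scalar input
    hidden = np.tanh(rng.standard_normal((n_draws, n_hidden)) * x)
    if alpha is None:
        lam = np.full((n_draws, n_hidden), 1.0 / n_hidden)
    else:
        # numpy's pareto() is Lomax; adding 1 gives a classical Pareto with scale 1.
        pareto = 1.0 + rng.pareto(alpha, size=(n_draws, n_hidden))
        lam = n_hidden ** (-1.0 / alpha) * pareto
    v_out = rng.standard_normal((n_draws, n_hidden))  # outgoing weights before scaling
    return np.sum(np.sqrt(lam) * v_out * hidden, axis=1)


def kurtosis(z):
    """Plain (non-excess) sample kurtosis: about 3 for Gaussian data."""
    z = z - z.mean()
    return float(np.mean(z**4) / np.mean(z**2) ** 2)


if __name__ == "__main__":
    gp = sample_outputs(n_hidden=500, n_draws=2000)
    mogp = sample_outputs(n_hidden=500, n_draws=2000, alpha=0.5, seed=1)
    print(f"GP-regime kurtosis:   {kurtosis(gp):.2f}")
    print(f"MoGP-regime kurtosis: {kurtosis(mogp):.2f}")
```

With these settings the GP-regime kurtosis should sit near 3, while the heavy-tailed per-node variances should push the MoGP-regime value far above it, in line with the heavy tails described in the abstract.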


Communist party accessed TikTok data of Hong Kong protesters, former executive alleges

The Guardian

A former executive at TikTok's parent company, ByteDance, has alleged that the Chinese Communist party accessed user data from the social video app belonging to Hong Kong protesters and civil rights activists. Yintao Yu, a former head of engineering at ByteDance's US operation, claimed in a legal filing that a committee of Communist party members accessed TikTok data that included the users' network information, Sim card identifications and IP addresses in a bid to identify the individuals and their locations. The claims, in a wrongful dismissal lawsuit brought by Yu in a California court and reported by the Wall Street Journal, also allege the party accessed TikTok users' communications, monitored Hong Kong users who uploaded protest-related content and that Beijing-based ByteDance maintained a "backdoor channel" for the party to access US user data. Yu alleges in the filing that members of a Communist party committee inside ByteDance had access to a "superuser" credential which was also called a "God credential" and allowed them to view all data collected by ByteDance. The filing adds that when Yu was at ByteDance, between August 2017 and November 2018, TikTok stored all users' direct messages, search histories and content viewed by users.


Does AI Have a Subconscious?

WIRED

"There's been a lot of speculation recently about the possibility of AI consciousness or self-awareness. But I wonder: Does AI have a subconscious?" Dear Psychobabble, Sometime in the early 2000s, I came across an essay in which the author argued that no artificial consciousness will ever be believably human unless it can dream. I cannot remember who wrote it or where it was published, though I vividly recall where I was when I read it (the periodicals section of Barbara's Bookstore, Halsted Street, Chicago) and the general feel of that day (twilight, early spring).