

Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages

arXiv.org Machine Learning

Understanding the distance between human languages is central to linguistics, anthropology, and tracing human evolutionary history. Yet, while linguistics has long provided rich qualitative accounts of cross-linguistic variation, a unified and scalable quantitative approach to measuring language distance remains lacking. In this paper, we introduce a method that leverages pretrained multilingual language models as systematic instruments for linguistic measurement. Specifically, we show that the spontaneously emerged attention mechanisms of these models provide a robust, tokenization-agnostic measure of cross-linguistic distance, termed Attention Transport Distance (ATD). By treating attention matrices as probability distributions and measuring their geometric divergence via optimal transport, we quantify the representational distance between languages during translation. Applying ATD to a large and diverse set of languages, we demonstrate that the resulting distances recover established linguistic groupings with high fidelity and reveal patterns aligned with geographic and contact-induced relationships. Furthermore, incorporating ATD as a regularizer improves transfer performance in low-resource machine translation. Our results establish a principled foundation for testing linguistic hypotheses using artificial neural networks. This framework transforms multilingual models into powerful tools for quantitative linguistic discovery, facilitating more equitable multilingual AI.
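The abstract describes ATD only at a high level: attention matrices treated as probability distributions, compared with optimal transport. The sketch below illustrates that idea under several assumptions the abstract does not confirm. Each attention matrix is normalized into one joint distribution over its (query, key) cells. Cells are placed at normalized positions in the unit square, so matrices of different sizes (and therefore different tokenizations) share one space. The ground cost is squared Euclidean distance, and the OT problem is solved with entropic regularization via Sinkhorn scaling. The function names, `eps`, and `n_iter` are hypothetical, and head or layer aggregation is omitted.

```python
import numpy as np


def _cell_centers(n_rows, n_cols):
    """Place each (query, key) cell of an n_rows x n_cols matrix in the unit square."""
    r = (np.arange(n_rows) + 0.5) / n_rows
    c = (np.arange(n_cols) + 0.5) / n_cols
    rr, cc = np.meshgrid(r, c, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)


def attention_transport_distance(A, B, eps=0.01, n_iter=500):
    """Hypothetical ATD sketch: entropic OT cost between two attention matrices.

    A and B are non-negative attention matrices, possibly of different shapes.
    Each is normalized to a joint distribution over its cells; the ground cost
    is the squared Euclidean distance between normalized cell positions.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    a, X = A.ravel() / A.sum(), _cell_centers(*A.shape)
    b, Y = B.ravel() / B.sum(), _cell_centers(*B.shape)
    # Drop zero-mass cells so the Sinkhorn scalings never divide by zero.
    X, a = X[a > 0], a[a > 0]
    Y, b = Y[b > 0], b[b > 0]
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):  # Sinkhorn-Knopp: alternately match row and column marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return float((plan * C).sum())
```

For example, a monotone (diagonal) 4 x 4 alignment compared with itself gives a distance near 0. Compared with a fully reversed (anti-diagonal) alignment of the same size, it gives 0.3125. Positional normalization is what lets a 3 x 3 matrix be compared directly with a 4 x 4 one.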


Self-Regularized Learning Methods

arXiv.org Machine Learning

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms, including minimax-optimal rates, where it suffices to show that the algorithm is self-regularized: all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minimax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.
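To make "data-driven early stopping for RKHS-based gradient descent" concrete, here is a minimal sketch. It runs gradient descent on the empirical squared loss in the RKHS of a Gaussian kernel and picks the stopping iteration on a hold-out set. The hold-out rule is one common data-driven choice swapped in for illustration; the abstract does not say which selection rule the paper analyzes. The kernel, `bandwidth`, `step`, and the function names are assumptions.

```python
import numpy as np


def rbf_kernel(X, Z, bandwidth):
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * bandwidth**2))


def kernel_gd_early_stopping(X_tr, y_tr, X_val, y_val, bandwidth=0.3, step=1.0, max_iter=500):
    """Kernel gradient descent with hold-out early stopping (hypothetical helper).

    The iterate is f_t = sum_j alpha_j k(x_j, .). One functional gradient step on
    (1/2n) * sum_i (f(x_i) - y_i)^2 updates alpha <- alpha - (step / n) * (K alpha - y).
    Returns (best iteration, its validation MSE, its coefficients).
    """
    K = rbf_kernel(X_tr, X_tr, bandwidth)
    K_val = rbf_kernel(X_val, X_tr, bandwidth)
    n = len(y_tr)
    alpha = np.zeros(n)
    best_err, best_t, best_alpha = np.inf, 0, alpha.copy()
    for t in range(1, max_iter + 1):
        alpha = alpha - step / n * (K @ alpha - y_tr)
        err = float(np.mean((K_val @ alpha - y_val) ** 2))
        if err < best_err:  # keep the iterate with the lowest hold-out error
            best_err, best_t, best_alpha = err, t, alpha.copy()
    return best_t, best_err, best_alpha
```

The iteration count plays the role of the regularization parameter here. Stopping early keeps the iterate close to the simple initial predictor, which is the kind of implicit complexity control the framework formalizes. With `step=1.0` the update is stable because the eigenvalues of `K / n` are at most 1 for a kernel bounded by 1.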


Murmurations, Mestre--Nagao sums, and Convolutional Neural Networks for elliptic curves

arXiv.org Machine Learning

We apply one-dimensional convolutional neural networks to the Frobenius traces of elliptic curves over $\mathbb{Q}$ and evaluate and interpret their predictive capacity. In keeping with similar experiments by Kazalicki--Vlah, Bujanović--Kazalicki--Novak, and Pozdnyakov, we observe high-accuracy predictions for the analytic rank across a range of conductors. We interpret the predictions using saliency curves and explore the interesting interplay between murmurations and Mestre--Nagao sums, the details of which vary with the conductor and the (predicted) rank.
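The CNN inputs here are the Frobenius traces $a_p = p + 1 - \#E(\mathbb{F}_p)$, and Mestre--Nagao sums are weighted sums of those traces. Several normalizations exist, and the abstract does not say which ones the authors use. The sketch assumes one common form, $S(B) = \frac{1}{\log B}\sum_{p \le B} a_p \log p / p$, which under standard conjectures tends to $\tfrac12 - r$ as $B \to \infty$, where $r$ is the analytic rank. It works with a short Weierstrass model $y^2 = x^3 + ax + b$ and counts points naively with Legendre symbols. As a simplification, it skips $p = 2$ and the primes dividing $4a^3 + 27b^2$. All function names are hypothetical.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes; returns all primes <= n (n >= 1)."""
    is_prime = [False, False] + [True] * (n - 1)
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def frobenius_trace(a, b, p):
    """a_p = p + 1 - #E(F_p) for y^2 = x^3 + a x + b at an odd prime p of good reduction.

    #E(F_p) = 1 + sum_x (1 + (f(x)/p)), so a_p = -sum_x (f(x)/p) (Legendre symbols).
    """
    total = 0
    for x in range(p):
        f = (x * x * x + a * x + b) % p
        if f:
            # Euler's criterion: f^((p-1)/2) mod p is 1 for residues, p - 1 otherwise.
            total += 1 if pow(f, (p - 1) // 2, p) == 1 else -1
    return -total


def mestre_nagao_sum(a, b, bound):
    """S(B) = (1/log B) * sum over odd good primes p <= B of a_p * log(p) / p."""
    disc = 4 * a**3 + 27 * b**2
    total = 0.0
    for p in primes_up_to(bound):
        if p == 2 or disc % p == 0:
            continue  # skip p = 2 and primes of bad reduction for this model
        total += frobenius_trace(a, b, p) * math.log(p) / p
    return total / math.log(bound)
```

For $y^2 = x^3 - x$ (conductor 32), this gives $a_3 = a_7 = 0$, $a_5 = -2$ and $a_{13} = 6$. The vector $(a_p)_{p \le B}$ is the kind of sequence a 1D CNN would consume. Convergence of $S(B)$ is slow, so finite-$B$ values only hint at the rank.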


rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks

arXiv.org Machine Learning

Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination.
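The abstract does not spell out rSDNet's objective, but the down-weighting mechanism can be illustrated with one member of the $S$-divergence family. At $\lambda = 0$, $S_{(\alpha,\lambda)}$ reduces to the density power divergence (DPD). For a one-hot target and model class probabilities $p_k$, dropping the term that does not depend on the model, the per-sample DPD loss is $\sum_k p_k^{1+\alpha} - (1 + 1/\alpha)\, p_y^{\alpha}$. This is a sketch of that special case only, not the paper's general $(\alpha, \lambda)$ objective; `dpd_loss` and the default `alpha` are hypothetical.

```python
import numpy as np


def dpd_loss(logits, labels, alpha=0.5):
    """Mean density-power-divergence loss for classification (S-divergence with lambda = 0).

    logits: array of shape (n, K); labels: integer array of shape (n,).
    """
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p_y = p[np.arange(len(labels)), labels]
    per_sample = (p ** (1 + alpha)).sum(axis=1) - (1 + 1 / alpha) * p_y**alpha
    return float(per_sample.mean())
```

Differentiating the second term gives $-(1+\alpha)\,p_y^{\alpha}\,\nabla \log p_y$. This is the cross-entropy gradient reweighted by $p_y^{\alpha}$, so a sample the model assigns low probability to, such as a likely mislabeled one, contributes little. Each per-sample loss is also bounded above by 1, whereas cross-entropy is unbounded.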


Mirror Descent on Riemannian Manifolds

arXiv.org Machine Learning

Mirror Descent (MD) is a scalable first-order method widely used in large-scale optimization, with applications in image processing, policy optimization, and neural network training. This paper generalizes MD to optimization on Riemannian manifolds. In particular, we develop a Riemannian Mirror Descent (RMD) framework via reparameterization and further propose a stochastic variant of RMD. We also establish non-asymptotic convergence guarantees for both RMD and stochastic RMD. As an application to the Stiefel manifold, our RMD framework reduces to the Curvilinear Gradient Descent (CGD) method proposed in [26]. Moreover, when specializing the stochastic RMD framework to the Stiefel setting, we obtain a stochastic extension of CGD, which effectively addresses large-scale manifold optimization problems.
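For orientation, here is the classical Euclidean case that RMD generalizes. This is not the paper's Riemannian algorithm. With the negative-entropy mirror map on the probability simplex, the mirror step $\nabla\phi(x_{t+1}) = \nabla\phi(x_t) - \eta\, g_t$ followed by the Bregman projection becomes a multiplicative (exponentiated-gradient) update. The function name and parameters are hypothetical.

```python
import numpy as np


def mirror_descent_simplex(grad, x0, step, n_iter):
    """Entropic mirror descent on the probability simplex (exponentiated gradient).

    With phi(x) = sum_i x_i log x_i, the mirror step is x_i <- x_i * exp(-step * g_i),
    and the Bregman (KL) projection back onto the simplex is a renormalization.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x * np.exp(-step * grad(x))
        x /= x.sum()
    return x
```

Iterates stay strictly positive and on the simplex without an explicit Euclidean projection. That geometry-aware update is what a mirror map buys, and it is the property a reparameterization-based Riemannian extension aims to carry over to manifolds.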


The Pentagon Wants an Obedient A.I. Soldier. Will It Get One?

The New Yorker

The reported use of Claude in recent military operations has shifted the Overton window around A.I. in warfare and sparked a battle between Anthropic and the Department of War. The staff writer Gideon Lewis-Kraus joins Tyler Foggatt to discuss the escalating standoff between the A.I. company Anthropic and the Department of War. They consider recent reporting on the use of Claude, Anthropic's family of large language models, in military operations in Venezuela and Iran, and how that news has pushed the company's relationship with the Pentagon to a breaking point. They also explore how the tech industry is responding to the conflict between the Trump Administration and Anthropic, and the thorny question of whether A.I. should be subject to greater safeguards and more oversight than previous technological innovations.


7 glittery minerals up for auction

Popular Science

Over 200 minerals from a private collection decades in the making are up for bid. An aquamarine with muscovite was found in Nagar District in Gilgit-Baltistan, Pakistan. Over 200 colorful minerals will hit the auction block on March 20 as part of Heritage's The Collection of William and Ruth Loomis Fine Minerals Signature Auction. What started as a shared hobby evolved into a lifelong passion that soon will be offered to mineral collectors everywhere.


Roman artifact discovered in the Americas shatters New World history as we know it

Daily Mail - Science & tech

The discovery of a Roman artifact in the Americas has sparked a debate about who truly discovered the New World. While Christopher Columbus is hailed as the first in 1492, archaeologists uncovered a small terracotta head of a bearded man carved with distinctive European features tucked inside a Mexican tomb. The artifact, known as the Tecaxic-Calixtlahuaca Head, was discovered in 1933 inside a sealed pre-Hispanic burial beneath multiple intact layers, indicating it had not been disturbed after its placement. Experts say its facial features, beard style and craftsmanship bear a striking resemblance to objects from the ancient Mediterranean rather than indigenous Mesoamerican traditions.


Volunteers spend 30 years restoring a Victorian sewer pump station

Popular Science

Reviving the Claymills Pumping Station in Staffordshire, England has been a labor of love. Restoration work has progressed steadily for over 30 years. It's always good to have a passion project, but what's going on in Staffordshire, England, is likely a one-of-a-kind endeavor. In the town of Burton upon Trent, a rotating team of volunteers has spent over 30 years restoring a Victorian pump house.


Sony removes 135,000 deepfakes of its artists' music

BBC News

Music giant Sony Music says it has requested the removal of more than 135,000 songs by fraudsters impersonating its artists on streaming services. The so-called deepfakes were created using generative AI and targeted some of the company's biggest acts, including Beyoncé, Queen and Harry Styles. "In the worst cases, [the deepfakes] potentially damage a release campaign or tarnish the reputation of an artist," said Dennis Kooker, president of Sony's global digital business. The company says the number of songs generated in this fashion is only increasing as artificial intelligence technology becomes cheaper and easier to access. It believes the 135,000 tracks it has discovered to date represent only a fraction of the total uploaded to streaming services. Since last March alone, it has identified some 60,000 songs falsely purporting to feature artists from its roster.