
Octic Vision Transformers: Quicker ViTs Through Equivariance

Nordström, David, Edstedt, Johan, Kahl, Fredrik, Bökman, Georg

arXiv.org Artificial Intelligence

Why are state-of-the-art Vision Transformers (ViTs) not designed to exploit natural geometric symmetries such as 90-degree rotations and reflections? In this paper, we argue that there is no fundamental reason, and that what has been missing is an efficient implementation. To this end, we introduce Octic Vision Transformers (octic ViTs), which rely on octic group equivariance to capture these symmetries. In contrast to prior equivariant models that increase computational cost, our octic linear layers achieve 5.33x reductions in FLOPs and up to 8x reductions in memory compared to ordinary linear layers. In full octic ViT blocks, the computational reductions approach those of the linear layers as the embedding dimension increases. We study two new families of ViTs, built from octic blocks, that are either fully octic equivariant or break equivariance in the last part of the network. Training octic ViTs with supervision (DeiT-III) and without supervision (DINOv2) on ImageNet-1K, we find that they match baseline accuracy while also providing substantial efficiency gains.
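The abstract does not say where the 5.33x and 8x figures come from. One way to reproduce them, sketched below under our own assumptions, uses the representation theory of the octic group (the dihedral group of order 8: the four 90-degree rotations, each optionally composed with a reflection). We assume the d channels carry d/8 copies of the group's regular representation and that the layer acts directly on the isotypic decomposition, ignoring the cost of changing basis and any biases. The function name is hypothetical and the paper's implementation may differ.

```python
from fractions import Fraction


def linear_layer_costs(d: int) -> dict:
    """Compare a dense d -> d layer with a hypothetical octic-equivariant one.

    Assumption (not taken from the paper): the d channels carry m = d/8 copies
    of the regular representation of the octic group, and the equivariant
    layer acts directly on the isotypic decomposition (basis-change cost and
    biases are ignored).
    """
    if d % 8 != 0:
        raise ValueError("d must be a multiple of 8")
    m = d // 8
    # Regular rep of the octic group = four 1-dim irreps (multiplicity 1)
    # plus one 2-dim irrep (multiplicity 2): 4*1 + 2*2 = 8 dims per copy.
    # With m copies: four 1-dim blocks of width m, and the 2-dim irrep
    # appearing 2m times (2m * 2 = 4m channels).
    one_dim_width = m
    two_dim_mult = 2 * m

    # Schur's lemma: an equivariant map is block diagonal over isotypic
    # components. Each 1-dim block is a dense (m x m) matrix; the 2-dim block
    # is M kron I_2 with M a real (2m x 2m) matrix applied to both coordinates.
    octic_mults = 4 * one_dim_width**2 + 2 * two_dim_mult**2
    octic_params = 4 * one_dim_width**2 + two_dim_mult**2

    dense = d * d  # multiplications and parameters of an ordinary layer
    return {
        "dense": dense,
        "octic_mults": octic_mults,
        "octic_params": octic_params,
        "flop_ratio": Fraction(dense, octic_mults),
        "param_ratio": Fraction(dense, octic_params),
    }


costs = linear_layer_costs(768)  # 768 is the ViT-B embedding width
print(costs["flop_ratio"], float(costs["flop_ratio"]), costs["param_ratio"])
# prints: 16/3 5.333333333333333 8
```

Under these assumptions the 8x figure is a parameter-count ratio; the abstract's "memory" may also cover other quantities, so read this as a plausibility check rather than the paper's own accounting.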


Box, run, crash: China's humanoid robot games show advances and limitations

The Guardian

A quick left hook, a front kick to the chest, a few criss-cross jabs, and the crowd cheers. But it is not kickboxing prowess that concludes the match. It is an attempted roundhouse kick that squarely misses its target, sending the kickboxer from a top university team tumbling to the floor. While traditional kickboxing comes with the risk of blood, sweat and serious head injuries, the competitors in Friday's match at the inaugural World Humanoid Robot Games in Beijing faced a different set of challenges. The kickboxers, pint-sized humanoid robots entered by teams from leading Chinese technological universities, are part of a jamboree of humanoid events taking place at China's latest technology event.


A Gibbs Sampler for Efficient Bayesian Inference in Sign-Identified SVARs

Arias, Jonas E., Rubio-Ramírez, Juan F., Shin, Minchul

arXiv.org Machine Learning

We develop a new algorithm for inference based on structural vector autoregressions (SVARs) identified with sign restrictions. The key insight of our algorithm is to break away from the accept-reject tradition associated with sign-identified SVARs. We show that embedding elliptical slice sampling within a Gibbs sampler can deliver dramatic gains in speed and turn previously infeasible applications into feasible ones. We provide a tractable example to illustrate the power of elliptical slice sampling applied to sign-identified SVARs. We demonstrate the usefulness of our algorithm by applying it to a well-known small SVAR model of the oil market featuring a tight identified set, as well as to a large SVAR model with more than 100 sign restrictions.
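To make the idea concrete, here is a minimal sketch of the standard elliptical slice sampling update (Murray, Adams and MacKay, 2010) targeting a standard Gaussian restricted by two toy sign restrictions. The restrictions are encoded as a log-likelihood that is 0 inside the restricted set and minus infinity outside. This is not the authors' algorithm: how the paper maps sign restrictions onto SVAR parameters and interleaves this step with its other Gibbs updates is not reproduced, and the restrictions, dimension, and function names here are hypothetical.

```python
import numpy as np


def elliptical_slice_step(f, log_lik, chol, rng):
    """One elliptical slice sampling update for a N(0, chol @ chol.T) prior.

    `f` must have finite log-likelihood (here: satisfy the sign restrictions).
    """
    nu = chol @ rng.standard_normal(f.shape[0])  # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())    # random slice level below current
    theta = rng.uniform(0.0, 2.0 * np.pi)         # initial angle on the ellipse
    lo, hi = theta - 2.0 * np.pi, theta           # bracket containing theta = 0
    while True:
        proposal = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the bracket toward theta = 0, where the proposal equals f.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)


def sign_log_lik(x):
    # Hypothetical toy restrictions: x[0] > 0 and x[1] < 0, x[2] unrestricted.
    return 0.0 if (x[0] > 0.0 and x[1] < 0.0) else -np.inf


rng = np.random.default_rng(0)
chol = np.eye(3)                # identity prior covariance for the toy
x = np.array([1.0, -1.0, 0.0])  # start inside the restricted region
draws = []
for _ in range(5000):
    x = elliptical_slice_step(x, sign_log_lik, chol, rng)
    draws.append(x)
draws = np.array(draws)
print(draws.mean(axis=0))
```

Unlike accept-reject sampling, every iteration returns a draw inside the restricted set, though it may take several likelihood evaluations while the angle bracket shrinks toward the current state. For this identity-covariance toy the target is a product of two half-normals and a standard normal, so the first two sample means should approach sqrt(2/pi) ≈ 0.80 and -0.80.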


Statistical Mean Estimation with Coded Relayed Observations

Ling, Yan Hao, Yang, Zhouhao, Scarlett, Jonathan

arXiv.org Artificial Intelligence

We consider a problem of statistical mean estimation in which the samples are not observed directly, but are instead observed by a relay ("teacher") that transmits information through a memoryless channel to the decoder ("student"), who then produces the final estimate. We consider the minimax estimation error in the large deviations regime, and establish achievable error exponents that are tight in broad regimes of the estimation accuracy and channel quality. In contrast, two natural baseline methods are shown to yield strictly suboptimal error exponents. We initially focus on Bernoulli sources and binary symmetric channels, and then generalize to sub-Gaussian and heavy-tailed settings along with arbitrary discrete memoryless channels.
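For intuition about the Bernoulli/binary-symmetric-channel case, the sketch below implements a naive, hypothetical scheme that the abstract does not describe: the teacher forwards each sample uncoded over a BSC with crossover probability p, and the student inverts the channel's bias. It is meant only as a point of reference, not as either of the paper's baselines or its coded scheme; the function name and parameters are illustrative.

```python
import numpy as np


def uncoded_relay_estimate(theta, n, p, rng):
    """Naive relay: forward each Bernoulli(theta) sample uncoded over a BSC(p).

    Hypothetical scheme for illustration only; requires p != 0.5.
    """
    if p == 0.5:
        raise ValueError("a BSC with p = 0.5 carries no information")
    x = rng.random(n) < theta     # teacher's samples, X_i ~ Bernoulli(theta)
    flips = rng.random(n) < p     # channel flips each bit with probability p
    y = np.logical_xor(x, flips)  # student's observations
    # P(Y_i = 1) = theta*(1 - p) + (1 - theta)*p = p + theta*(1 - 2p),
    # so invert this affine map to debias the empirical mean.
    return (y.mean() - p) / (1.0 - 2.0 * p)


est = uncoded_relay_estimate(theta=0.3, n=200_000, p=0.1, rng=np.random.default_rng(1))
print(est)
```

Because the estimate divides by 1 - 2p, channel noise is amplified as p approaches 0.5, and the scheme breaks down entirely at p = 0.5.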


State Bar of California admits it used AI to develop exam questions, triggering new furor

Los Angeles Times

Nearly two months after hundreds of prospective California lawyers complained that their bar exams were plagued with technical problems and irregularities, the state's legal licensing body has caused fresh outrage by admitting that some multiple-choice questions were developed with the aid of artificial intelligence. The State Bar of California said in a news release Monday that it will ask the California Supreme Court to adjust test scores for those who took its February bar exam. But it declined to acknowledge significant problems with its multiple-choice questions -- even as it revealed that a subset of questions were recycled from a first-year law student exam, while others were developed with the assistance of AI by ACS Ventures, the State Bar's independent psychometrician. "The debacle that was the February 2025 bar exam is worse than we imagined," said Mary Basick, assistant dean of academic skills at UC Irvine Law School. Having the questions drafted by non-lawyers using ...


UK engineering firm Arup falls victim to £20m deepfake scam

The Guardian

The British engineering company Arup has confirmed it was the victim of a deepfake fraud after an employee was duped into sending HK$200m (£20m) to criminals by an artificial intelligence-generated video call. Hong Kong police said in February that a worker at a then-unnamed company had been tricked into transferring vast sums by people on a hoax call "posing as senior officers of the company". Arup said in a statement that it was the company involved, confirming that at the beginning of the year it had "notified the police about an incident of fraud in Hong Kong". It confirmed that fake voices and images were used. It added: "Our financial stability and business operations were not affected and none of our internal systems were compromised."


Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

Gu, Jiatao, Gao, Qingzhe, Zhai, Shuangfei, Chen, Baoquan, Liu, Lingjie, Susskind, Josh

arXiv.org Artificial Intelligence

Diffusion models have recently become the de facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulty of acquiring 3D ground-truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis from single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (https://jiataogu.me/control3diff) for video comparisons.
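The abstract says Control3Diff models the 3D GAN's latent distribution with a standard diffusion objective. As a generic, hedged illustration of what such an objective looks like on a latent vector, the sketch below uses the DDPM forward process with a linear noise schedule (1,000 steps, beta from 1e-4 to 0.02, the values used by Ho et al., 2020) and the noise-prediction loss. The 512-dimensional latent, the schedule, and the function names are assumptions; the abstract does not specify Control3Diff's latent shape, schedule, or parameterization.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_latent(z0, t, alpha_bar, rng):
    """Forward process q(z_t | z_0) = N(sqrt(ab_t) z_0, (1 - ab_t) I)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps  # eps is the regression target of the denoiser


def denoising_loss(eps_pred, eps):
    # Standard noise-prediction objective; a conditional model would also
    # receive the control input (image, sketch, text embedding) with z_t and t.
    return float(np.mean((eps_pred - eps) ** 2))


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
z0 = rng.standard_normal(512)  # hypothetical 512-dim GAN latent
z_t, eps = noise_latent(z0, t=500, alpha_bar=alpha_bar, rng=rng)
# Given the true noise, z0 is recovered exactly by inverting the forward map.
z0_rec = (z_t - np.sqrt(1.0 - alpha_bar[500]) * eps) / np.sqrt(alpha_bar[500])
```

A denoising network would be trained to predict eps from (z_t, t, control input); at sampling time the reverse process produces a latent that the frozen 3D GAN then renders.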


'Jeopardy' fans furious over 'petty' ruling that ended contestant's 9-day winning streak

FOX News

"Jeopardy" fans are angry on behalf of nine-day champion Ben Chan after a spelling error ended his winning streak. On Tuesday night's episode, Chan reached Final Jeopardy after a rocky start, including a Daily Double loss that left him close to his opponents, Lynn Di Vito and Danny Lesserman. The category was "Shakespeare's Characters," and the clue was "Both of the names of these 2 lovers in a Shakespeare play come from Latin words for 'blessed.'"


Real-Time Radiance Fields for Single-Image Portrait View Synthesis

Trevithick, Alex, Chan, Matthew, Stengel, Michael, Chan, Eric R., Liu, Chao, Yu, Zhiding, Khamis, Sameh, Chandraker, Manmohan, Ramamoorthi, Ravi, Nagano, Koki

arXiv.org Artificial Intelligence

We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., a face portrait) in real time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher-quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm could in the future be applied to other categories for which a 3D-aware image generator is available.
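The encoder outputs a triplane, i.e. three axis-aligned 2D feature planes. A common way to query such a representation, sketched below, projects a 3D point onto the XY, XZ and YZ planes, bilinearly samples each plane, and sums the three feature vectors before a small decoder predicts density and color. Summation as the aggregation, the [-1, 1] coordinate convention, the plane sizes, and the function names are assumptions; the abstract does not specify them.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at coords u, v in [-1, 1]."""
    c, h, w = plane.shape
    # Map [-1, 1] to continuous pixel coordinates (align-corners convention).
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    dx, dy = x - x0, y - y0
    top = (1 - dx) * plane[:, y0, x0] + dx * plane[:, y0, x0 + 1]
    bottom = (1 - dx) * plane[:, y0 + 1, x0] + dx * plane[:, y0 + 1, x0 + 1]
    return (1 - dy) * top + dy * bottom


def triplane_features(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and sum the features.

    `planes` has shape (3, C, H, W); summing (rather than concatenating)
    the three samples is an assumption here.
    """
    x, y, z = point
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    return f_xy + f_xz + f_yz  # would be decoded by a small MLP into density/color


rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 32, 64, 64))  # hypothetical: C=32, 64x64 planes
feat = triplane_features(planes, (0.1, -0.4, 0.7))
print(feat.shape)  # (32,)
```

Because each query touches only three small planes rather than a dense 3D grid, lookups stay cheap, which is one reason triplanes suit real-time volume rendering.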