 alto



Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Neural Information Processing Systems

Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs. To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. As a result, we demonstrate that our method, AltO, can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators. The source code can be found at: https://github.com/songsang7/AltO



Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Song, Sanghyeob, Lew, Jaihyun, Jang, Hyemi, Yoon, Sungroh

arXiv.org Artificial Intelligence

Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs. To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. As a result, we demonstrate that our method, AltO, can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators. The source code can be found at: https://github.com/songsang7/AltO
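
For readers unfamiliar with the Barlow Twins loss named in the abstract, here is a minimal NumPy sketch of the objective as it is commonly formulated: the cross-correlation matrix between two batches of embeddings is pushed toward the identity. This is not the authors' implementation and does not cover the Geometry Barlow Twins extension, which the abstract does not specify. The function name `barlow_twins_loss`, the off-diagonal weight `lambda_offdiag`, and the `eps` stabilizer are assumptions chosen for illustration.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-8):
    """Illustrative Barlow Twins-style loss for two (batch, dim) embedding arrays.

    lambda_offdiag and eps are assumed values, not taken from the AltO paper.
    """
    n, _ = z_a.shape
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + lambda_offdiag * off_diag)
```

Under this formulation, two identical batches of embeddings should score much lower than two unrelated batches, because the diagonal of the cross-correlation matrix is then close to 1.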


The Long Road to Genuine AI Mastery

TIME - Tech

In the early 1970s, programming computers involved punching holes in cards and feeding them to room-size machines that would produce results through a line printer, often hours or even days later. This is what computing had looked like for a long time, and it was against this backdrop that a team of 29 scientists and researchers at the famed Xerox PARC created the more intimate form of computing we know today: one with a display, a keyboard, and a mouse. This computer, called Alto, was so bewilderingly different that it necessitated a new term: interactive computing. Alto was viewed by some as absurdly extravagant because of its expensive components. But fast-forward 50 years, and multitrillion-dollar supply chains have sprung up to transform silica-rich sands into sophisticated, wondrous computers that live in our pockets.


ALTO: An Efficient Network Orchestrator for Compound AI Systems

Santhanam, Keshav, Raghavan, Deepti, Rahman, Muhammad Shahir, Venkatesh, Thejas, Kunjal, Neha, Thaker, Pratiksha, Levis, Philip, Zaharia, Matei

arXiv.org Artificial Intelligence

We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO achieves high throughput and low latency by taking advantage of an optimization opportunity specific to generative language models: streaming intermediate outputs. As language models produce outputs token by token, ALTO exposes opportunities to stream intermediate outputs between stages when possible. We highlight two new challenges, correctness and load balancing, that emerge when streaming intermediate data across distributed pipeline stage instances. We also motivate the need for an aggregation-aware routing interface and distributed prompt-aware scheduling to address these challenges. We demonstrate the impact of ALTO's partial output streaming on a complex chatbot verification pipeline, increasing throughput by up to 3x for a fixed latency target of 4 seconds per request while also reducing tail latency by 1.8x compared to a baseline serving approach.
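
The idea of streaming intermediate outputs between pipeline stages can be illustrated with plain Python generators. This is only a single-process sketch: it does not use ALTO's interface, and it ignores the routing, scheduling, and distribution concerns the abstract raises. The names `generate_tokens`, `verify_stream`, and `events` are hypothetical, and the "verification" is a placeholder check.

```python
from typing import Iterable, Iterator, List, Tuple

# Shared log used only to show the order in which the stages run.
events: List[str] = []


def generate_tokens(prompt: str) -> Iterator[str]:
    """Stand-in for a language model that emits one token at a time."""
    for token in prompt.split():
        events.append(f"emit:{token}")
        yield token


def verify_stream(tokens: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Downstream stage that processes each token as soon as it arrives."""
    for token in tokens:
        events.append(f"check:{token}")
        # Placeholder check: treat purely alphabetic tokens as valid.
        yield token, token.isalpha()


results = list(verify_stream(generate_tokens("alpha beta 42")))
```

Because the second stage pulls tokens lazily, `events` records emit and check steps interleaved rather than all emits followed by all checks. That is the single-process analogue of starting downstream work before the upstream stage has finished.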


Steve Jobs' $4 check written in 1976 draws bid of over $33,000 at auction

The Guardian

A four-dollar check that Apple co-founder Steve Jobs wrote to Radio Shack in 1976 was up for auction on Wednesday at Boston-based RR Auction with a bid of more than $33,000 with five hours left to go. The signed check, drawn against an "Apple Computer Company" account at a Wells Fargo Bank branch in Los Altos, California, joins a hot market for Jobs' signature and memorabilia. Last year, a $9.18 Apple Computer cheque signed by Jobs in 1976 sold for $55,000; another from the same year, for $13.86 to Elmar Electronics, sold in March for $37,564. The Apple inventor's signature on a job application for employment as an "electronics tech or design engineer" from 1973, classified as Jobs' earliest known signature by the auctioneer, sold in 2018 for $174,757. A signature from three years later, when Jobs was 21, that appeared on an original Apple founding contract signed by Jobs, Steve Wozniak and Ronald Wayne was sold by Sotheby's in December 2011 for $1,594,500.


One man and his dog: Summerhill turns shepherding into a video game puzzle

The Guardian

In the soft, rolling hills of the Derbyshire Dales the grass is clipped to just a few centimetres by gently bleating sheep. For game artist and designer Harry Nesbitt, who grew up here, this countryside is in his blood. "There's something there deep in my subconscious," he says. "I always want to tell stories or depict worlds that are close to my heart." Nesbitt's fondness for this terrain is visible the very first time you look at Summerhill, his forthcoming puzzle-adventure game that tasks the player with herding sheep through a bucolic landscape.


ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization

Cisneros, Ivan, Yin, Peng, Zhang, Ji, Choset, Howie, Scherer, Sebastian

arXiv.org Artificial Intelligence

We present the ALTO dataset, a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles. The dataset is composed of two long (approximately 150 km and 260 km) trajectories flown by a helicopter over Ohio and Pennsylvania, and it includes high-precision GPS-INS ground truth location data, high-precision accelerometer readings, laser altimeter readings, and RGB downward-facing camera imagery. In addition, we provide reference imagery over the flight paths, which makes this dataset suitable for VPR benchmarking and other tasks common in Localization, such as image registration and visual odometry. To the authors' knowledge, this is the largest real-world aerial-vehicle dataset of this kind. Our dataset is available at https://github.com/MetaSLAM/ALTO.