

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (Tony Lee)

Neural Information Processing Systems

We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images. Our benchmark 1) captures real-world use cases, 2) is fully automatic and does not require human judgment, and 3) is based on a renewable stream of fresh data. In Image2Struct, VLMs are prompted to generate the underlying structure (e.g., LaTeX code or HTML) from an input image (e.g., webpage screenshot). The structure is then rendered to produce an output image (e.g., rendered webpage), which is compared against the input image to produce a similarity score. This round-trip evaluation allows us to quantitatively evaluate VLMs on tasks with multiple valid structures. We create a pipeline that downloads fresh data from active online communities upon execution and evaluates the VLMs without human intervention. We introduce three domains (Webpages, LaTeX, and Musical Scores) and use five image metrics (pixel similarity, cosine similarity between the Inception vectors, learned perceptual image patch similarity, structural similarity index measure, and earth mover similarity) that allow efficient and automatic comparison between pairs of images. We evaluate 14 prominent VLMs on Image2Struct and find that scores vary widely, indicating that Image2Struct can differentiate between the performances of different VLMs. Additionally, the best score varies considerably across domains (e.g., 0.402 on sheet music vs. 0.830 on LaTeX equations), indicating that Image2Struct contains tasks of varying difficulty.
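
To make the round-trip protocol concrete, here is a minimal Python sketch of the idea: ask a VLM for the underlying structure, render it, and score the rendered image against the input. The query_vlm and render_structure callables are hypothetical stand-ins, and the pixel-similarity metric below is a simple grayscale per-pixel comparison for illustration, not necessarily the exact metric used by the benchmark.

```
# Minimal sketch of the round-trip evaluation idea: prompt a VLM for structure,
# render it, and score the rendered image against the input image.
# `query_vlm` and `render_structure` are hypothetical stand-ins, and this
# pixel-similarity metric is a simple illustration, not the benchmark's exact one.
import numpy as np
from PIL import Image

def pixel_similarity(img_a: Image.Image, img_b: Image.Image) -> float:
    """Mean per-pixel agreement in [0, 1] after resizing to a common shape."""
    size = (256, 256)
    a = np.asarray(img_a.convert("L").resize(size), dtype=np.float32) / 255.0
    b = np.asarray(img_b.convert("L").resize(size), dtype=np.float32) / 255.0
    return float(1.0 - np.abs(a - b).mean())

def round_trip_score(input_image: Image.Image, query_vlm, render_structure) -> float:
    """Ask the model for structure (e.g., LaTeX or HTML), render it, and compare."""
    structure = query_vlm(input_image)          # e.g., LaTeX source for an equation
    output_image = render_structure(structure)  # e.g., compiled/rendered screenshot
    return pixel_similarity(input_image, output_image)
```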


Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution

Neural Information Processing Systems

Influence maximization in social networks has typically been studied in the context of contagion models and irreversible processes. In this paper, we consider an alternate model that treats individual opinions as spins in an Ising system at dynamic equilibrium. We formalize the Ising influence maximization problem, which has a natural physical interpretation as maximizing the magnetization given a budget of external magnetic field. Under the mean-field (MF) approximation, we present a gradient ascent algorithm that uses the susceptibility to efficiently calculate local maxima of the magnetization, and we develop a number of sufficient conditions for when the MF magnetization is concave and our algorithm converges to a global optimum. We apply our algorithm on random and real-world networks, demonstrating, remarkably, that the MF optimal external fields (i.e., the external fields which maximize the MF magnetization) exhibit a phase transition from focusing on high-degree individuals at high temperatures to focusing on low-degree individuals at low temperatures.
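
As a rough illustration of the approach described above, the sketch below runs projected gradient ascent on the total mean-field magnetization, using the susceptibility matrix as the gradient of the magnetization with respect to the external field. The fixed-point solver, the simplex projection used for the budget constraint, and the step sizes are illustrative choices and may differ from the paper's exact procedure.

```
# Minimal sketch of mean-field (MF) Ising influence maximization:
# gradient ascent on the MF magnetization M(h) = sum_i m_i(h), where the
# gradient is given by the susceptibility matrix chi = dm/dh.
import numpy as np

def mf_magnetization(J, h, beta, iters=200):
    """Solve the MF fixed point m_i = tanh(beta * (sum_j J_ij m_j + h_i))."""
    m = np.zeros(len(h))
    for _ in range(iters):
        m = np.tanh(beta * (J @ m + h))
    return m

def susceptibility(J, m, beta):
    """chi = (I - beta * D @ J)^{-1} @ (beta * D), with D = diag(1 - m_i^2)."""
    D = np.diag(1.0 - m ** 2)
    n = len(m)
    return np.linalg.solve(np.eye(n) - beta * D @ J, beta * D)

def project_to_budget(h, budget):
    """Euclidean projection onto {h >= 0, sum(h) = budget} (standard simplex projection)."""
    u = np.sort(h)[::-1]
    css = np.cumsum(u) - budget
    rho = np.nonzero(u - css / (np.arange(len(h)) + 1) > 0)[0][-1]
    return np.maximum(h - css[rho] / (rho + 1.0), 0.0)

def mf_optimal_field(J, beta, budget, steps=100, lr=0.5):
    """Projected gradient ascent on the total MF magnetization under a field budget."""
    n = J.shape[0]
    h = np.full(n, budget / n)
    for _ in range(steps):
        m = mf_magnetization(J, h, beta)
        grad = susceptibility(J, m, beta).T @ np.ones(n)  # dM/dh
        h = project_to_budget(h + lr * grad, budget)
    return h
```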


Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization

Neural Information Processing Systems

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point for constant minibatches. Furthermore, using a variant of these algorithms, we obtain provably faster convergence than batch proximal gradient descent.
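
For reference, the basic update under discussion is the proximal stochastic gradient step; the sketch below shows it for an L1 regularizer, whose proximal operator is soft-thresholding. This is the plain prox-SGD update, not the paper's faster variance-reduced variants, and grad_fi is a user-supplied component-gradient function.

```
# Minimal sketch of a proximal stochastic gradient step for
# F(x) = (1/n) * sum_i f_i(x) + lam * ||x||_1, where the smooth part may be
# nonconvex and the nonsmooth part is convex.
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sgd_step(x, grad_fi, batch, step_size, lam):
    """x+ = prox_{step_size * lam * ||.||_1}(x - step_size * minibatch gradient).

    grad_fi(x, i) should return the gradient of f_i at x (user-supplied)."""
    g = np.mean([grad_fi(x, i) for i in batch], axis=0)
    return soft_threshold(x - step_size * g, step_size * lam)
```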


Anthropic's free Claude 4 Sonnet aced my coding tests - but its paid Opus model somehow didn't

ZDNet

What sucked a year ago is at the top of the heap this year. I saw that over the past few months, as both Google's Gemini and Microsoft's Copilot went from the bottom rung on the AI coding ladder to the winner's circle, passing all of my coding tests. Today, another language model is making the trek up the ladder. What makes this interesting is that the underdog player is moving into the winner's circle, while the odds-on favorite only climbed up a rung or two before getting stuck. Like most LLM offerings, Anthropic offers its Claude chatbot in both free and paid versions.


Cyclades: Conflict-free Asynchronous Machine Learning

Neural Information Processing Systems

We present Cyclades, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. Cyclades is asynchronous during model updates, and requires no memory locking mechanisms, similar to Hogwild!-type algorithms. Unlike Hogwild!, Cyclades introduces no conflicts during parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent cache locality and conflict-free nature, our multi-core implementation of Cyclades consistently outperforms Hogwild!-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to Hogwild!, and up to 5× gains over asynchronous implementations of variance reduction algorithms.
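
The core idea can be sketched as follows: within a sampled batch of sparse updates, group together updates that touch a shared model coordinate (connected components of the conflict graph); each group can then be handed to a core and processed without locks. The sketch below shows only this grouping step with a simple union-find; Cyclades' actual batch sizing, scheduling, and load balancing are omitted.

```
# Minimal sketch of the conflict-free grouping idea behind Cyclades:
# two updates conflict if they touch a shared model coordinate, and updates
# in different connected components of the conflict graph can run on
# different cores without locks.

def conflict_free_groups(updates):
    """updates: list of sets of coordinate indices touched by each update.
    Returns groups (lists of update indices) that share no coordinates across groups."""
    parent = list(range(len(updates)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    # Union updates that share any coordinate, via the first update seen per coordinate.
    owner = {}
    for u, coords in enumerate(updates):
        for c in coords:
            if c in owner:
                union(u, owner[c])
            else:
                owner[c] = u

    groups = {}
    for u in range(len(updates)):
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())

# Example: updates 0 and 1 share coordinate 3, update 2 is independent.
print(conflict_free_groups([{0, 3}, {3, 5}, {7}]))  # [[0, 1], [2]]
```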


e6be4c22a5963ab00dfe8f3b695b5332-AuthorFeedback.pdf

Neural Information Processing Systems

We appreciate the praise for the "extremely good and unique empirical ..." evaluation. Lasso or Markov Blanket (MB) requires causal sufficiency, let alone the curse of dimensionality. In sparse large graphs, FS gives more FPs; BE performs worse in small sparse graphs. Overall, our method manages to keep FPs very low (2.1%) for all. BE and FS computations took significantly long; indeed, an empirical example of this was given in Section A2, Figure 1 of the supplementary material.


These colourful origami figures are actually robots

Mashable

They look like folded art, but act like machines. These metamaterials combine origami design with advanced geometry to create structures that change shape, respond to their environment, and could reshape construction and robotics.


cf66f995883298c4db2f0dcba28fb211-Paper-Conference.pdf

Neural Information Processing Systems

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformers have dramatically advanced the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift the focus from evaluating the overall Transformer architecture to specifically examining the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error while using fewer parameters than existing models. The implementation of our model is available at: https://github.com/dongbeank/CATS.
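
A minimal PyTorch sketch of the cross-attention-only idea follows: learnable, horizon-dependent query vectors attend over embedded past patches, with no self-attention among past tokens. The patch sizes, embedding dimension, and absence of normalization or parameter sharing here are illustrative assumptions; the actual CATS implementation is in the linked repository.

```
# Minimal sketch: horizon-dependent learnable queries cross-attend over
# embedded past patches; there is no self-attention among the past tokens.
import torch
import torch.nn as nn

class CrossAttentionForecaster(nn.Module):
    def __init__(self, lookback, horizon, patch_len=16, d_model=64, n_heads=4):
        super().__init__()
        assert lookback % patch_len == 0 and horizon % patch_len == 0
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)                  # past patches -> keys/values
        self.queries = nn.Parameter(torch.randn(horizon // patch_len, d_model))  # horizon-dependent queries
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, patch_len)                   # each query predicts one future patch

    def forward(self, x):
        # x: (batch, lookback) univariate series
        b = x.size(0)
        past = self.embed(x.view(b, -1, self.patch_len))            # (b, n_past_patches, d_model)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)             # (b, n_future_patches, d_model)
        out, _ = self.cross_attn(q, past, past)                     # cross-attention only
        return self.head(out).reshape(b, -1)                        # (b, horizon)

# Example: forecast 32 future steps from 96 past steps.
model = CrossAttentionForecaster(lookback=96, horizon=32)
y_hat = model(torch.randn(8, 96))
print(y_hat.shape)  # torch.Size([8, 32])
```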


Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback (Shinji Ito)

Neural Information Processing Systems

Although existing algorithms achieve an optimal regret bound of Õ(√T) for T rounds (ignoring factors of poly(d, log T)), computationally efficient ways of implementing them have not yet been specified, in particular when |A| is not bounded by a polynomial size in d. A standard way to pursue computational efficiency is to assume that we have an efficient algorithm, referred to as an oracle, that solves (offline) linear optimization problems over A. Under this assumption, the computational efficiency of a bandit algorithm can then be measured in terms of oracle complexity, i.e., the number of oracle calls. Our contribution is to propose algorithms that offer optimal regret bounds of Õ(√T) as well as low oracle complexity for both non-stochastic and stochastic settings. Our algorithm for non-stochastic settings has an oracle complexity of Õ(T) and is the first algorithm that achieves both a regret bound of Õ(√T) and an oracle complexity of Õ(poly(T)), given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only O(poly(d, log T)) times, which is smaller than the current best oracle complexity of O(T) if T is sufficiently large. This work was supported by JST ERATO (Grant Number JPMJER1201), JST ACT-I (Grant Number JPMJPR18U5), JST PRESTO (Grant Number JPMJPR1759), and JSPS KAKENHI (Grant Number JP18H05291), Japan.
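
To make the oracle-complexity notion concrete, the sketch below wraps a finite action set behind a linear optimization oracle that counts its calls, and runs Follow-the-Perturbed-Leader with full-information losses as a simple illustration of paying one oracle call per round. The paper's algorithms handle bandit feedback and are considerably more involved; this only shows the oracle abstraction and call counting.

```
# Minimal sketch of the "oracle complexity" notion: access the action set A
# only through an offline linear optimization oracle, and count oracle calls.
import numpy as np

class LinearOptimizationOracle:
    """Offline oracle: returns argmin_{a in A} <a, c>, counting how often it is called."""
    def __init__(self, actions):
        self.actions = np.asarray(actions, dtype=float)  # finite action set A, shape (|A|, d)
        self.calls = 0

    def __call__(self, c):
        self.calls += 1
        return self.actions[np.argmin(self.actions @ c)]

def follow_the_perturbed_leader(oracle, loss_vectors, eta=1.0, seed=0):
    """Full-information FTPL for illustration: exactly one oracle call per round."""
    rng = np.random.default_rng(seed)
    d = loss_vectors.shape[1]
    cumulative = np.zeros(d)
    total_loss = 0.0
    for loss in loss_vectors:
        perturbation = rng.laplace(scale=eta, size=d)
        action = oracle(cumulative + perturbation)       # single oracle call
        total_loss += float(action @ loss)
        cumulative += loss
    return total_loss

actions = np.eye(4)                                      # toy action set in R^4
losses = np.abs(np.random.default_rng(1).normal(size=(100, 4)))
oracle = LinearOptimizationOracle(actions)
print(follow_the_perturbed_leader(oracle, losses), oracle.calls)  # calls == number of rounds
```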


Iterative Refinement of the Approximate Posterior for Directed Belief Networks

Neural Information Processing Systems

Variational methods that rely on a recognition network to approximate the posterior of directed graphical models offer better inference and learning than previous methods. Recent advances that exploit the capacity and flexibility in this approach have expanded what kinds of models can be trained. However, as a proposal for the posterior, the capacity of the recognition network is limited, which can constrain the representational power of the generative model and increase the variance of Monte Carlo estimates. To address these issues, we introduce an iterative refinement procedure for improving the approximate posterior of the recognition network and show that training with the refined posterior is competitive with state-of-the-art methods. The advantages of refinement are further evident in an increased effective sample size, which implies a lower variance of gradient estimates.
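
A simplified sketch of the refinement idea: start from the recognition network's proposal for a datapoint and take a few gradient steps on the ELBO with respect to the local variational parameters before using them. For brevity this uses a Gaussian latent with the reparameterization trick; the paper's refinement procedure for directed belief networks with discrete units differs in its details.

```
# Simplified sketch of posterior refinement: initialize local variational
# parameters from the recognition network, then locally maximize the ELBO.
import torch

def refine_posterior(x, encoder, decoder, steps=5, lr=0.1, n_samples=4):
    """Refine (mu, logvar) proposed by the encoder by gradient ascent on the ELBO."""
    mu, logvar = encoder(x)
    mu = mu.detach().requires_grad_(True)
    logvar = logvar.detach().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn(n_samples, *mu.shape)             # reparameterized samples
        recon = decoder(z)                                           # Bernoulli logits for x
        log_px = -torch.nn.functional.binary_cross_entropy_with_logits(
            recon, x.expand_as(recon), reduction="none").sum(-1).mean()
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum()       # KL(q || N(0, I))
        loss = -(log_px - kl)                                        # negative ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), logvar.detach()
```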