ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution Haoran Ye
The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a novel integration of evolutionary search for efficiently exploring the heuristic space, and LLM reflections to provide verbal gradients within the space.
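To make the interplay of evolutionary search and verbal gradients concrete, here is a minimal sketch of one ReEvo-style generation. The `llm` and `evaluate` callables are hypothetical stand-ins (any chat-LLM wrapper and any benchmark scorer); the actual ReEvo prompts and operators are richer than this.

```python
import random

def reevo_generation(population, llm, evaluate, pop_size=10):
    """One generation of Reflective Evolution (schematic).

    population: list of heuristics as Python source strings.
    llm(prompt) -> str and evaluate(code) -> float (higher is better)
    are caller-supplied stand-ins, not the paper's exact interfaces.
    """
    scored = sorted(((evaluate(h), h) for h in population), key=lambda t: t[0])
    offspring = []
    while len(offspring) < pop_size:
        pair = random.sample(scored, 2)
        (_, worse), (_, better) = sorted(pair, key=lambda t: t[0])
        # "Verbal gradient": the LLM reflects on why one parent beats the other.
        reflection = llm(
            "Both heuristics solve the same COP. Explain briefly why the "
            f"second outperforms the first.\n\n{worse}\n\n{better}"
        )
        # Offspring are generated with the reflection as guidance.
        offspring.append(llm(
            "Using this insight, write an improved heuristic:\n"
            f"{reflection}\n\nParents:\n{worse}\n\n{better}"
        ))
    return offspring
```

Iterating this step replaces the hand-tuned crossover and mutation operators of classical hyper-heuristics with LLM calls steered by the reflections.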
Appendices
A Additional Information on Prompts
A.1 Words and Word Frequencies
A.2 The Distribution of Prompt Types in the Benchmark
A.3 The Encoding Scheme for Task 2 Answers
To answer the first question, we split the data into two groups: the first group contains the subset of data for numeric-simple prompts, and the second group the subset for attribute-color prompts. We only consider prompts in both groups that contain the same numbers (1-4) and the same words ("cat", "apple", "koala", "bottle", "mushroom"), to isolate the effect of adding the color term from potential confounding factors. For example, a confounding factor might be word identity: a model might be more accurate in generating correct images when the prompt contains the word "dog", and if this word exists only in the first prompt type and not in the second, then responses in the first prompt type will on average have higher accuracy that may or may not be due to the prompt type itself.
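In code, this matched comparison might look as follows; a minimal pandas sketch assuming per-prompt results in a DataFrame with columns `prompt_type`, `number`, `word`, and `correct` (0/1). These column names are illustrative, not the paper's released data format.

```python
import pandas as pd

NUMBERS = [1, 2, 3, 4]
WORDS = ["cat", "apple", "koala", "bottle", "mushroom"]

def matched_accuracy(df: pd.DataFrame) -> pd.Series:
    """Mean accuracy per prompt type, restricted to shared numbers/words."""
    shared = df[df["number"].isin(NUMBERS) & df["word"].isin(WORDS)]
    groups = shared[shared["prompt_type"].isin(
        ["numeric-simple", "attribute-color"])]
    # Holding number and word fixed isolates the effect of the color term.
    return groups.groupby("prompt_type")["correct"].mean()
```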
Evaluating Numerical Reasoning in Text-to-Image Models
Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range of text-to-image models on numerical reasoning tasks of varying difficulty, and show that even the most advanced models have only rudimentary numerical skills. Specifically, their ability to correctly generate an exact number of objects in an image is limited to small numbers, is highly dependent on the context in which the number term appears, and deteriorates quickly with each successive number. We also demonstrate that models have a poor understanding of linguistic quantifiers (such as "a few" or "as many as") and the concept of zero, and struggle with more advanced concepts such as partial quantities and fractional representations.
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing Ke Ye
Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models remains a major challenge. A crucial requirement in this field is a comprehensive evaluation benchmark that accurately assesses editing results and provides valuable insights for the field's further development.
World's tallest 3D-printed building is unveiled in Switzerland: Futuristic tower stands at almost 100ft tall - so, would you be brave enough to scale it?
Among the charming centuries-old cottages, an elaborate white tower in Switzerland stands out like a sore thumb. To put its almost-100ft height into perspective, that's more than six times the height of a double-decker bus! Known as Tor Alva (the 'White Tower'), the gleaming white construction in the small village of Mulegns offers a new tourist attraction and cultural hub. Tor Alva is intended to emulate a layered cake – a tribute to the history of confectioners in the region – and also takes inspiration from filigree, an intricate metalwork technique used in making jewellery. Giovanni Netzer, founder of the Origen Cultural Foundation, which designed and built the tower with ETH Zurich, called it 'a technical triumph'. 'It inspires the building sector, encourages sustainable tourism and offers new cultural space,' Mr Netzer said.
Graphcode: Learning from multiparameter persistent homology using graph neural networks
We introduce graphcodes, a novel multi-scale summary of the topological properties of a dataset that is based on the well-established theory of persistent homology. Graphcodes handle datasets that are filtered along two real-valued scale parameters. Such multi-parameter topological summaries usually rest on complicated theoretical foundations and are difficult to compute; in contrast, graphcodes yield an informative and interpretable summary and can be computed as efficiently as one-parameter summaries. Moreover, a graphcode is simply an embedded graph and can therefore be readily integrated into machine learning pipelines using graph neural networks. We describe such a pipeline and demonstrate that graphcodes achieve better classification accuracy than state-of-the-art approaches on various datasets.
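Since a graphcode is just an embedded graph, it can be consumed by an off-the-shelf GNN. Below is a hypothetical minimal classifier in PyTorch Geometric, assuming each node carries its 2-D embedding coordinates as features; it illustrates the integration point, not the authors' exact pipeline.

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphcodeClassifier(torch.nn.Module):
    """Tiny GNN over a graphcode; node features are 2-D embedding coords."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(2, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        # Two rounds of message passing over the embedded graph ...
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        # ... then mean-pool node states into one vector per graphcode.
        return self.head(global_mean_pool(h, batch))
```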
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Andy Zhou, Bo Li, Haohan Wang
Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and Llama-2 to 0% on JailbreakBench, setting the state-of-the-art.
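The minimax structure of the objective can be sketched as a simple alternating loop. Here `attack`, `unsafe_loss`, and `refine` are hypothetical caller-supplied stand-ins for the discrete, token-level procedures used in the paper; this schematic shows only how the adversary is folded into the defensive objective.

```python
def robust_prompt_optimization(prompts, suffix, attack, unsafe_loss, refine,
                               steps=100, tol=1e-3):
    """Schematic minimax loop behind RPO (not the paper's token-level code).

    attack(prompt) -> adversarially modified prompt   (inner maximization)
    unsafe_loss(prompt) -> float, how jailbroken the model's reply is
    refine(suffix, prompts) -> improved suffix        (outer minimization)
    All three are hypothetical stand-ins supplied by the caller.
    """
    for _ in range(steps):
        # Inner maximization: the adversary adapts each prompt to the
        # current defense, yielding worst-case jailbreak attempts.
        worst_case = [attack(p + " " + suffix) for p in prompts]
        # Stop once no adapted prompt still elicits unsafe output.
        if max(unsafe_loss(p) for p in worst_case) < tol:
            break
        # Outer minimization: improve the suffix on the worst-case prompts.
        suffix = refine(suffix, worst_case)
    return suffix
```

Because the suffix is optimized against the adversary's best response rather than a fixed attack set, it remains effective under adaptive attacks and transfers across models.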
Supplementary Material: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
This is different from the full Monte Carlo sampling used by MC Dropout or deep ensembles, which requires multiple forward passes and is computationally expensive. As shown in the latency results in the experiments (Section 5.2), the extra variance-related computation adds only a small overhead to the inference time of a deterministic DNN. In the experiments, we use 10 samples to compute the mean predictive distribution. In applications where inference latency is a high priority (e.g., real-time pCTR prediction for online advertising), we can reduce the computational overhead further by replacing the Monte Carlo averaging with the mean-field approximation [15]. We leave this for future work.
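For reference, both options side by side: Monte Carlo averaging over logit samples (as used above, with 10 samples) versus a sampling-free mean-field approximation in the style of [15], which rescales the logits by their variance and needs only one pass. This is a generic NumPy sketch assuming a diagonal Gaussian over the logits, not the paper's exact implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predictive(mean, var, num_samples=10, seed=0):
    """Average softmax over samples from N(mean, var); needs S evaluations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples,) + mean.shape)
    return softmax(mean + np.sqrt(var) * eps).mean(axis=0)

def mean_field_predictive(mean, var):
    """Sampling-free approximation: scale logits by their variance."""
    return softmax(mean / np.sqrt(1.0 + (np.pi / 8.0) * var))
```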
Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness Zi Lin
Bayesian neural networks and deep ensembles are principled approaches to estimating the predictive uncertainty of a deep learning model. However, their practicality in real-time, industrial-scale applications is limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to properly quantify the distance of a testing example from the training data manifold, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration, and out-of-domain detection, and outperforms other single-model approaches.
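A compact sketch of the two SNGP ingredients on a toy MLP: spectral normalization of the hidden weights (so hidden-space distances track input distances) and a random-Fourier-feature approximation of the Gaussian-process output layer. Layer sizes and the RFF construction are illustrative, and the Laplace-approximated posterior covariance used for the predictive variance is omitted; see the paper for the full recipe.

```python
import math
import torch
from torch.nn.utils import spectral_norm

class ToySNGP(torch.nn.Module):
    def __init__(self, d_in: int, d_hidden: int, n_classes: int,
                 n_rff: int = 1024):
        super().__init__()
        # Spectral normalization bounds each layer's Lipschitz constant,
        # which preserves distances through the hidden layers.
        self.body = torch.nn.Sequential(
            spectral_norm(torch.nn.Linear(d_in, d_hidden)), torch.nn.ReLU(),
            spectral_norm(torch.nn.Linear(d_hidden, d_hidden)), torch.nn.ReLU(),
        )
        # Random Fourier features approximate an RBF-kernel GP output layer;
        # W and b are fixed at init, only the output weights are trained.
        self.register_buffer("W", torch.randn(d_hidden, n_rff))
        self.register_buffer("b", 2 * math.pi * torch.rand(n_rff))
        self.out = torch.nn.Linear(n_rff, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        phi = math.sqrt(2.0 / self.W.shape[1]) * torch.cos(h @ self.W + self.b)
        return self.out(phi)
```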