

Extensive Self-Contrast Enables Feedback-Free Language Model Alignment

Liu, Xiao, Song, Xixuan, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial Intelligence

Reinforcement learning from human feedback (RLHF) has been a central technique for recent large language model (LLM) alignment. However, its heavy dependence on costly human or LLM-as-Judge preference feedback could stymie its wider application. In this work, we introduce Self-Contrast, a feedback-free LLM alignment method that exploits extensive self-generated negatives. With only supervised fine-tuning (SFT) targets, Self-Contrast leverages the LLM itself to generate a massive pool of diverse candidates, then harnesses a pre-trained embedding model to filter multiple negatives according to text similarity. Theoretically, we illustrate that in this setting, merely scaling up negative responses can still effectively approximate situations with more balanced positive and negative preference annotations. Our experiments with direct preference optimization (DPO) on three datasets show that Self-Contrast consistently outperforms SFT and standard DPO training by large margins, and its performance continues to grow as the number of self-generated negatives increases. Code and data are available at https://github.com/THUDM/Self-Contrast.
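As a rough illustration of the filtering step the abstract describes, here is a minimal sketch (not the authors' code; the function names and the exact ranking rule are assumptions) of using embedding similarity to pick the self-generated candidates least similar to the SFT target as negatives:

```python
import numpy as np

def filter_negatives(target_emb, candidate_embs, k):
    """Keep the k candidates least cosine-similar to the SFT target,
    treating them as negatives for preference optimization."""
    t = target_emb / np.linalg.norm(target_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ t
    return np.argsort(sims)[:k]  # indices of the most dissimilar candidates

# toy usage: four candidate embeddings in a 3-d space
target = np.array([1.0, 0.0, 0.0])
cands = np.array([[1.0, 0.1, 0.0],   # near-duplicate of the target
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-1.0, 0.0, 0.0]])
negatives = filter_negatives(target, cands, k=2)
```

In this toy run the near-duplicate candidate is excluded from the negative set, which matches the intuition that responses too close to the target are risky false negatives.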


Implementing Artificial Bee Colony Algorithm to Solve Business Problems

#artificialintelligence

Artificial Bee Colony (ABC) is an optimization algorithm based on the intelligent foraging behavior of a honey bee swarm. We'll be looking at the ABC algorithm in detail through its purpose, implementation, and functionality. We will then optimize benchmark functions such as Sphere, Himmelblau, and the Cross-In-Tray function shown below, and look at applying the ABC algorithm to real-world business problems. Full, reusable code for the implementation is available on GitHub. At AAXIS Digital, we routinely encounter intractable business optimization problems that require out-of-the-box thinking. To solve such a problem computationally, we model it as a list of decision variables representing a candidate solution, together with a computable "measure of goodness" called the objective function.
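To make the candidate-solution and objective-function framing concrete, here is a simplified ABC loop in Python. This is a sketch, not the article's GitHub code: it keeps only the employed-bee and scout phases, omits the onlooker phase, and all parameter names are illustrative.

```python
import random

def sphere(x):
    """Benchmark objective: minimum value 0 at the origin."""
    return sum(xi * xi for xi in x)

def abc_minimize(objective, dim, bounds, n_bees=20, iters=200, limit=20, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # each food source is a candidate solution: a list of decision variables
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bees)]
    trials = [0] * n_bees
    best = min(foods, key=objective)
    for _ in range(iters):
        # employed-bee phase: perturb one coordinate relative to a random neighbour
        for i in range(n_bees):
            j = rng.randrange(dim)
            k = rng.choice([b for b in range(n_bees) if b != i])
            cand = foods[i][:]
            cand[j] += rng.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
            cand[j] = max(lo, min(hi, cand[j]))
            if objective(cand) < objective(foods[i]):
                foods[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # scout phase: abandon food sources that stopped improving
        for i in range(n_bees):
            if trials[i] > limit:
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                trials[i] = 0
        best = min(foods + [best], key=objective)
    return best

best = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

The perturbation step shrinks automatically as the swarm converges, because it is proportional to the distance between two food sources.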


Haitham Al-Beik, CEO & Co-Founder of Wings – Interview Series

#artificialintelligence

Haitham Al-Beik is the CEO and Co-Founder of Wings, an emerging startup building autonomous foodservice businesses designed around proprietary, purpose-built "HiveRobotics" and intuitive end-to-end experiences that require no human intervention. Could you share the genesis story behind Wings? Wings began as a fundamental research and development lab to address the complexities and frictions of the services industry. The industry has grown ever more labor-intensive, surrounded by non-cohesive, vendor-driven components built on non-standard logistics and operations; this has diminished the industry's creative potential and stifled innovation and entrepreneurship. The barrier to entry has only become higher and more complex for any individual entering the space, regardless of their experience or intended sector.


Fetch.ai announces partnership with IoT-focused distributed ledger IOTA

#artificialintelligence

Cambridge-based AI startup Fetch.ai has announced a partnership with IOTA, an open-source distributed ledger focused on the Internet of Things. Fetch.ai has caught the attention of investors for its potentially groundbreaking machine learning network of autonomous "agents" that can perform real-world tasks. It's also caught our attention, making our list of innovative companies to watch in 2021. IOTA was among the most hyped projects during the 2017 cryptocurrency/blockchain frenzy (or bubble, depending on your perspective). While most projects from that era have since collapsed, IOTA continues to go from strength to strength and has announced a series of partnerships with giants like Dell and Jaguar Land Rover.


Echo State Networks for Reinforcement Learning

Hart, Allen G., Olding, Kevin R., Cox, A. M. G., Isupova, Olga, Dawes, J. H. P.

arXiv.org Artificial Intelligence

Echo State Networks (ESNs) are a type of single-layer recurrent neural network with randomly chosen internal weights and a trainable output layer. We prove under mild conditions that a sufficiently large Echo State Network (ESN) can approximate the value function of a broad class of stochastic and deterministic control problems. Such control problems are generally non-Markovian. We describe how the ESN can form the basis for novel (and computationally efficient) reinforcement learning algorithms in a non-Markovian framework. We demonstrate this theory with two examples. In the first, we use an ESN to solve a deterministic, partially observed control problem: a simple game we call 'Bee World'. In the second example, we consider a stochastic control problem inspired by a market making problem in mathematical finance. In both cases we compare the dynamics of the algorithms with analytic solutions to show that even after only a single reinforcement policy iteration the algorithms perform with reasonable skill.
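A minimal sketch of the ESN construction described above (illustrative only, not the authors' implementation): the input and reservoir weights are drawn at random and frozen, and only a linear readout is fit to the reservoir states.

```python
import numpy as np

def make_esn(n_in, n_res, rho=0.9, seed=0):
    """Random, fixed ESN weights; the reservoir is rescaled to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_esn(W_in, W, inputs):
    """Drive the reservoir; each state summarises the input history,
    which is what lets a simple readout handle non-Markovian problems."""
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, reg=1e-6):
    """Only the output layer is trained, here by ridge regression."""
    A = states.T @ states + reg * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# toy usage: one-step-ahead prediction of a sine wave
W_in, W = make_esn(n_in=1, n_res=50)
u = np.sin(0.1 * np.arange(200))
S = run_esn(W_in, W, u)
w_out = train_readout(S[:-1], u[1:])
```

Because training reduces to a single linear regression, fitting the readout is cheap, which is the computational-efficiency point the abstract makes.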


Robotic bees could take the sting out of Colony Collapse Disorder

#artificialintelligence

America's agricultural sector faces an unprecedented crisis. Honeybees, among the most prolific pollinators in the animal kingdom, are dying off at an alarming rate from Colony Collapse Disorder (CCD), threatening an ecosystem service worth about $15 billion. Supported by the National Science Foundation (NSF), the "RoboBees" project looks to minimize the loss of this critical resource with remarkable microbots that can mimic the pollinating role of a honeybee. But the project has a number of challenges to overcome before these robots can take to the skies. The RoboBee is a microrobot inspired by the biology of a honey bee.


Foraging in an Uncertain Environment Using Predictive Hebbian Learning

Montague, P. Read, Dayan, Peter, Sejnowski, Terrence J.

Neural Information Processing Systems

Survival is enhanced by an ability to predict the availability of food, the likelihood of predators, and the presence of mates. We present a concrete model that uses diffuse neurotransmitter systems to implement a predictive version of a Hebb learning rule embedded in a neural architecture based on anatomical and physiological studies on bees. The model captured the strategies seen in the behavior of bees and a number of other animals when foraging in an uncertain environment. The predictive model suggests a unified way in which neuromodulatory influences can be used to bias actions and control synaptic plasticity. Successful predictions enhance adaptive behavior by allowing organisms to prepare for future actions, rewards, or punishments. Moreover, it is possible to improve upon behavioral choices if the consequences of executing different actions can be reliably predicted. Although classical and instrumental conditioning results from the psychological literature [1] demonstrate that the vertebrate brain is capable of reliable prediction, how these predictions are computed in brains is not yet known. The brains of vertebrates and invertebrates possess small nuclei which project axons throughout large expanses of target tissue and deliver various neurotransmitters such as dopamine, norepinephrine, and acetylcholine [4]. The activity in these systems may report on reinforcing stimuli in the world or may reflect an expectation of future reward [5, 6, 7, 8].
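A toy, heavily simplified stand-in for the predictive rule described above (the flower names, reward rates, and update scheme are illustrative assumptions, not the paper's model): a scalar prediction error gates both the value update and the foraging choice.

```python
import random

def td_forage(p_reward, episodes=2000, alpha=0.1, eps=0.1, seed=0):
    """Value learning for a two-flower foraging choice: the prediction error
    (reward minus expectation) drives a Hebbian-style weight update."""
    rng = random.Random(seed)
    v = {flower: 0.0 for flower in p_reward}
    for _ in range(episodes):
        # mostly exploit the flower with the higher predicted value
        if rng.random() < eps:
            flower = rng.choice(list(v))
        else:
            flower = max(v, key=v.get)
        r = 1.0 if rng.random() < p_reward[flower] else 0.0
        delta = r - v[flower]        # prediction error (the neuromodulatory signal)
        v[flower] += alpha * delta   # plasticity gated by the error signal
    return v

values = td_forage({"blue": 0.8, "yellow": 0.2})
```

After learning, the predicted values approximate the true reward rates, so the bee's choices are biased toward the richer flower, mirroring the foraging strategies the model captures.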

