Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Neural Information Processing Systems

Such a reward model serves as a proxy for human preference, and it is critical for guiding the RL step toward improving model quality. In this work, we argue that the SFT stage also benefits significantly from learning a reward model. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. This approach leads to new SFT algorithms that are not only efficient to implement but also robust to the presence of low-quality supervised learning data. Moreover, we discover a connection between the proposed IRL-based approach and a recent line of work called Self-Play Fine-tune (SPIN, Chen et al. [2024]).


An Inside Look at Lego's New Tech-Packed Smart Brick

WIRED

Lego's next release is a digital brick loaded with sensors that add new layers of interactivity to its play sets. WIRED got exclusive access to the Lego labs where the Smart Brick was born. The secretive division of 237 staff based here and in London, Boston, and Singapore is dedicated to thinking up what comes next for the world's largest toy brand. In front of me, on a plain white table, is a batch of prototypes of Lego's new Smart Brick, the final version of which is a small, sensor-laden 2-by-4 black brick with a big brain. No outsider has seen these prototypes, all of which represent stages of a journey Lego has been charting over the past eight years. Lego hopes this innovation, which lands in stores March 1, will safeguard the future of its plastic empire. The diminutive proportions of the finished Smart Brick belie the fact that the thing is exceedingly clever. Inside is a tiny custom chip running bespoke software that can communicate with onboard sensors to monitor and react to motion, orientation, and magnetic fields. It's also likely no exaggeration that the Smart Brick could represent the most radical product Lego has produced since Jens Nygaard Knudsen, the company's former longtime chief designer, created the minifigure nearly 50 years ago.