- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Middle East > Cyprus > Pafos > Paphos (0.04)
- Asia > Middle East > Jordan (0.04)
- Leisure & Entertainment > Games (0.67)
- Government > Tax (0.45)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > California > Orange County > Irvine (0.14)
- Europe > Slovakia (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
- Europe > Greece (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.92)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Leisure & Entertainment (0.93)
- Education (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Such a reward model serves as a proxy for human preference, and it is critical for guiding the RL step towards improving model quality. In this work, we argue that the SFT stage benefits significantly from learning a reward model as well. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. This approach leads to new SFT algorithms that are not only efficient to implement, but also robust to the presence of low-quality supervised learning data. Moreover, we discover a connection between the proposed IRL-based approach and a recent line of work called Self-Play Fine-Tuning (SPIN; Chen et al. [2024]).
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Texas > Brazos County > College Station (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
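The abstract above mentions a connection between reward learning from demonstrations and SPIN. As a rough illustration only, and not the paper's algorithm, the sketch below computes a SPIN-style logistic loss for one prompt. It assumes an implicit reward equal to a scaled log-ratio between the current policy and a frozen reference policy. The function names, the `beta` parameter, and the scalar log-probability inputs are all hypothetical and chosen for illustration.

```python
import math


def implicit_reward(logp_policy, logp_ref, beta=1.0):
    """Assumed implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def spin_style_loss(logp_demo, logp_ref_demo, logp_gen, logp_ref_gen, beta=1.0):
    """Logistic loss on the reward margin between a human demonstration
    and a response sampled from the reference (previous) policy.

    All inputs are sequence-level log-probabilities (hypothetical scalars).
    """
    margin = (implicit_reward(logp_demo, logp_ref_demo, beta)
              - implicit_reward(logp_gen, logp_ref_gen, beta))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# With no preference for either response, the loss equals log(2).
baseline = spin_style_loss(0.0, 0.0, 0.0, 0.0)

# When the policy raises the demonstration's likelihood and lowers the
# generated response's likelihood relative to the reference, the loss drops.
improved = spin_style_loss(-1.0, -2.0, -3.0, -2.0)
```

Minimizing this loss pushes the implicit reward of demonstrations above that of the model's own generations, which is the sense in which a reward signal can be extracted from demonstration data alone under these assumptions.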
An Inside Look at Lego's New Tech-Packed Smart Brick
Lego's next release is a digital brick loaded with sensors that add new layers of interactivity to its play sets. WIRED got exclusive access to the Lego labs where the Smart Brick was born. The secretive division of 237 staff based here and in London, Boston, and Singapore is dedicated to thinking up what comes next for the world's largest toy brand. In front of me, on a plain white table, is a batch of prototypes of Lego's new Smart Brick, the final version of which is a small, sensor-laden 2-by-4 black brick with a big brain. No outsider has seen these prototypes, all of which represent stages of a journey Lego has been charting over the past eight years. Lego hopes this innovation, which lands in stores March 1, will safeguard the future of its plastic empire. The diminutive proportions of the finished Smart Brick belie the fact that the thing is exceedingly clever. Inside is a tiny custom chip running bespoke software that can communicate with onboard sensors to monitor and react to motion, orientation, and magnetic fields. It's also likely no exaggeration that the Smart Brick could represent the most radical product Lego has produced since Jens Nygaard Knudsen, the company's former longtime chief designer, created the minifigure nearly 50 years ago.
- Asia > Singapore (0.24)
- North America > United States > California (0.04)
- Europe > United Kingdom (0.04)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Networks (0.47)
- Information Technology > Communications > Mobile (0.47)
- South America > Argentina (0.04)
- North America > United States > Virginia (0.04)
- South America > Uruguay (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)