sloth
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Polo, Felipe Maia, Somerstep, Seamus, Choshen, Leshem, Sun, Yuekai, Yurochkin, Mikhail
Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant variations in benchmark performance, making it difficult for a single scaling law to generalize across all LLMs. On the other hand, training family-specific scaling laws requires training models of varying sizes for every family. In this work, we propose Skills Scaling Laws (SSLaws, pronounced as Sloth), a novel scaling law that leverages publicly available benchmark data and assumes LLM performance is driven by low-dimensional latent skills, such as reasoning and instruction following. These latent skills are influenced by computational resources like model size and training tokens but with varying efficiencies across model families. Sloth exploits correlations across benchmarks to provide more accurate and interpretable predictions while alleviating the need to train multiple LLMs per family. We present both theoretical results on parameter identification and empirical evaluations on 12 prominent benchmarks from the Open LLM Leaderboard v1/v2, demonstrating that Sloth predicts LLM performance efficiently and offers insights into scaling behaviors for downstream tasks such as coding and emotional intelligence applications.
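The core modeling idea above (a few latent skills that grow with compute at family-specific efficiencies, with benchmark scores read off those skills) can be illustrated with a toy predictor. This is a minimal sketch under assumed choices, not the paper's specification: the linear-in-log skill equation, the family offset as the "efficiency" term, the sigmoid link, and all names and numbers (`predict_benchmarks`, `loadings`, `family_a`, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_benchmarks(log_size, log_tokens, family_offset,
                       beta_size, beta_tokens, loadings, bias):
    """Toy skills-based scaling predictor (hypothetical functional form).

    family_offset: (k,) family-specific shift of the k latent skills (assumed efficiency term)
    beta_size, beta_tokens: (k,) shared slopes on log model size and log training tokens
    loadings: (J, k) how strongly each of J benchmarks depends on each skill
    bias: (J,) per-benchmark offset
    """
    # Latent skills grow linearly in log compute, shifted per family
    skills = family_offset + beta_size * log_size + beta_tokens * log_tokens
    # Each benchmark score is a squashed linear combination of the shared skills
    return sigmoid(loadings @ skills + bias)


# Illustrative values only: 2 latent skills, 3 benchmarks
loadings = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])
bias = np.array([-2.0, -1.5, -1.0])
beta_size = np.array([0.4, 0.3])
beta_tokens = np.array([0.2, 0.1])
family_a = np.array([0.0, 0.0])
family_b = np.array([0.5, -0.2])  # stronger on skill 0, weaker on skill 1

# Size in billions of parameters, tokens in trillions, both on a log10 scale
small = predict_benchmarks(np.log10(1), np.log10(1), family_a,
                           beta_size, beta_tokens, loadings, bias)
large = predict_benchmarks(np.log10(70), np.log10(2), family_a,
                           beta_size, beta_tokens, loadings, bias)
large_b = predict_benchmarks(np.log10(70), np.log10(2), family_b,
                             beta_size, beta_tokens, loadings, bias)
```

In this toy form, all benchmarks share the same few skills, so the slopes and loadings can be fit on many families at once and a new family is characterized mainly by its low-dimensional offset, which mirrors the motivation given above for not training many models per family.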
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
What Happened When ChatGPT Got Hold of My Online Dating Profile - CNET
For the record, I don't own socks with sloths on them. I have three pairs with the CNET logo on them. ChatGPT thinks I might, though, and it also thinks this fact could get me matches on Hinge, or Bumble, or any dating app that has the audacity to ask me for a random fact about myself. Here's a random fact about me: When I tested how ChatGPT might handle rewriting my dating app profile, the experimental AI chatbot tried to turn me into a cringey manic pixie dream girl who forgets to water her "jungle" of houseplants, dances to her favorite "tunes" and is looking for "a fellow weirdo" to go on *shudders* "adventures" with.
- North America > United States > Nevada > Nye County (0.05)
- Europe > Italy > Tuscany (0.05)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
SLOTH: Structured Learning and Task-based Optimization for Time Series Forecasting on Hierarchies
Zhou, Fan, Pan, Chen, Ma, Lintao, Liu, Yu, Wang, Shiyu, Zhang, James, Zhu, Xinxin, Hu, Xuanwei, Hu, Yunhua, Zheng, Yangfei, Lei, Lei, Hu, Yun
Multivariate time series forecasting with hierarchical structure is widely used in real-world applications, e.g., sales predictions for the geographical hierarchy formed by cities, states, and countries. Hierarchical time series (HTS) forecasting includes two sub-tasks, i.e., forecasting and reconciliation. In previous works, hierarchical information is only integrated in the reconciliation step to maintain coherency, but not in the forecasting step to improve accuracy. In this paper, we propose two novel tree-based feature integration mechanisms, i.e., top-down convolution and bottom-up attention, to leverage the information of the hierarchical structure to improve forecasting performance. Moreover, unlike most previous reconciliation methods, which either rely on strong assumptions or focus on coherence constraints only, we utilize deep neural optimization networks, which not only achieve coherency without any assumptions but also allow more flexible and realistic constraints to achieve task-based targets, e.g., a lower under-estimation penalty and a meaningful decision-making loss to facilitate subsequent downstream tasks. Experiments on real-world datasets demonstrate that our tree-based feature integration mechanism achieves superior performance on hierarchical forecasting tasks compared to state-of-the-art methods, and that our neural optimization networks can be applied to real-world tasks effectively, without any additional effort, under coherence and task-based constraints.
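The coherency constraint that reconciliation enforces can be made concrete with a summing matrix. The sketch below shows two standard baselines, bottom-up aggregation and OLS projection reconciliation, on a hypothetical two-store hierarchy. It is not SLOTH's neural optimization network, and the hierarchy, forecast values, and function names are made up for illustration.

```python
import numpy as np

# Hypothetical hierarchy: total = store_a + store_b
# Rows of S map the bottom-level series onto every node (total, a, b)
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Independent base forecasts for (total, a, b); incoherent because 4 + 5 != 10
y_hat = np.array([10.0, 4.0, 5.0])


def bottom_up(S, y_hat, n_bottom):
    # Keep only the bottom-level forecasts and re-aggregate them
    return S @ y_hat[-n_bottom:]


def ols_reconcile(S, y_hat):
    # Project base forecasts onto the coherent subspace spanned by S:
    # y_tilde = S (S^T S)^{-1} S^T y_hat, solved without an explicit inverse
    b = np.linalg.solve(S.T @ S, S.T @ y_hat)
    return S @ b


y_bu = bottom_up(S, y_hat, n_bottom=2)  # [9, 4, 5]
y_ols = ols_reconcile(S, y_hat)         # approx [9.667, 4.333, 5.333]
```

Both outputs satisfy total = a + b. Bottom-up simply discards the top-level forecast, while OLS spreads the adjustment across all levels; neither uses hierarchy information to improve the base forecasts themselves or optimizes a task-based loss, which is the gap the abstract describes.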
Downscaling Attack and Defense: Turning What You See Back Into What You Get
The resizing of images, which is typically a required part of preprocessing for computer vision systems, is vulnerable to attack. Images can be created such that the image is completely different at machine-vision scales than at other scales and the default settings for some common computer vision and machine learning systems are vulnerable. We show that defenses exist and are trivial to administer provided that defenders are aware of the threat. These attacks and defenses help to establish the role of input sanitization in machine learning.
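The attack principle can be illustrated with a toy model. The sketch assumes a naive strided nearest-neighbor resizer that keeps one pixel per block; real libraries' default kernels and sampling offsets differ, so this shows the idea rather than any specific library's behavior. The area-averaging downscaler stands in for one plausible defense (letting every input pixel contribute); the defenses in the work above may differ. All function names are hypothetical.

```python
import numpy as np


def nearest_downscale(img, factor):
    # Naive strided resize: keep the top-left pixel of every factor x factor block
    return img[::factor, ::factor]


def area_downscale(img, factor):
    # Average every factor x factor block, so every input pixel contributes
    h, w = img.shape
    cropped = img[:h - h % factor, :w - w % factor]
    return cropped.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def craft_attack(cover, target, factor):
    # Overwrite only the pixels that the strided resizer will sample
    attacked = cover.astype(float).copy()
    attacked[::factor, ::factor] = target
    return attacked


cover = np.full((64, 64), 200.0)   # uniform light-gray "cover" image
target = np.zeros((16, 16))        # all-black image the model should see
attacked = craft_attack(cover, target, factor=4)

seen_by_model = nearest_downscale(attacked, 4)  # exactly the target
defended = area_downscale(attacked, 4)          # 187.5 everywhere, close to the cover
```

Only 1 in 16 pixels changes, so at full resolution the attacked image still looks like the cover, yet the strided resize returns the target exactly, while the block average barely moves (187.5 vs. 200).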
- Asia > China (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
Robot toys don't get any cooler than a real-life Transformer
The T9 is more than meets the eye... and more than meets most budgets. The dream robot of my childhood is right in front of me. A remote-control car drives on a table, stops and instantly transforms into a humanoid walking robot. It's a real-life Transformer... well, without the official Transformers licensing, that is. This mechanical specimen is called the T9, made by Robosen Robotics, and it's one of the many robots landing in 2020 to wow kids, and kids at heart.
- North America > United States > New York (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.05)
How to Organize Data Labeling for Machine Learning: Approaches and Tools
If there were a data science hall of fame, it would have a section dedicated to labeling. The labelers' monument could be Atlas holding that large rock symbolizing their arduous, detail-laden responsibilities. ImageNet -- an image database -- would deserve its own stele. Just thinking about it makes you tired. While labeling is not launching a rocket into space, it's still serious business. Labeling is an indispensable stage of data preprocessing in supervised learning. Historical data with predefined target attributes (values) is used for this model training style. An algorithm can only find target attributes if a human has mapped them. Labelers must be extremely attentive because each mistake or inaccuracy negatively affects a dataset's quality and the overall performance of a predictive model. How to get a high-quality labeled dataset without getting grey hair?
Tarzan the swinging robot could be the future of farming
Some farmers already use drones to monitor their crops, but a team of researchers from Georgia Tech have created a far more interesting alternative. Instead of designing yet another drone, they created a robot inspired by Kristen Bell's favorite animal: the sloth. However, they named it "Tarzan" after the most recognizable character who moves by swinging from vine to vine. You see, their machine was designed to move like the fictional jungle dweller. Tarzan will be able to swing over crops using its 3D-printed claws and parallel guy-wires stretched over fields.
BBC uses hi-tech robots in new wildlife series
The orangutan looked quite magnificent. From her inquisitive eyes to her distinctive orange fur, she was just the sort of creature nature lovers adore watching on TV. But a closer inspection revealed something a little different about her. That's because she is actually an undercover robot, fitted with high-definition cameras behind her glass eyes and used to infiltrate the animal kingdom. The orangutan, as well as an adorable wolf-cub, an utterly convincing meerkat and an incredible floating otter are among 34 animatronic beasts created for the BBC's new series, Spy In The Wild.
- Europe > United Kingdom > England > Greater London > London (0.05)
- Asia > India > Rajasthan (0.05)
- Antarctica (0.05)
- Media (0.68)
- Health & Medicine > Therapeutic Area (0.30)