homer
HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction
Chen, Shuwei, Cui, Jiajun, Xu, Zhengqi, Zhang, Fan, Fan, Jiangke, Zhang, Teng, Wang, Xingxing
Click-through rate (CTR) prediction, which models behavior sequences and non-sequential features (e.g., user/item profiles or cross features) to infer user interest, underpins industrial recommender systems. However, most methods face three forms of heterogeneity that degrade predictive performance: (i) Feature Heterogeneity arises when limited sequence side features yield a less granular interest representation than the extensive non-sequential features, impairing sequence modeling; (ii) Context Heterogeneity arises because a user's interest in an item is influenced by the other items shown alongside it, yet point-wise prediction ignores cross-item interaction context from the entire item set; (iii) Architecture Heterogeneity stems from the fragmented integration of specialized network modules, which compromises the model's effectiveness, efficiency, and scalability in industrial deployments. To tackle these limitations, we propose HoMer, a Homogeneous-Oriented TransforMer for modeling sequential and set-wise contexts. First, we align sequence side features with non-sequential features for accurate sequence modeling and fine-grained interest representation. Second, we shift the prediction paradigm from point-wise to set-wise, enabling cross-item interaction in a highly parallel manner. Third, HoMer's unified encoder-decoder architecture achieves dual optimization through structural simplification and shared computation, ensuring computational efficiency while maintaining scalability with model size. Without arduous modification to the prediction pipeline, HoMer scales up successfully and outperforms our industrial baseline by 0.0099 in the AUC metric, and enhances online business metrics like CTR/RPM by 1.99%/2.46%. Additionally, HoMer saves 27% of GPU resources via preliminary engineering optimization, further validating its superiority and practicality.
HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control
Sundaresan, Priya, Malhotra, Rhea, Miao, Phillip, Yang, Jingyun, Wu, Jimmy, Hu, Hengyuan, Antonova, Rika, Engelmann, Francis, Sadigh, Dorsa, Bohg, Jeannette
We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such as opening cabinets, sweeping trash, and rearranging pillows. Across tasks, HoMeR achieves an overall success rate of 79.17% using just 20 demonstrations per task, outperforming the next best baseline by 29.17% on average. HoMeR is also compatible with vision-language models and can leverage their internet-scale priors to better generalize to novel object appearances, layouts, and cluttered scenes. In summary, HoMeR moves beyond tabletop settings and demonstrates a scalable path toward sample-efficient, generalizable manipulation in everyday indoor spaces. Code, videos, and supplementary material are available at: http://homer-manip.github.io
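The absolute/relative switch described above can be made concrete with a tiny sketch of a hybrid action interface. This is an assumption-laden illustration, not HoMeR's controller: real end-effector poses include orientation and the heavy lifting happens in the whole-body controller, but the mode semantics (command a target pose directly vs apply a small delta to the current pose) carry over.

```python
import numpy as np

def apply_action(current_pose, action, mode):
    """Toy hybrid action interface: 'absolute' commands a target
    end-effector position directly (long-range moves), while
    'relative' applies a small delta to the current position
    (fine-grained manipulation). Illustrative only."""
    if mode == "absolute":
        return np.asarray(action, dtype=float)
    if mode == "relative":
        return np.asarray(current_pose, dtype=float) + np.asarray(action, dtype=float)
    raise ValueError(f"unknown mode: {mode}")

pose = np.zeros(3)                                        # x, y, z only, for brevity
pose = apply_action(pose, [1.0, 0.5, 0.2], "absolute")    # long-range jump
pose = apply_action(pose, [0.0, 0.0, -0.02], "relative")  # nudge down 2 cm
# pose is now [1.0, 0.5, 0.18]
```

A learned policy in this scheme outputs (mode, action) pairs; the downstream controller is then responsible for turning the resulting end-effector target into coordinated base-plus-arm motion.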
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Song, Woomin, Oh, Seunghyuk, Mo, Sangwoo, Kim, Jaehyung, Yun, Sukmin, Ha, Jung-Woo, Shin, Jinwoo
Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome these limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. The chunks are then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory efficiency. We also propose an optimized computational order that reduces the memory requirement to scale logarithmically with input length, making the method especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in scenarios that require long contexts. Code is available at https://github.com/alinlab/HOMER.
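The divide-and-conquer structure of the abstract can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: token vectors are plain NumPy arrays, "token reduction" is a toy keep-highest-norm rule, and merging is concatenation, whereas the actual HOMER performs merging inside progressive transformer layers with a learned-model-driven reduction.

```python
import numpy as np

def reduce_tokens(chunk, keep):
    # Toy token reduction: keep the `keep` highest-norm token vectors,
    # preserving their original order within the chunk.
    norms = np.linalg.norm(chunk, axis=1)
    idx = np.sort(np.argsort(norms)[-keep:])
    return chunk[idx]

def hierarchical_merge(chunks, keep):
    """Divide-and-conquer merging: at each level, adjacent chunks are
    reduced and concatenated, halving the chunk count until one
    remains. An odd trailing chunk carries over to the next level."""
    level = list(chunks)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a = reduce_tokens(level[i], keep)
            b = reduce_tokens(level[i + 1], keep)
            nxt.append(np.vstack([a, b]))
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

rng = np.random.default_rng(1)
chunks = [rng.normal(size=(16, 8)) for _ in range(4)]
merged = hierarchical_merge(chunks, keep=8)
print(merged.shape)
```

Because every merge is preceded by reduction, each merged chunk holds at most `2 * keep` tokens regardless of how many chunks the input was split into, which is what keeps memory bounded as the hierarchy deepens.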
How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation
Xiao, Yang, Cheng, Yi, Fu, Jinlan, Wang, Jiashuo, Li, Wenjie, Liu, Pengfei
Human behavior simulation by AI agents requires the agents to be believable, a quality that is crucial because it helps users establish trust in the agents and streamlines the fulfillment of the agents' goals. While recent advancements in Large Language Model (LLM) based agents have improved human behavior simulation, challenges inherent to LLMs (e.g., long context modeling) can undermine their believability. Consequently, evaluating AI agent believability becomes imperative. Unfortunately, prior research often neglects the negative impacts of LLM deficiencies. To address these gaps, we introduce two metrics for assessing LLM-based agent believability, consistency and robustness, together with a benchmark, SimulateBench, with which we evaluate the consistency and robustness of agents implemented with popular LLMs. We find that agents (i) struggle to accurately depict character information when presented with lengthy profile inputs; (ii) exhibit vulnerability to profile perturbations; and (iii) are significantly affected by certain key factors that impact their overall believability. Code and SimulateBench are public at https://github.com/GAIR-NLP/GPTMan.
Relational Concept Based Models
Barbiero, Pietro, Giannini, Francesco, Ciravegna, Gabriele, Diligenti, Michelangelo, Marra, Giuseppe
The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept-Based Models (CBMs), are not designed to solve relational problems, while relational models are not as interpretable as CBMs. To address this problem, we propose Relational Concept-Based Models, a family of relational deep learning methods providing interpretable task predictions. Our experiments, ranging from image classification to link prediction in knowledge graphs, show that relational CBMs (i) match the generalization performance of existing relational black boxes (as opposed to non-relational CBMs), (ii) support the generation of quantified concept-based explanations, (iii) effectively respond to test-time interventions, and (iv) withstand demanding settings including out-of-distribution scenarios, limited training data regimes, and scarce concept supervision.
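The concept-bottleneck idea that relational CBMs extend can be shown in a few lines. This is a minimal non-relational sketch with made-up random weights, not the paper's model: its only purpose is to show the structural constraint (the task head sees nothing but concept activations) that makes test-time concept interventions possible.

```python
import numpy as np

def concept_bottleneck(x, Wc, Wt):
    """Minimal (non-relational) concept-bottleneck sketch: inputs are
    mapped to interpretable concept activations, and the task
    prediction is computed from those concepts alone. Weights are
    illustrative placeholders, not learned parameters."""
    concepts = 1.0 / (1.0 + np.exp(-(x @ Wc)))   # sigmoid concept scores in (0, 1)
    logits = concepts @ Wt                        # task head sees only the concepts
    return concepts, logits

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 10))        # 5 samples, 10 raw features
Wc = rng.normal(size=(10, 3))       # 3 hypothetical named concepts
Wt = rng.normal(size=(3, 2))        # 2 task classes
concepts, logits = concept_bottleneck(x, Wc, Wt)

# Test-time intervention: clamp concept 0 to "on" and re-score the task.
intervened = concepts.copy()
intervened[:, 0] = 1.0
new_logits = intervened @ Wt
```

Because the bottleneck is explicit, an expert can overwrite a concept activation and immediately see the task prediction change, which is the intervention behavior point (iii) of the abstract evaluates.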
Could The Simpsons replace its voice actors with AI deepfakes?
In May 2015, The Simpsons voice actor Harry Shearer – who plays a number of key characters including, quite incredibly, both Mr Burns and Waylon Smithers – announced that he was leaving the show. By then, the animated series had been running for more than 25 years, and the pay of its vocal cast had risen from $30,000 an episode in 1998 to $400,000 an episode from 2008 onwards. But Fox, the producer of The Simpsons, was looking to cut costs – and was threatening to cancel the series unless the voice actors took a 30 per cent pay cut. Most of them agreed, but Shearer (who had been critical of the show's declining quality) refused to sign – after more than two decades, he wanted to break out of the golden handcuffs, and win back the freedom and the time to pursue his own work. Showrunner Al Jean said Shearer's iconic characters – who also include Principal Skinner, Ned Flanders and Otto Mann – would be recast.
Why transfer learning works or fails?
During his NIPS tutorial talk in 2016, Andrew Ng said that transfer learning -- a subarea of machine learning where a model is learned in one area and then deployed in related, yet different, areas -- would be the next driver of machine learning's commercial success in the years to come. This statement is hard to contest: avoiding learning large-scale models from scratch significantly reduces the high computational and annotation effort required, saving data science practitioners time, energy, and, ultimately, money. As an illustration, consider Facebook's DeepFace algorithm, which in 2014 was the first to achieve near-human performance in face verification. The neural network behind it was trained on 4.4 million labeled faces -- an overwhelming amount of data that had to be collected, annotated, and then trained on for 3 full days, not counting the time needed for fine-tuning. It is no exaggeration to say that most companies and research teams without Facebook's resources and deep learning engineers would need months or even years of work to complete such a feat, with most of that time spent collecting an annotated sample large enough to build such an accurate classifier.
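The economics described above come from reusing a pretrained network's features and training only a small head on the new task. The sketch below illustrates that split under explicit assumptions: a fixed random projection stands in for the pretrained backbone (which is never updated), and the "transfer" step is just logistic regression on its outputs; all names and sizes are illustrative.

```python
import numpy as np

def extract_features(x, W_frozen):
    # Stand-in for a pretrained backbone: a fixed (frozen) projection
    # followed by a ReLU. In practice this would be a large network
    # whose weights are loaded, not trained.
    return np.maximum(x @ W_frozen, 0.0)

def train_head(feats, y, lr=0.1, steps=200):
    """Fit only a small linear head on frozen features via gradient
    descent on the logistic loss -- the cheap step transfer learning
    buys compared to training the whole network from scratch."""
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w)))
        w -= lr * feats.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(3)
W_frozen = rng.normal(size=(6, 8))               # "pretrained", never updated
x = rng.normal(size=(40, 6))                      # tiny downstream dataset
y = (x[:, 0] > 0).astype(float)                   # toy downstream labels
w = train_head(extract_features(x, W_frozen), y)
acc = ((extract_features(x, W_frozen) @ w > 0) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Only the 8-dimensional head is optimized here; the backbone's parameters never move, which is why the downstream task can get by with 40 labeled examples instead of millions.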
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
Misra, Dipendra, Henaff, Mikael, Krishnamurthy, Akshay, Langford, John
We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning to identify a new notion of kinematic state abstraction with strategic exploration to reach new states using the learned abstraction. The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, and, crucially, with no dependence on the size of the observation space, which could be infinitely large. This exploration guarantee further enables sample-efficient global policy optimization for any reward function. On the computational side, we show that the algorithm can be implemented efficiently whenever certain supervised learning problems are tractable. Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines.