eye view
LLMs are Not Just Next Token Predictors
Downes, Stephen M., Forber, Patrick, Grzankowski, Alex
LLMs are statistical models of language learning through stochastic gradient descent with a next token prediction objective, prompting a popular view among AI modelers: LLMs are just next token predictors. While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that a reduction to just next token predictor sells LLMs short. Moreover, there are important explanations of LLM behavior and capabilities that are lost when we engage in this kind of reduction. In order to draw this out, we will make an analogy with a once prominent research program in biology explaining evolution and development from the gene's eye view. The 'just next token predictors' view is explicitly laid out by Shanahan (2024): "A great many tasks that demand intelligence in humans can be reduced to next-token prediction with a sufficiently performant model" (2024, 68), and "surely what they are doing is more than 'just' next-token prediction? Well, it is an engineering fact that this is what an LLM does. The noteworthy thing is that next-token prediction is sufficient for solving previously unseen reasoning problems" (2024, 77).
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Utah (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
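To make the "engineering fact" in the Shanahan quote concrete, here is a minimal sketch of the next token prediction objective as it is commonly implemented; the toy model, vocabulary size, and random sequence are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of next-token-prediction training (illustrative toy model,
# not from the paper). The loss at each position is the cross-entropy between
# the model's predicted distribution and the token that actually comes next.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # stand-in for a full transformer stack
)

tokens = torch.randint(0, vocab_size, (1, 16))  # one toy training sequence
logits = model(tokens[:, :-1])                  # predict from each prefix
targets = tokens[:, 1:]                         # shifted by one: the "next" tokens
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # stochastic gradient descent updates proceed from here
```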
Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Zhao, Tianhao, Chen, Yongcan, Wu, Yu, Liu, Tianyang, Du, Bo, Xiao, Peilun, Qiu, Shi, Yang, Hongda, Li, Guozhen, Yang, Yi, Lin, Yutian
Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, a challenge arises because the RGB inputs and BEV targets come from distinct perspectives, making direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct BEV segmentation maps from corrupted noisy latent representations, which pushes the decoder to learn fundamental knowledge of typical BEV patterns. The second stage maps RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach separates perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively. Besides, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish a column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation, saving computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.
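A minimal sketch of the Cartesian-to-polar resampling step described above, so that each polar column corresponds to one viewing ray; the grid sizes and the bottom-center ego origin are assumptions for illustration, not values from the paper or its repository.

```python
# Hypothetical sketch of resampling a Cartesian BEV map into polar coordinates
# so each column corresponds to one azimuth ray (assumed grid sizes; not the
# TaDe implementation).
import numpy as np

def cartesian_to_polar_bev(bev, n_rays=128, n_ranges=128, max_range=50.0):
    """bev: (H, W) map with the ego vehicle at the bottom-center edge."""
    h, w = bev.shape
    rays = np.linspace(-np.pi / 2, np.pi / 2, n_rays)   # azimuth per column
    ranges = np.linspace(0.0, max_range, n_ranges)      # distance per row
    rr, aa = np.meshgrid(ranges, rays, indexing="ij")
    # Polar -> Cartesian sample locations (meters), then meters -> pixels.
    x = rr * np.sin(aa)                                 # lateral offset
    y = rr * np.cos(aa)                                 # forward distance
    col = np.clip((x / max_range * (w / 2) + w / 2).astype(int), 0, w - 1)
    row = np.clip((h - 1 - y / max_range * h).astype(int), 0, h - 1)
    return bev[row, col]                                # (n_ranges, n_rays)
```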
Ring announces a new battery-powered doorbell with 3D motion detection and improved visuals
Ring has announced a refresh of its popular Battery Doorbell Plus outdoor camera. The Battery Doorbell Pro is an upgrade in nearly every way, as is usually the case when companies slap "Pro" at the end of a name. Ring says this new model is its "most advanced battery powered doorbell" ever and that it's packed with features that exceed even its wired doorbells. It boasts radar-powered 3D motion detection, which was also included with the company's Stick Up Cam Pro. Otherwise called "Bird's Eye View", this technology tracks an object's path through the camera's field of view so you can monitor where visitors are going and the route they took to get there.
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (1.00)
- Commercial Services & Supplies > Security & Alarm Services (0.74)
MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features
Zhou, Bo, Xie, Jiapeng, Pan, Yan, Wu, Jiajie, Lu, Chuanzhao
Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation, which segments moving objects with appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features through the height differences of consecutive frames of point clouds projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
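The motion feature described above reduces to a per-column height comparison between consecutive scans; a rough sketch follows, under assumed bin counts, and is not the released MotionBEV code.

```python
# Rough sketch of polar-BEV motion features: project each LiDAR scan onto
# vertical polar columns, keep the max height per column, and difference
# consecutive frames (assumed bin counts; not the MotionBEV implementation).
import numpy as np

def polar_height_map(points, n_rings=64, n_sectors=256, max_range=50.0):
    """points: (N, 3) array of x, y, z. Returns (n_rings, n_sectors) max heights."""
    r = np.hypot(points[:, 0], points[:, 1])
    theta = np.arctan2(points[:, 1], points[:, 0])      # in [-pi, pi]
    ring = np.clip((r / max_range * n_rings).astype(int), 0, n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    heights = np.full((n_rings, n_sectors), -np.inf)
    np.maximum.at(heights, (ring, sector), points[:, 2])  # max z per column
    return heights

def motion_feature(scan_prev, scan_curr):
    h0, h1 = polar_height_map(scan_prev), polar_height_map(scan_curr)
    valid = np.isfinite(h0) & np.isfinite(h1)           # columns hit in both frames
    return np.where(valid, h1 - h0, 0.0)                # height change per column
```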
Herd's Eye View: Improving Game AI Agent Learning with Collaborative Perception
Nash, Andrew, Vardy, Andrew, Churchill, David
We present a novel perception model named Herd's Eye View (HEV) that adopts a global perspective derived from multiple agents to boost the decision-making capabilities of reinforcement learning (RL) agents in multi-agent environments, specifically in the context of game AI. The HEV approach utilizes cooperative perception to empower RL agents with a global reasoning ability, enhancing their decision-making. We demonstrate the effectiveness of the HEV within simulated game environments and highlight its superior performance compared to traditional ego-centric perception models. This work contributes to cooperative perception and multi-agent reinforcement learning by offering a more realistic and efficient perspective for global coordination and decision-making within game environments. Moreover, our approach promotes broader AI applications beyond gaming by addressing constraints faced by AI in other fields such as robotics. The code is available at https://github.com/andrewnash/Herds-Eye-View
- Leisure & Entertainment > Games > Computer Games (0.69)
- Information Technology > Software (0.55)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
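As an illustration of the cooperative perception idea above, the sketch below stamps each agent's egocentric observation into a shared world-frame grid that every RL agent can then observe; the grid encoding and the max-fusion rule are assumptions, not the HEV paper's implementation.

```python
# Illustrative sketch of cooperative perception: each agent's egocentric
# observation is written into one shared world-frame grid (assumed encoding;
# not the Herd's Eye View implementation).
import numpy as np

def fuse_observations(world_shape, agent_poses, local_obs, local_extent=5):
    """agent_poses: list of (row, col) world positions; local_obs: list of
    (2*local_extent+1)-square egocentric grids. Returns the fused global view."""
    world = np.zeros(world_shape)
    for (r, c), obs in zip(agent_poses, local_obs):
        r0, c0 = r - local_extent, c - local_extent
        for i in range(obs.shape[0]):
            for j in range(obs.shape[1]):
                rr, cc = r0 + i, c0 + j
                if 0 <= rr < world_shape[0] and 0 <= cc < world_shape[1]:
                    # Keep the strongest evidence reported by any agent.
                    world[rr, cc] = max(world[rr, cc], obs[i, j])
    return world
```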
AI has a bird's eye view
What can we learn by looking down from above? To see a city from the sky is to see it as an eagle would: breathtaking drone footage can reveal a landscape of hope and rich culture. The drone's view is a powerful tool, and a new method now allows bird's eye views to be created from a single frontal photo.
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect; even a simple parameter-free lifter works well.
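A minimal sketch of what a "parameter-free lifter" can look like: define 3D voxel centers around the vehicle, project them into a camera with its calibration, and bilinearly sample image features at the projected locations. The pinhole model, shapes, and single-camera setup are illustrative assumptions, not the Simple-BEV code.

```python
# Minimal sketch of parameter-free lifting: project 3D voxel centers into a
# camera image and bilinearly sample features there (pinhole model and shapes
# are assumptions; not the Simple-BEV implementation).
import torch
import torch.nn.functional as F

def lift_to_bev(feats, intrinsics, voxels):
    """feats: (C, H, W) image features; intrinsics: (3, 3) camera matrix;
    voxels: (X, Y, Z, 3) voxel centers in the camera frame (meters)."""
    C, H, W = feats.shape
    pts = voxels.reshape(-1, 3) @ intrinsics.T          # pinhole projection
    uv = pts[:, :2] / pts[:, 2:3].clamp(min=1e-5)       # perspective divide
    # (Points behind the camera should also be masked out; omitted for brevity.)
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack([uv[:, 0] / W * 2 - 1, uv[:, 1] / H * 2 - 1], dim=-1)
    sampled = F.grid_sample(
        feats[None], grid[None, None], align_corners=False
    )[0, :, 0]                                          # (C, X*Y*Z)
    sampled = sampled.reshape(C, *voxels.shape[:3])
    return sampled.mean(dim=3)                          # reduce height -> (C, X, Y)
```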
Ring brings radar detection to its Spotlight Cam Pro
We've already seen Ring add Bird's Eye View (its fancy 3D motion detection) to its flagship security camera and its flagship outdoor light camera. Consequently, you get no prizes for guessing that the feature is now coming to the new Ring Spotlight Cam Pro. The new Pro Spotlight Cam is joined by a Spotlight Cam Plus, which offers a slightly nicer design than its predecessor. For the uninitiated, Bird's Eye View is a system that offers users a top-down map of their area, showing the path a person took to your front door. It's designed to let you know if someone's been peering into your windows, or anywhere else, while on your porch.
Experimental Analysis of Trajectory Control Using Computer Vision and Artificial Intelligence for Autonomous Vehicles
Abbas, Ammar N., Irshad, Muhammad Asad, Ammar, Hossam Hassan
Perception of the lane boundaries is crucial for tasks related to autonomous trajectory control. In this paper, several methodologies for lane detection are discussed with an experimental illustration: Hough transformation, blob analysis, and bird's eye view. After abstracting the lane marks from the boundary, the next step is applying a control law based on this perception to control steering and speed. A comparative analysis is then made between an open-loop response, PID control, and a neural network control law through graphical statistics. To perceive the surroundings, a wireless streaming camera connected to a Raspberry Pi is used. After pre-processing, the signal received from the camera is sent back to the Raspberry Pi, which processes the input and communicates the control commands to the motors through an Arduino via serial communication.
- Information Technology > Hardware (0.56)
- Transportation > Ground > Road (0.49)
- Information Technology > Robotics & Automation (0.49)
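The lane-detection stage above combines standard OpenCV building blocks; a hedged sketch follows. The Canny thresholds, Hough parameters, and warp corner points are illustrative assumptions, not values reported in the paper.

```python
# Sketch of the lane-detection steps named above using standard OpenCV calls
# (thresholds and warp corner points are illustrative assumptions).
import cv2
import numpy as np

def detect_lanes(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                    # edge map for Hough
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=50, minLineLength=40, maxLineGap=20)

    # Bird's eye view: warp a trapezoid on the road plane to a rectangle.
    h, w = gray.shape
    src = np.float32([[w * 0.4, h * 0.6], [w * 0.6, h * 0.6], [w, h], [0, h]])
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    bev = cv2.warpPerspective(frame, cv2.getPerspectiveTransform(src, dst), (w, h))
    return lines, bev
```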