Can Large Language Models Invent Algorithms to Improve Themselves?

Ishibashi, Yoichi, Yano, Taro, Oyamada, Masafumi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable performance improvements and are rapidly gaining adoption in industry. However, the methods for improving LLMs are still designed by humans, which restricts the invention of new model-improving algorithms to human expertise and imagination. To address this, we propose the Self-Developing framework, which enables LLMs to autonomously generate and learn model-improvement algorithms. In this framework, the seed model generates, applies, and learns model-improving algorithms, continuously improving both the seed model and the algorithms themselves. In mathematical reasoning tasks, Self-Developing not only creates models that surpass the seed model but also consistently outperforms models created using human-designed algorithms. Additionally, these LLM-discovered algorithms demonstrate strong effectiveness, including transferability to out-of-domain models.


Shark PowerDetect 2-in-1 Robot Vacuum and Mop Review (2024)

WIRED

With its latest vacuum (most recently presented at IFA), Shark attempts to solve two major problems. The first is that simply lifting the mop pads off the floor often isn't enough to keep the yucky wet mop pad from dragging on your nice clean carpet. That's why the newest Shark robot vacuum has a mop plate that automatically detaches when you're vacuuming. The second, and more interesting, problem is that robot vacuums tend to get stuck on little ledges or rugs in your house. That's why the Shark now has what I have been referring to as a "booty hitch," to hump itself over obstacles in its path.


Design of an End-effector with Application to Avocado Harvesting

Zhou, Jingzong, Song, Xiaoao, Karydis, Konstantinos

arXiv.org Artificial Intelligence

Robot-assisted fruit harvesting has been a critical research direction supporting sustainable crop production. One important determinant of system behavior and efficiency is the end-effector that comes in direct contact with the crop during harvesting and directly affects harvesting success. Harvesting avocados poses unique challenges not addressed by existing end-effectors: the fruit has an uneven surface and an irregular shape, grows on a thick peduncle, and has a sturdy calyx attached. The work reported in this paper contributes a new end-effector design suitable for avocado picking. A rigid system design with a two-stage rotational motion is developed to first grasp the avocado and then detach it from its peduncle. A force analysis is conducted to determine key design parameters. Preliminary experiments demonstrate the efficiency of the developed end-effector in picking and applying a moment to an avocado from a specific viewpoint (as compared to pulling it directly), and in-lab experiments show that the end-effector can grasp and retrieve avocados with a 100% success rate.


UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

Li, Wenhao, Lin, Mingbao, Zhong, Yunshan, Yan, Shuicheng, Ji, Rongrong

arXiv.org Artificial Intelligence

Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We first conceptualize the process as a streamlined encoder-decoder framework in which the weight-sharing encoder and decoder respectively encapsulate a context segment into memories and leverage these memories to predict outputs of the subsequent segment. Then, by treating our memory-enhanced transformers as fully-connected recurrent neural networks (RNNs), we refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm, which incorporates innovative incremental optimization techniques. These techniques not only reduce time complexity but also address the bias in gradient computation through an unbiased optimization process. UIO-LLMs successfully handle long contexts, for example extending the context window of Llama2-7b-chat from 4K to 100K tokens with only 2% additional parameters, while keeping the inference cost nearly linear as context length increases.
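
To make the gradient bias mentioned in this abstract concrete, here is a minimal sketch of ordinary chunked truncation on a toy scalar RNN. It is not the UIO-LLMs method. The recurrence h_t = w * h_{t-1} + x_t, the squared-error loss, and the function name `rnn_grad` are all assumptions chosen for illustration. The gradient is accumulated in forward mode for brevity, which gives the same value as backpropagating within each chunk while holding the carried state constant.

```python
def rnn_grad(w, xs, ys, window=None):
    """Gradient of sum_t 0.5 * (h_t - y_t)^2 w.r.t. w, for h_t = w * h_{t-1} + x_t.

    window=None returns the full gradient. Otherwise the dependence of the
    carried hidden state on w is dropped at the start of every chunk of
    `window` steps, as in chunked truncated backpropagation through time.
    """
    h = 0.0        # hidden state h_{t-1}
    dh_dw = 0.0    # derivative of the hidden state w.r.t. w
    grad = 0.0
    for t, (x, y) in enumerate(zip(xs, ys)):
        if window is not None and t % window == 0:
            dh_dw = 0.0  # truncation: treat the carried state as a constant
        dh_dw = h + w * dh_dw  # d h_t / d w, using the previous state h_{t-1}
        h = w * h + x
        grad += (h - y) * dh_dw
    return grad


xs = [1.0, 1.0, 1.0, 1.0]
ys = [0.0, 0.0, 0.0, 0.0]
full = rnn_grad(0.5, xs, ys)
truncated = rnn_grad(0.5, xs, ys, window=2)
print(full, truncated)
```

On this toy input the truncated gradient is smaller than the full one, because every contribution that crosses a chunk boundary is discarded. That systematic gap is the kind of bias the abstract says the proposed optimization process removes.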


Flexible and Efficient Surrogate Gradient Modeling with Forward Gradient Injection

Otte, Sebastian

arXiv.org Artificial Intelligence

Automatic differentiation is a key feature of current deep learning frameworks. These frameworks also typically provide various ways to specify custom gradients within the computation graph, which is particularly important for defining surrogate gradients for non-differentiable operations such as the Heaviside function in spiking neural networks (SNNs). PyTorch, for example, allows the custom specification of the backward pass of an operation by overriding its backward method. Other frameworks provide comparable options. While these methods are common practice and usually work well, they also have several disadvantages such as limited flexibility, additional source code overhead, poor usability, or a potentially strong negative impact on the effectiveness of automatic model optimization procedures. In this paper, an alternative way to formulate surrogate gradients is presented, namely, forward gradient injection (FGI). FGI applies a simple but effective combination of basic standard operations to inject an arbitrary gradient shape into the computational graph directly within the forward pass. It is demonstrated that using FGI is straightforward and convenient. Moreover, it is shown that FGI can significantly increase the model performance in comparison to custom backward methods in SNNs when using TorchScript. These results are complemented with a general performance study on recurrent SNNs with TorchScript and torch.compile, revealing the potential for a training speedup of more than 7x and an inference speedup of more than 16x in comparison with pure PyTorch.
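
The idea of injecting a gradient shape inside the forward pass can be sketched without any framework. The example below is an illustration under stated assumptions, not the paper's implementation. It emulates automatic differentiation with dual numbers, uses a hypothetical `detach` helper as a stand-in for a stop-gradient operation (such as `Tensor.detach()` in PyTorch), and picks 1 / (1 + |x|)^2 as an arbitrary surrogate shape. The combination `injected - detach(injected) + detach(heaviside(x))` is one common way to pair a step-function value with a chosen derivative. Whether it matches the exact formulation in the paper is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative w.r.t. one input (forward mode)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __sub__(self, other):
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def detach(x):
    """Keep the value and drop the derivative (hypothetical stop-gradient)."""
    return Dual(x.val, 0.0)


def heaviside(x):
    """Step function; its true derivative is zero almost everywhere."""
    return Dual(1.0 if x.val > 0 else 0.0, 0.0)


def surrogate_shape(x):
    """Assumed surrogate derivative, 1 / (1 + |x|)^2, returned as a constant."""
    return Dual(1.0 / (1.0 + abs(x.val)) ** 2, 0.0)


def spike_fgi(x):
    """Value of the step function, derivative of the surrogate shape."""
    # Multiplying x by a constant makes that constant the derivative w.r.t. x.
    injected = x * detach(surrogate_shape(x))
    # Subtracting the detached copy cancels the value but keeps the derivative.
    return injected - detach(injected) + detach(heaviside(x))


x = Dual(0.5, 1.0)  # tangent 1.0 means "differentiate with respect to x"
y = spike_fgi(x)
print(y.val, y.tan)
```

Here `y.val` is the Heaviside output and `y.tan` equals the surrogate shape evaluated at `x`, and no custom backward method was needed. Only ordinary arithmetic and a stop-gradient were used, which is the property the abstract highlights.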