Collaborating Authors

 eth zurich




Apertus: a fully open, transparent, multilingual language model

AIHub

In July, EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS) announced their joint initiative to build a large language model (LLM). Now, this model is available and serves as a building block for developers and organisations for future applications such as chatbots, translation systems, or educational tools. The model is named Apertus - Latin for "open" - highlighting its distinctive feature: the entire development process, including its architecture, model weights, and training data and recipes, is openly accessible and fully documented. AI researchers, professionals, and experienced enthusiasts can either access the model through the strategic partner Swisscom or download it from Hugging Face - a platform for AI models and applications - and deploy it for their own projects. Apertus is freely available in two sizes - featuring 8 billion and 70 billion parameters, the smaller model being more appropriate for individual usage.


Open-source Swiss language model to be released this summer

AIHub

This summer, EPFL and ETH Zurich will release a large language model (LLM) developed on public infrastructure. Trained on the "Alps" supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM marks a milestone in open-source AI and multilingual excellence. Earlier this month in Geneva, around 50 leading global initiatives and organisations dedicated to open-source LLMs and trustworthy AI convened at the International Open-Source LLM Builders Summit. Hosted by the AI centres of EPFL and ETH Zurich, the event marked a significant step in building a vibrant and collaborative international ecosystem for open foundation models. Open LLMs are increasingly viewed as credible alternatives to commercial systems, most of which are developed behind closed doors in the United States or China.


Evolving HPC services to enable ML workloads on HPE Cray EX

Schuppli, Stefano, Mohamed, Fawzi, Mendonça, Henrique, Mujkanovic, Nina, Palme, Elia, Conciatore, Dino, Drescher, Lukas, Gila, Miguel, Witlox, Pim, VandeVondele, Joost, Martinasso, Maxime, Schulthess, Thomas C., Hoefler, Torsten

arXiv.org Artificial Intelligence

The Alps Research Infrastructure leverages GH200 technology at scale, featuring 10,752 GPUs. Accessing Alps provides a significant computational advantage for researchers in Artificial Intelligence (AI) and Machine Learning (ML). While Alps serves a broad range of scientific communities, traditional HPC services alone are not sufficient to meet the dynamic needs of the ML community. This paper presents an initial investigation into extending HPC service capabilities to better support ML workloads. We identify key challenges and gaps we have observed since the early-access phase (2023) of Alps by the Swiss AI community and propose several technological enhancements. These include a user environment designed to facilitate the adoption of HPC for ML workloads, balancing performance with flexibility; a utility for rapid performance screening of ML applications during development; observability capabilities and data products for inspecting ongoing large-scale ML workloads; a utility to simplify the vetting of allocated nodes for compute readiness; a service plane infrastructure to deploy various types of workloads, including support and inference services; and a storage infrastructure tailored to the specific needs of ML workloads. These enhancements aim to facilitate the execution of ML workloads on HPC systems, increase system usability and resilience, and better align with the needs of the ML community. We also discuss our current approach to security aspects. This paper concludes by placing these proposals in the broader context of changes in the communities served by HPC infrastructure like ours.


Quadruped robot plays badminton with you using AI

FOX News

ANYmal-D combines robotics, artificial intelligence and sports, showing how advanced robots can take on dynamic, fast-paced games. At ETH Zurich's Robotic Systems Lab, engineers have created ANYmal-D, a four-legged robot that can play badminton with people. ANYmal-D's design and abilities are opening up new possibilities for human-robot collaboration in sports and beyond.


World's tallest 3D-printed building is unveiled in Switzerland: Futuristic tower stands at almost 100ft tall - so, would you be brave enough to scale it?

Daily Mail - Science & tech

Among the charming centuries-old cottages, an elaborate white tower in Switzerland stands out like a sore thumb. At almost 100ft tall, it is more than six times the height of a double-decker bus! Known as Tor Alva (the 'White Tower'), the gleaming white construction in the small village of Mulegns offers a new tourist attraction and cultural hub. Tor Alva is intended to emulate a layered cake – a tribute to the history of confectioners in the region – and also takes inspiration from filigree, an intricate metalwork technique used in making jewellery. Giovanni Netzer, founder of the Origen Cultural Foundation, which designed and built the tower with ETH Zurich, called it 'a technical triumph'. 'It inspires the building sector, encourages sustainable tourism and offers new cultural space,' Mr Netzer said.


Cycles and collusion in congestion games under Q-learning

Carissimo, Cesare, Nagler, Jan, Nax, Heinrich

arXiv.org Artificial Intelligence

We investigate the dynamics of Q-learning in a class of generalized Braess paradox games. These games represent an important class of network routing games where the associated stage-game Nash equilibria do not constitute social optima. We provide a full convergence analysis of Q-learning with varying parameters and learning rates. A wide range of phenomena emerges, broadly either settling into Nash or cycling continuously in ways reminiscent of "Edgeworth cycles" (i.e. jumping suddenly from Nash toward social optimum and then deteriorating gradually back to Nash). Our results reveal an important incentive incompatibility when thinking in terms of a meta-game being played by the designers of the individual Q-learners who set their agents' parameters. Indeed, Nash equilibria of the meta-game are characterized by heterogeneous parameters, and resulting outcomes achieve little to no cooperation beyond Nash. In conclusion, we suggest a novel perspective for thinking about regulation and collusion, and discuss the implications of our results for Bertrand oligopoly pricing games.
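
To make the setting concrete, here is a minimal sketch of independent Q-learners on the textbook Braess network. It is not the authors' code: the abstract does not specify their generalized game class or parameters, so the cost functions, the stateless update (no discounting), the epsilon-greedy exploration, and all names (`route_costs`, `simulate`, `alpha`, `epsilon`) are assumptions chosen for illustration. In this version the Nash equilibrium (everyone takes the shortcut) costs 2 per agent, while the social optimum (an even split over the two outer routes) costs 1.5.

```python
import random

# Route indices: 0 = "up" (S-A-T), 1 = "down" (S-B-T), 2 = "cross" (S-A-B-T).
# Assumed textbook costs: links S-A and B-T cost the fraction of agents
# using them, links A-T and S-B cost 1, and the shortcut A-B is free.

def route_costs(counts, n_agents):
    """Return the cost of each route given how many agents chose each one."""
    up, down, cross = counts
    load_sa = (up + cross) / n_agents    # congestion on link S-A
    load_bt = (down + cross) / n_agents  # congestion on link B-T
    return (load_sa + 1.0, 1.0 + load_bt, load_sa + load_bt)

def simulate(n_agents=10, steps=2000, alpha=0.1, epsilon=0.05, seed=0):
    """Run stateless epsilon-greedy Q-learners; return mean cost per step."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(n_agents)]  # one Q-value per agent and route
    history = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                actions.append(rng.randrange(3))  # explore
            else:
                best = max(q[i])                  # exploit, ties broken randomly
                actions.append(rng.choice([a for a in range(3) if q[i][a] == best]))
        counts = [actions.count(a) for a in range(3)]
        costs = route_costs(counts, n_agents)
        for i, a in enumerate(actions):
            # Reward is the negative travel cost; only the chosen route is updated.
            q[i][a] += alpha * (-costs[a] - q[i][a])
        history.append(sum(costs[a] for a in actions) / n_agents)
    return history

if __name__ == "__main__":
    trace = simulate()
    print("mean cost over last 100 steps:", sum(trace[-100:]) / 100)
```

Plotting the returned trace for different values of `alpha` and `epsilon` is one way to look for the two regimes the abstract describes: a mean cost that stays near 2 (Nash), or one that repeatedly drops toward 1.5 and drifts back.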


Allocation for Omnidirectional Aerial Robots: Incorporating Power Dynamics

Cuniato, Eugenio, Allenspach, Mike, Stastny, Thomas, Oleynikova, Helen, Siegwart, Roland, Pantic, Michael

arXiv.org Artificial Intelligence

Tilt-rotor aerial robots are more dynamic and versatile than their fixed-rotor counterparts, since the thrust vector and body orientation are decoupled. However, the coordination of servomotors and propellers (the allocation problem) is not trivial, especially accounting for overactuation and actuator dynamics. We present and compare different methods of actuator allocation for tilt-rotor platforms, evaluating them on a real aerial robot performing dynamic trajectories. We extend the state-of-the-art geometric allocation into a differential allocation, which uses the platform's redundancy and does not suffer from singularities typical of the geometric solution. We expand it by incorporating actuator dynamics and introducing propeller limit curves. These improve the modeling of propeller limits, automatically balancing their usage and allowing the platform to selectively activate and deactivate propellers during flight. We show that incorporating actuator dynamics and limits not only makes the allocation easier to tune, but also allows it to track more dynamic oscillating trajectories with angular velocities up to 4 rad/s, compared to 2.8 rad/s for geometric methods.


PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation

Obrist, Jan, Zamora, Miguel, Zheng, Hehui, Zarate, Juan, Katzschmann, Robert K., Coros, Stelian

arXiv.org Artificial Intelligence

Advancing robotic manipulation of deformable objects can enable automation of repetitive tasks across multiple industries, from food processing to textiles and healthcare. Yet robots struggle with the high dimensionality of deformable objects and their complex dynamics. While data-driven methods have shown potential for solving manipulation tasks, their application in the domain of deformable objects has been constrained by the lack of data. To address this, we propose PokeFlex, a pilot dataset featuring real-world 3D mesh data of actively deformed objects, together with the corresponding forces and torques applied by a robotic arm, using a simple poking strategy. Deformations are captured with a professional volumetric capture system that allows for complete 360-degree reconstruction. The PokeFlex dataset consists of five deformable objects with varying stiffness and shapes. Additionally, we leverage the PokeFlex dataset to train a vision model for online 3D mesh reconstruction from a single image and a template mesh. We refer readers to the supplementary material and to our website ( https://pokeflex-dataset.github.io/ ) for demos and examples of our dataset.