 outpost


Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference

Liu, Yuyang, Cai, Jingjing, Ren, Jiayi, Zhou, Peng, Zhang, Danyang, Du, Yin, Li, Shijian

arXiv.org Artificial Intelligence

Anomaly troubleshooting for large model distributed inference (LMDI) remains a critical challenge. Resolving anomalies such as inference performance degradation or latency jitter in distributed systems demands significant manual effort from domain experts, resulting in extremely time-consuming diagnosis processes with relatively low accuracy. In this paper, we introduce Kunlun Anomaly Troubleshooter (KAT), the first anomaly troubleshooting framework tailored for LMDI. KAT addresses this problem through two core innovations. First, KAT exploits the synchronicity and consistency of GPU workers and leverages function trace data to precisely detect kernel-level anomalies and the associated hardware components at nanosecond resolution. Second, KAT integrates these detection results into a domain-adapted LLM, delivering systematic causal reasoning and natural language interpretation of complex anomaly symptoms. Evaluations conducted in the Alibaba Cloud Service production environment indicate that KAT achieves over 0.884 precision and 0.936 recall in anomaly detection, providing detailed anomaly insights that significantly narrow down the diagnostic scope and improve both the efficiency and success rate of troubleshooting.
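The abstract does not give KAT's detection algorithm, so the following is only a minimal sketch of the general idea it describes: GPU workers that run the same kernel in lockstep should show similar durations, so a worker that deviates sharply from its peers is a candidate anomaly. The function name `find_outlier_workers`, the modified z-score rule, the threshold of 3.5, and the 1 ns floor are all assumptions made for illustration, not details from the paper.

```python
from statistics import median


def find_outlier_workers(durations_ns, threshold=3.5, min_mad_ns=1.0):
    """Flag workers whose kernel duration deviates strongly from their peers.

    durations_ns maps a worker id to the duration (in nanoseconds) of the
    same kernel on that worker. Hypothetical helper; not KAT's actual method.
    """
    values = list(durations_ns.values())
    peer_median = median(values)
    # Median absolute deviation: a spread estimate that a single slow
    # worker cannot inflate the way it would inflate a standard deviation.
    mad = median(abs(v - peer_median) for v in values)
    # Floor the spread so near-identical timings do not cause a division
    # by zero or flag differences of a nanosecond or two.
    mad = max(mad, min_mad_ns)
    return [
        worker
        for worker, value in durations_ns.items()
        # 0.6745 scales the MAD so the score is comparable to a z-score.
        if 0.6745 * abs(value - peer_median) / mad > threshold
    ]


trace = {"gpu0": 1000, "gpu1": 1010, "gpu2": 990, "gpu3": 1005, "gpu4": 5000}
print(find_outlier_workers(trace))
```

A median-based score is used here because a mean and standard deviation would be pulled toward the slow worker itself, which can hide the very anomaly being looked for.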


OpenAI Poaches 3 Top Engineers From DeepMind

WIRED

OpenAI announced today it has hired three senior computer vision and machine learning engineers from rival Google DeepMind, all of whom will work in a newly opened OpenAI office in Zurich, Switzerland. OpenAI executives told staff in an internal memo on Tuesday that Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai will be joining the company to work on multimodal AI, artificial intelligence models capable of performing tasks in different mediums ranging from images to audio. OpenAI has long been at the forefront of multimodal AI and released the first version of its text-to-image platform Dall-E in 2021. Its flagship chatbot ChatGPT, however, was initially only capable of interacting with text inputs. The company later added voice and image features as multimodal functionality became an increasingly important part of its product line and AI research.


What is Tower 22, the Jordan-based US outpost targeted in a drone strike?

Al Jazeera

The United States military announced on Sunday that three US soldiers were killed and at least 34 were wounded in a drone attack targeting Tower 22, a remote logistics outpost near the Jordan-Syrian border. The attack has elicited a strong reaction from Washington with President Joe Biden pledging to hold the attackers to account. The Islamic Resistance in Iraq, an umbrella group of Iran-backed armed groups in the region, claimed the attacks, saying it was in response to US support to Israel's war on Gaza, which has killed more than 26,000 people. Tower 22, which houses a small US logistics outpost, is located in Jordan's northeast close to the borders with Iraq and Syria. Public information about the outpost is limited.


The Download: ChatGPT gets even chattier, and recreating space on Earth

MIT Technology Review

The news: OpenAI has launched two new ways to interact with its flagship large language model in a major update. You can have a spoken conversation with the chatbot as if you were making a call, and it's also able to answer questions about images. How it works: The ability to talk to ChatGPT draws on two separate models. Whisper, OpenAI's existing speech-to-text model, converts what you say into text, which is then fed to the chatbot. And a new text-to-speech model converts ChatGPT's responses into spoken words.


'Kamikaze' drones attack US, coalition forces at Syria outpost; no Americans injured

FOX News

Three one-way drones, sometimes called "kamikaze" drones, targeted a U.S. garrison at an outpost in Syria's Al-Tanf region, U.S. Central Command said Friday, noting that no Americans were injured in the attack. Two members of the Syrian Free Army received medical attention after they were injured in the strike when one of the drones hit the compound. The other two drones were shot down by Coalition Forces, the U.S. military confirmed.


New company run by former NASA leader aims to build robotic outpost near the Moon

#artificialintelligence

A new startup run by a former acting NASA administrator hopes to capitalize on the recent zeal for lunar space exploration by building robotic outposts and spacecraft to send to space near the Moon. Their goal is to create a fleet of robotic helpers that can do a variety of tasks near the Moon, such as providing internet capabilities, collecting data, refueling spacecraft, and assembling structures in lunar space. The company called Quantum Space was formed in 2021. At the helm is Steve Jurczyk, who served as NASA's associate administrator beginning in 2018, before becoming the agency's acting administrator when President Biden was inaugurated. After retiring in May, Jurczyk decided to team up with three additional entrepreneurs and experts in the space industry to create this new company based out of Maryland.


Machine Learning at the Edge with AWS Outposts and Amazon SageMaker

#artificialintelligence

As customers continue to come up with new use-cases for machine learning, data gravity is as important as ever. Where latency and network connectivity are not an issue, generating data in one location (such as a manufacturing facility) and sending it to the cloud for inference is acceptable for some use-cases. With other critical use-cases, such as fraud detection for financial transactions, product quality in manufacturing, or analyzing video surveillance in real-time, customers are faced with the challenges that come with having to move that data to the cloud first. Two challenges customers face when performing inference in the cloud are the lack of real-time inference and security requirements that prevent user data from being sent to or stored in the cloud. Tens of thousands of customers use Amazon SageMaker to accelerate their Machine Learning (ML) journey by helping data scientists and developers to prepare, build, train, and deploy machine learning models quickly.


NASA's Lunar Gateway will feature Canadian Space Agency robotics

Engadget

The Lunar Gateway, NASA's outpost that will orbit the moon as part of its upcoming Artemis program, will be equipped with external robotics from the Canadian Space Agency (CSA), NASA announced today. As the culmination of an earlier partnership around Artemis, NASA's first major program to bring astronauts to the moon in half a century, CSA plans to build a "next-generation" robotic arm, the aptly named Canadarm3. That device will be able to reach many parts of the Gateway's exterior, giving astronauts an easy way to make repairs. Additionally, NASA says CSA will create robotic interfaces for Gateway modules, which will help with the installation of the outpost's first two scientific instruments. CSA aims to deliver the Canadarm3 to the Gateway in 2026 via a commercial logistics supply flight.


The Plan to Turn Scrapped Rockets Into Space Stations

#artificialintelligence

In early October, a dead Soviet satellite and the abandoned upper stage of a Chinese rocket narrowly avoided a collision in low Earth orbit. If the objects had crashed, the impact would have blown them to bits and created thousands of new pieces of dangerous space debris. Only a few days prior, the European Space Agency had published its annual space environment report, which highlighted abandoned rocket bodies as one of the biggest threats to spacecraft. The best way to mitigate this risk is for launch providers to deorbit their rockets after they've delivered their payload. But if you ask Jeffrey Manber, that's a waste of a perfectly good giant metal tube.


Canada to contribute to NASA mission to put Gateway orbiter around moon

The Japan Times

OTTAWA - Canada will join NASA's space mission to put an orbiter around the moon in a few years, Prime Minister Justin Trudeau announced Thursday. "Canada is going to the moon," Trudeau told a press conference that included a live video link from the International Space Station with Canadian astronaut David Saint-Jacques. NASA plans to build a small space station, dubbed Gateway, in the moon's orbit by 2026. It will serve as a way-station for trips to and from the lunar surface, but will not be permanently crewed like the International Space Station (ISS), currently in Earth's orbit. According to the Canadian Space Agency, Gateway will provide living space for astronauts, a docking station for visiting spacecraft and research laboratories.