Squared families: Searching beyond regular probability models

Tsuchida, Russell, Liu, Jiawei, Ong, Cheng Soon, Sejdinovic, Dino

arXiv.org Artificial Intelligence

We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular; however, their singularity can easily be handled so that they form regular models. After handling the singularity, squared families possess many convenient properties. Their Fisher information is a conformal transformation of the Hessian metric induced from a Bregman generator. The Bregman generator is the normalising constant, and yields a statistical divergence on the family. The normalising constant admits a helpful parameter-integral factorisation, meaning that only one parameter-independent integral needs to be computed for all normalising constants in the family, unlike in exponential families. Finally, the squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant. We then describe how squared families are special in the broader class of $g$-families, which are obtained by applying a sufficiently regular function $g$ to a linear transformation of a statistic. After removing special singularities, positively homogeneous families and exponential families are the only $g$-families for which the Fisher information is a conformal transformation of the Hessian metric, where the generator depends on the parameter only through the normalising constant. Even-order monomial families also admit parameter-integral factorisations, unlike exponential families. We study parameter estimation and density estimation in squared families, in the well-specified and misspecified settings. We use a universal approximation property to show that squared families can learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2})+C n^{-1/4}$, where $N$ is the number of datapoints, $n$ is the number of parameters, and $C$ is some constant.
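Concretely (in notation of our own choosing, since the abstract does not fix symbols), a squared family built from a statistic $\phi$ and base measure $\mu$ can be written as:

```latex
p_\theta(x) \;=\; \frac{\bigl(\theta^\top \phi(x)\bigr)^2}{Z(\theta)}\,\mu(x),
\qquad
Z(\theta) \;=\; \theta^\top M\,\theta,
\qquad
M \;=\; \int \phi(x)\,\phi(x)^\top\,\mu(x)\,\mathrm{d}x .
```

The matrix $M$ does not depend on $\theta$, so a single parameter-independent integral serves every normalising constant $Z(\theta)$ in the family; this is the parameter-integral factorisation the abstract refers to, with the entries of $M$ built from the same integrals that underlie the squared family kernel.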


exLong: Generating Exceptional Behavior Tests with Large Language Models

Zhang, Jiyang, Liu, Yu, Nie, Pengyu, Li, Junyi Jessy, Gligoric, Milos

arXiv.org Artificial Intelligence

Many popular programming languages, including C#, Java, and Python, support exceptions. Exceptions are thrown during program execution if an unwanted event happens, e.g., a method is invoked with an illegal argument value. Software developers write exceptional behavior tests (EBTs) to check that their code detects unwanted events and throws appropriate exceptions. Prior research studies have shown the importance of EBTs, but those studies also highlighted that developers put most of their effort into "happy paths", e.g., paths without unwanted events. To help developers fill the gap, we present the first framework, dubbed exLong, that automatically generates EBTs. exLong is a large language model instruction-fine-tuned from CodeLlama that embeds reasoning about traces that lead to throw statements, conditional expressions that guard throw statements, and non-exceptional behavior tests that execute similar traces. We compare exLong with the state-of-the-art models for test generation (CAT-LM) and one of the strongest foundation models (GPT-4o), as well as with analysis-based tools for test generation (Randoop and EvoSuite). Our results show that exLong outperforms existing models and tools. Furthermore, we contributed several pull requests to open-source projects, and 23 EBTs generated by exLong have already been accepted.
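For readers unfamiliar with the term, an exceptional behavior test asserts that code raises the expected exception when an unwanted event occurs. A minimal hand-written sketch in Python using only the standard library (illustrative only; exLong itself generates Java tests, and `withdraw` here is a hypothetical function):

```python
import unittest


def withdraw(balance, amount):
    """Return the new balance; raise ValueError on illegal arguments."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


class WithdrawTests(unittest.TestCase):
    def test_happy_path(self):
        # The kind of test developers tend to write first.
        self.assertEqual(withdraw(100, 30), 70)

    def test_negative_amount_raises(self):
        # The EBT: the unwanted event must raise the documented exception.
        with self.assertRaises(ValueError):
            withdraw(100, -5)


if __name__ == "__main__":
    unittest.main()
```

The second test method is the EBT: it executes the guard condition (`amount < 0`) that leads to the throw, which is exactly the kind of trace-and-guard reasoning the abstract says exLong embeds.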


The race to find new materials with AI needs more data. Meta is giving massive amounts away for free.

MIT Technology Review

"We're really firm believers that by contributing to the community and building upon open-source data models, the whole community moves further, faster," says Larry Zitnick, the lead researcher for the OMat project. Zitnick says the newOMat24 model will top the Matbench Discovery leaderboard, which ranks the best machine-learning models for materials science. Its data set will also be one of the biggest available. "Materials science is having a machine-learning revolution," says Shyue Ping Ong, a professor of nanoengineering at the University of California, San Diego, who was not involved in the project. Previously, scientists were limited to doing very accurate calculations of material properties on very small systems or doing less accurate calculations on very big systems, says Ong.


LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Zhao, Jun, Zu, Can, Xu, Hao, Lu, Yi, He, Wei, Ding, Yiwen, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over $100k$ tokens, a phenomenon also known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a 128K-token context and demonstrates potential superiority in long-text processing compared to GPT-4. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Due to members' hallucinations, it is non-trivial for a leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism to resolve response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements in tasks such as 128k-token text retrieval and multi-hop question answering, compared to GPT-4.
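A highly simplified sketch of the leader/member pattern the abstract describes, with toy functions standing in for LLM calls (all names here are hypothetical; the real system's inter-member communication mechanism is more involved than the majority vote used below):

```python
from collections import Counter


def split_into_chunks(document, size):
    """Leader splits the long context into chunks, one per member."""
    return [document[i:i + size] for i in range(0, len(document), size)]


def member_answer(chunk, query):
    """Stand-in for a member LLM: report a finding from its chunk, or None.
    A real member may hallucinate, hence the conflict resolution below."""
    return chunk if query in chunk else None


def leader(document, query, chunk_size=8):
    """Leader directs members over chunks and aggregates their responses."""
    answers = [member_answer(c, query)
               for c in split_into_chunks(document, chunk_size)]
    findings = [a for a in answers if a is not None]
    if not findings:
        return None
    # Toy conflict resolution: keep the most common finding. The paper's
    # inter-member communication mechanism instead has members share
    # information to resolve hallucination-induced conflicts.
    return Counter(findings).most_common(1)[0][0]
```

For example, `leader("aaaa bbbb key=42 cccc", "key=42")` returns the chunk containing `key=42`, while a query absent from every chunk yields `None`.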


The Anti-Drone Arms Race: Inside the Fight to Protect the World's Skies

TIME - Tech

On the top floor of a squat Singapore industrial estate, wedged between a railway depot and water reclamation plant, is a young security firm that's shooting for the stars. Well, shooting for anything beneath the stars that shouldn't be there, technically speaking. TRD is one of the world's leading purveyors of anti-drone technology, a burgeoning industry worth some $1.1 billion last year and projected to grow to $7.4 billion by 2032. "Anti-drone is the hot topic right now," says TRD CEO Sam Ong, a former officer in the Singapore Armour Corps, where he specialized in tank technology. "Unmanned warfare is taking center stage, especially in the Ukraine war."


Cybersecurity: Staying ahead of cybercriminals

#artificialintelligence

As financial institutions push out more digital products focused on speed and convenience, they create additional points of vulnerability that fraudsters could exploit online. As a result, financial institutions are also expected to stay agile and deploy the latest technologies to protect their customers. In fact, the Movement Control Order (MCO) period last year presented a case study of what could happen as more financial transactions move online. Globally, a record-high number of scam and phishing sites were detected in 2020, according to Atlas VPN. "Propelled by the pandemic, there has been a significant shift towards digital transactions and real-time payments. This new normal has brought [not only] unprecedented efficiency and convenience but also an increase in payment-related fraud," says Abrar A Anwar, managing director and CEO of Standard Chartered Malaysia.


Engineers use graph networks to accurately predict properties of molecules and crystals

#artificialintelligence

[Image caption: a schematic illustration of MEGNet models.] Nanoengineers at the University of California San Diego have developed new deep learning models that can accurately predict the properties of molecules and crystals. By enabling almost instantaneous property predictions, these deep learning models provide researchers the means to rapidly scan the nearly-infinite universe of compounds to discover potentially transformative materials for various technological applications, such as high-energy-density Li-ion batteries, warm-white LEDs, and better photovoltaics. To construct their models, a team led by nanoengineering professor Shyue Ping Ong at the UC San Diego Jacobs School of Engineering used a new deep learning framework called graph networks, developed by Google DeepMind, the brains behind AlphaGo and AlphaZero. Graph networks have the potential to expand the capabilities of existing AI technology to perform complicated learning and reasoning tasks with limited experience and knowledge, something that humans are good at.