AITopics

The vector notation adopted by GNU Octave plays a significant role as a tool for introspection, aligning itself with the vision of Kenneth E. Iverson. He believed that, just like mathematics, a programming language should be an effective thinking tool for representing and reasoning about problems we wish to address. This work aims to explore the use of vector notation in GNU Octave through the analysis of operators and functions, providing a closer alignment with mathematical notation and enhancing code efficiency. We will delve into fundamental concepts such as indexing, broadcasting, and function handles, and present case studies for a deeper understanding of these concepts. By adopting vector notation, GNU Octave becomes a powerful tool for mathematicians, scientists and engineers, enabling them to express and solve complex problems more effectively and intuitively.
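As a rough illustration of the three concepts the abstract names, the sketch below uses Python with NumPy as a stand-in for GNU Octave. That substitution is an assumption made here, not something taken from the paper, and all variable names are hypothetical. It shows logical indexing, broadcasting of a column against a row, and an anonymous function playing the role of a function handle.

```python
import numpy as np

# Indexing: select elements with a boolean mask instead of a loop.
v = np.array([3, -1, 4, -1, 5])
positives = v[v > 0]              # analogous to v(v > 0) in Octave

# Broadcasting: a 3x1 column combined with a 1x4 row yields a 3x4 table.
col = np.arange(1, 4).reshape(3, 1)
row = np.arange(1, 5).reshape(1, 4)
table = col * row                 # multiplication table, no explicit loops

# Function handle: an anonymous function, like @(x) x.^2 + 1 in Octave.
f = lambda x: x**2 + 1
squared_plus_one = f(v)           # applied elementwise to the whole vector
```

Each statement operates on whole arrays at once, which is the sense in which vector notation stays close to mathematical notation.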

matrix, notation, octave, (16 more...)

2410.19549

Country:

South America > Brazil (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
(4 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Unbounded: A Generative Infinite Game of Character Life Simulation

Li, Jialu, Li, Yuanzhen, Wadhwa, Neal, Pritch, Yael, Jacobs, David E., Rubinstein, Michael, Bansal, Mohit, Ruiz, Nataniel

We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. Inspired by James P. Carse's distinction between finite and infinite games, we leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models. Specifically, Unbounded draws inspiration from sandbox life simulations and allows you to interact with your autonomous virtual character in a virtual world by feeding, playing with and guiding it - with open-ended mechanics generated by an LLM, some of which can be emergent. In order to develop Unbounded, we propose technical innovations in both the LLM and visual generation domains. Specifically, we present: (1) a specialized, distilled large language model (LLM) that dynamically generates game mechanics, narratives, and character interactions in real-time, and (2) a new dynamic regional image prompt Adapter (IP-Adapter) for vision models that ensures consistent yet flexible visual generation of a character across multiple environments. We evaluate our system through both qualitative and quantitative analysis, showing significant improvements in character life simulation, user instruction following, narrative coherence, and visual consistency for both characters and the environments compared to traditional related approaches.

arxiv preprint arxiv, consistency, proceedings, (14 more...)

2410.18975

Country:

Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > North Carolina (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

VISAGE: Video Synthesis using Action Graphs for Surgery

Yeganeh, Yousef, Lazuardi, Rachmadio, Shamseddin, Amir, Dari, Emine, Thirani, Yash, Navab, Nassir, Farshad, Azade

Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts future frames given only a single initial frame and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopic procedures, which enables various applications in SDS.

diffusion model, video, video generation, (12 more...)

2410.17751

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?

Cen, Jiacheng, Li, Anyi, Lin, Ning, Ren, Yuxiang, Wang, Zihe, Huang, Wenbing

Equivariant Graph Neural Networks (GNNs) that incorporate E(3) symmetry have achieved significant success in various scientific applications. As one of the most successful models, EGNN leverages a simple scalarization technique to perform equivariant message passing over only Cartesian vectors (i.e., 1st-degree steerable vectors), enjoying greater efficiency and efficacy compared to equivariant GNNs using higher-degree steerable vectors. This success suggests that higher-degree representations might be unnecessary. In this paper, we disprove this hypothesis by exploring the expressivity of equivariant GNNs on symmetric structures, including $k$-fold rotations and regular polyhedra. We theoretically demonstrate that equivariant GNNs will always degenerate to a zero function if the degree of the output representations is fixed to 1 or other specific values. Based on this theoretical insight, we propose HEGNN, a high-degree version of EGNN to increase the expressivity by incorporating high-degree steerable vectors while maintaining EGNN's efficiency through the scalarization trick. Our extensive experiments demonstrate that HEGNN not only aligns with our theoretical analyses on toy datasets consisting of symmetric structures, but also shows substantial improvements on more complicated datasets such as $N$-body and MD17. Our theoretical findings and empirical results potentially open up new possibilities for the research of equivariant GNNs.

graph, international conference, representation, (13 more...)

2410.11443

Country:

Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

Poli, Maxime, Chemla, Emmanuel, Dupoux, Emmanuel

Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of their semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and that language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.

computational linguistic, language modeling, representation, (10 more...)

2410.00025

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Singapore (0.04)
Asia > Georgia > Abkhazia (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

Wang, Taowen, Liu, Yiyang, Liang, James Chenhao, Zhao, Junhan, Cui, Yiming, Mao, Yuning, Nie, Shaoliang, Liu, Jiahao, Feng, Fuli, Xu, Zenglin, Han, Cheng, Huang, Lifu, Wang, Qifan, Liu, Dongfang

Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the scale of MLLMs continues to grow, parameter-efficient finetuning becomes increasingly critical. However, most existing parameter-efficient approaches focus only on single modalities and often overlook the multimodal characteristics during finetuning. In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs. M$^2$PT effectively integrates visual and textual prompts into the vision encoder and language processor respectively during finetuning, facilitating the extraction and alignment of features across modalities. Empirical results on various multimodal evaluation datasets demonstrate the superior performance of our approach compared to several state-of-the-art baselines. A comprehensive set of ablation studies validates the effectiveness of our prompt design and the efficiency of our approach.

computational linguistic, groundtruth, wang, (15 more...)

2409.15657

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
North America > Canada > Ontario > Toronto (0.04)
(22 more...)

Genre: Research Report (1.00)

Industry: Government (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Dark energy reconstruction analysis with artificial neural networks: Application on simulated Supernova Ia data from Rubin Observatory

Mitra, Ayan, Gómez-Vargas, Isidro, Zarikas, Vasilios

In this paper, we present an analysis of Supernova Ia (SNIa) distance moduli $\mu(z)$ and dark energy using an Artificial Neural Network (ANN) reconstruction based on LSST simulated three-year SNIa data. The ANNs employed in this study utilize genetic algorithms for hyperparameter tuning and Monte Carlo Dropout for predictions. Our ANN reconstruction architecture is capable of modeling both the distance moduli and their associated statistical errors given redshift values. We compare the performance of the ANN-based reconstruction with two theoretical dark energy models: $\Lambda$CDM and Chevallier-Linder-Polarski (CPL). Bayesian analysis is conducted for these theoretical models using the LSST simulations and compared with observations from Pantheon and Pantheon+ SNIa real data. We demonstrate that our model-independent ANN reconstruction is consistent with both theoretical models. Performance metrics and statistical tests reveal that the ANN produces distance modulus estimates that align well with the LSST dataset and exhibit only minor discrepancies with $\Lambda$CDM and CPL.
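To make the Monte Carlo Dropout step concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the toy one-hidden-layer network, its untrained weights, the dropout rate, and the function names `forward` and `mc_dropout_predict` are all hypothetical. The idea shown is only that dropout stays active at prediction time, and the mean and spread over many stochastic passes serve as the estimate and a rough uncertainty for a given redshift.

```python
import math
import random
import statistics

def forward(z, weights, drop_prob, rng):
    """One stochastic pass of a toy one-hidden-layer network, dropout left on."""
    w1, b1, w2, b2 = weights
    out = b2
    for w, b, v in zip(w1, b1, w2):
        h = math.tanh(w * z + b)
        # Inverted dropout: drop the unit with probability drop_prob,
        # otherwise rescale so the expected activation is unchanged.
        if rng.random() >= drop_prob:
            out += v * h / (1.0 - drop_prob)
    return out

def mc_dropout_predict(z, weights, drop_prob=0.2, n_samples=200, seed=0):
    """Mean and sample standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(z, weights, drop_prob, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical, untrained weights: (input weights, hidden biases, output weights, output bias)
weights = ([0.5, -1.0, 2.0], [0.0, 0.1, -0.2], [1.0, 0.5, -0.3], 40.0)
mu_mean, mu_std = mc_dropout_predict(0.5, weights)
```

Setting `drop_prob=0.0` collapses the spread to zero, which is a quick check that the reported uncertainty comes only from the dropout masks in this sketch.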

astrophy, neural network, reconstruction, (12 more...)

doi: 10.1016/j.dark.2024.101706

2402.18124

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
South America > Chile (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

MIT Technology Review, Oct-29-2024, 13:00:00 GMT

Cultivating the next generation of AI innovators in a global tech hub

Today, the rewards of AI are mostly enjoyed by a few countries in what the Oxford Internet Institute dubs the "Compute North." These countries, such as the US, the UK, France, Canada, and China, have dominated research and development and built state-of-the-art AI infrastructure capable of training foundational models. This should come as no surprise, as these countries are home to many of the world's top universities and large tech corporations. But this concentration of innovation comes at a cost for the billions of people who live outside these dominant countries and have different cultural backgrounds. Large language models (LLMs) are illustrative of this disparity.

ai innovator, cultivating, global tech hub, (9 more...)

MIT Technology Review

Country:

North America > Canada (0.26)
Europe > France (0.26)
Asia > China (0.26)
(3 more...)

Industry: Information Technology (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)

AIHub, Oct-29-2024, 10:07:00 GMT

AIhub monthly digest: October 2024 – Nobel Prizes, the AI Song Contest, and towards safe and reliable AI agents

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we learn about research towards safe and reliable AI agent behaviour, discuss generative AI hype, congratulate the Nobel Prize winners in physics and chemistry, and take a tour of recent conferences. In the latest in our series of interviews featuring the AAAI/ACM SIGAI doctoral consortium participants, we heard from Pulkit Verma about his research on safe and reliable behavior of AI agents. He is currently investigating the minimal set of requirements in an AI system that would enable a user to assess and understand the limits of its safe operability. There has been a string of articles recently about the end of generative AI hype.

ai song contest, monthly digest, nobel prize, (6 more...)

AIHub

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.16)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
North America > United States > California > Santa Clara County > San Jose (0.05)
(2 more...)

Genre: Personal > Honors > Award (0.36)

Industry:

Media > Music (0.41)
Leisure & Entertainment (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

arXiv.org Artificial Intelligence, Oct-29-2024

SoccerGuard: Investigating Injury Risk Factors for Professional Soccer Players with Machine Learning

Bartels, Finn, Xing, Lu, Midoglu, Cise, Boeker, Matthias, Kirsten, Toralf, Halvorsen, Pål

We present SoccerGuard, a novel framework for predicting injuries in women's soccer using Machine Learning (ML). This framework can ingest data from multiple sources, including subjective wellness and training load reports from players, objective GPS sensor measurements, third-party player statistics, and injury reports verified by medical personnel. We experiment with a number of different settings related to synthetic data generation, input and output window sizes, and ML models for prediction. Our results show that, given the right configurations and feature combinations, injury event prediction can be undertaken with considerable accuracy. The optimal results are achieved when input windows are reduced and larger combined output windows are defined, in combination with an ideally balanced data set. The framework also includes a dashboard with a user-friendly Graphical User Interface (GUI) to support interactive analysis and visualization.

artificial intelligence, injury, machine learning, (17 more...)

2411.08901

Country:

North America (0.14)
Europe > Norway > Eastern Norway > Oslo (0.05)
Europe > Germany > Saxony > Leipzig (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)