





Appendix A: Implementation Details

Neural Information Processing Systems

The encoder contains three linear layers with output sizes [d, k, k]; each layer except the last is followed by batch normalization (eps = 0.00005, momentum = 0.1) and a ReLU activation. The decoder contains three linear layers with output sizes [k, k, d], where each layer except the last is likewise followed by batch normalization and a ReLU activation. Following the standard linear evaluation procedure in self-supervised learning works (32; 34), we used a single linear layer as the linear decoder for measuring decoding accuracy. We used a neural activity dataset collected from two rhesus macaque monkeys (Chewie and Mihi), which were trained to move a computer cursor to reach a target on a screen.
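The layer pattern described above (linear layers with batch normalization and ReLU after all but the last) can be sketched as follows. This is a minimal NumPy sketch under assumed dimensions d and k; the function names, the random initialization, and the simplified batch-wise normalization (no running statistics or learnable affine parameters) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def linear(x, w, b):
    # Affine map: (batch, in_dim) @ (in_dim, out_dim) + bias
    return x @ w + b

def batchnorm(x, eps=5e-5):
    # Normalize over the batch dimension (sketch only; the paper's model
    # uses eps = 0.00005 and momentum = 0.1 with running statistics).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def mlp(x, sizes, rng):
    # Linear layers with the given output sizes; batch norm + ReLU after
    # every layer except the last, matching the description above.
    for i, out_dim in enumerate(sizes):
        w = rng.standard_normal((x.shape[1], out_dim)) * 0.1
        b = np.zeros(out_dim)
        x = linear(x, w, b)
        if i < len(sizes) - 1:
            x = relu(batchnorm(x))
    return x

rng = np.random.default_rng(0)
d, k = 64, 16                      # assumed input / latent dimensions
x = rng.standard_normal((8, d))    # a batch of 8 activity vectors
z = mlp(x, [d, k, k], rng)         # encoder: output sizes [d, k, k]
x_hat = mlp(z, [k, k, d], rng)     # decoder: output sizes [k, k, d]
print(z.shape, x_hat.shape)
```

The linear-evaluation decoder mentioned above would correspond to a single `linear` call on the frozen representation `z`.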


Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

Neural Information Processing Systems

Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice.




Can Large Language Models Understand Symbolic Graphics Programs?

Qiu, Zeju, Liu, Weiyang, Feng, Haiwen, Liu, Zhen, Xiao, Tim Z., Collins, Katherine M., Tenenbaum, Joshua B., Weller, Adrian, Black, Michael J., Schölkopf, Bernhard

arXiv.org Artificial Intelligence

Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content, as we verify through a human experiment. To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human effort. We evaluate current LLMs on our benchmark to provide a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs, and that models considered good at reasoning perform better. Lastly, we introduce Symbolic Instruction Tuning (SIT) to improve this ability. Specifically, we query GPT-4o with questions and images generated by symbolic programs. Such data are then used to finetune an LLM. We also find that SIT data can improve the general instruction-following ability of LLMs.


AI Chatbot Writes 'In the Style of Nick Cave,' and Nick Cave is Heated – Rolling Stone

#artificialintelligence

Nick Cave, the Bad Seeds frontman whose songs are tinged with a healthy dose of death, forlorn love, and religion, is no fan of ChatGPT's lyrical ambitions. The popular AI bot has drawn both praise and concern for its ability to generate conversational and nuanced text responses in simple, clean sentences. Since its release in November by the artificial intelligence lab OpenAI, ChatGPT has written everything from sitcom scripts to literature essays to, now, rather convincing rock songs. This has left people worried about the ramifications for industries across the creative spectrum, and one of those people is Cave himself. In his latest The Red Hand Files newsletter, Cave took on the subject of AI-generated music.



AI Magazine

Births are always interesting affairs. According to some, births are always traumatic: a shock to come from the womb to the world. The birth we give witness to here is that of a new society, the American Association for Artificial Intelligence (AAAI). It has not seemed to me traumatic, but rather almost wholly benign. In a world where not much is benign at the moment, such an event is devoutly to be cherished.


Reports on the 2004 AAAI Fall Symposia

AI Magazine

The American Association for Artificial Intelligence presented its 2004 Fall Symposium Series Friday through Sunday, October 22-24, at the Hyatt Regency Crystal City in Arlington, Virginia, adjacent to Washington, DC. The symposium series was preceded on Thursday, October 21, by a one-day AI funding seminar, open to all registered attendees, at which several successful researchers discussed what they believed made them successful and offered advice on how to play the funding game. The topics of the eight symposia in the 2004 Fall Symposia Series were: (1) Achieving Human-Level Intelligence through Integrated Systems and Research; (2) Artificial Multiagent Learning; (3) Compositional Connectionism in Cognitive Science; (4) Dialogue Systems for Health Communications; (5) The Intersection of Cognitive Science and Robotics: From Interfaces to Intelligence; (6) Making Pen-Based Interaction Intelligent and Natural; (7) Real-Life Reinforcement Learning; and (8) Style and Meaning in Language, Art, Music, and Design.