top prediction
Understanding Hidden Computations in Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.
- Information Technology > Artificial Intelligence > Machine Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.52)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
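The logit lens idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the toy hidden states, the 3-token vocabulary, and the function names `logit_lens` and `ranked_tokens` are all hypothetical, and a real implementation would typically also apply the model's final layer normalization before projecting.

```python
# Toy logit lens: project each layer's hidden state onto the vocabulary
# with the unembedding matrix, then rank tokens by their logits.
# All values below are made up for illustration.

def logit_lens(hidden_states, unembed):
    """Return one list of vocabulary logits per layer."""
    per_layer_logits = []
    for h in hidden_states:
        # Each row of `unembed` is the output embedding of one token.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        per_layer_logits.append(logits)
    return per_layer_logits

def ranked_tokens(logits):
    """Token ids sorted from highest to lowest logit."""
    return sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)

# Hypothetical 2-dimensional hidden states for two layers at one position.
hidden_states = [[1.0, 0.0], [0.2, 0.9]]
# Hypothetical unembedding matrix for a 3-token vocabulary.
unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

rankings = [ranked_tokens(l) for l in logit_lens(hidden_states, unembed)]
for layer, ranking in enumerate(rankings):
    print(f"layer {layer}: token ranking {ranking}")
```

Inspecting lower-ranked tokens at each layer, rather than only the top one, is the kind of token-ranking analysis the abstract alludes to; how the authors actually select ranks is not specified here.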
Characterizing stable regions in the residual stream of LLMs
Janiak, Jett, Karwowski, Jacek, Mangat, Chatrik Singh, Giglemiani, Giorgi, Petrova, Nora, Heimersheim, Stefan
We identify stable regions in the residual stream of Transformers, where the model's output remains insensitive to small activation changes, but exhibits high sensitivity at region boundaries. These regions emerge during training and become more defined as training progresses or model size increases. The regions appear to be much larger than previously studied polytopes. Our analysis suggests that these stable regions align with semantic distinctions, where similar prompts cluster within regions, and activations from the same region lead to similar next token predictions. This work provides a promising research direction for understanding the complexity of neural networks, shedding light on training dynamics, and advancing interpretability.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Israel (0.04)
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout
Banerjee, Ayan, Mathur, Nityanand, Lladós, Josep, Pal, Umapada, Dutta, Anjan
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at github.com/SVGCraft.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- North America > United States > New York (0.04)
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
Xing, Ximing, Wang, Chuang, Zhou, Haitao, Zhang, Jing, Yu, Qian, Xu, Dong
Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates vectorized free-hand sketches using natural language input. DiffSketcher is developed based on a pre-trained text-to-image diffusion model. It performs the task by directly optimizing a set of Bézier curves with an extended version of the score distillation sampling (SDS) loss, which allows us to use a raster-level diffusion model as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore attention maps embedded in the diffusion model for effective stroke initialization to speed up the generation process. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
- Asia > China > Hong Kong (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- North America > United States > Rocky Mountains (0.04)
A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference
Zaman, Kerem, Belinkov, Yonatan
Most evaluations of attribution methods focus on the English language. In this work, we present a multilingual approach for evaluating attribution methods for the Natural Language Inference (NLI) task in terms of faithfulness and plausibility. First, we introduce a novel cross-lingual strategy to measure faithfulness based on word alignments, which eliminates the drawbacks of erasure-based evaluations. We then perform a comprehensive evaluation of attribution methods, considering different output mechanisms and aggregation methods. Finally, we augment the XNLI dataset with highlight-based explanations, providing a multilingual NLI dataset with highlights, to support future exNLP studies. Our results show that attribution methods performing best for plausibility and faithfulness are different.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Israel (0.04)
- North America > Dominican Republic (0.04)
Increasing Textual Context Size Boosts Medical Image-Text Matching
Pretrained image-text matching models, such as OpenAI's CLIP [1], use natural language processing (NLP) approaches to find semantic relations between images and textual descriptions. This emerging technology has seen rapid adoption in the general domain, and increasing interest in the medical domain [2, 3], where medical imaging data often includes images paired with textual descriptions. For example, MIMIC-CXR [4] is a dataset that consists of chest radiographs along with free-text radiology reports. This dataset paved the way for works like BioViL [2], which used the images and the captions provided in the dataset to train an image-text matching model for chest X-rays and chest-related diseases. ROCO [5] is a dataset containing radiology images from publications available in the PubMed biomedical paper repository. ROCO includes several medical imaging modalities beyond X-ray, such as CT, ultrasound, and MRI.
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
- Europe > Switzerland (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Zhang, Renrui, Hu, Xiangfei, Li, Bohao, Huang, Siyuan, Deng, Hanqiu, Li, Hongsheng, Qiao, Yu, Gao, Peng
Visual recognition in low-data regimes requires deep neural networks to learn generalized representations from limited training samples. Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training. We then ask whether more diverse pre-training knowledge can be cascaded to further assist few-shot representation learning. In this paper, we propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge of various pre-training paradigms for better few-shot learning. Our CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge. Specifically, CaFo works by 'Prompt, Generate, then Cache'. First, we leverage GPT-3 to produce textual inputs for prompting CLIP with rich downstream linguistic semantics. Then, we generate synthetic images via DALL-E to expand the few-shot training data without any manual effort. Finally, we introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through this collaboration, CaFo can fully unleash the potential of different pre-training methods and unify them to achieve state-of-the-art performance in few-shot classification. Code is available at https://github.com/ZrrSkywalker/CaFo.
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Visual Classification via Description from Large Language Models
Vision-language models (VLMs) such as CLIP have shown promising performance on a variety of recognition tasks using the standard zero-shot classification procedure: computing similarity between the query image and the embedded words for each category. By only using the category name, they neglect to make use of the rich context of additional information that language affords. The procedure gives no intermediate understanding of why a category is chosen, and furthermore provides no mechanism for adjusting the criteria used towards this decision. We present an alternative framework for classification with VLMs, which we call classification by description. We ask VLMs to check for descriptive features rather than broad categories: to find a tiger, look for its stripes; its claws; and more. By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used. In the process, we can get a clear idea of what features the model uses to construct its decision; it gains some level of inherent explainability. We query large language models (e.g., GPT-3) for these descriptors to obtain them in a scalable way. Extensive experiments show our framework has numerous advantages past interpretability. We show improvements in accuracy on ImageNet across distribution shifts; demonstrate the ability to adapt VLMs to recognize concepts unseen during training; and illustrate how descriptors can be edited to effectively mitigate bias compared to the baseline. Why does a person recognize a hen in Fig. 1?
- Asia > Japan (0.04)
- Africa > West Africa (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Leisure & Entertainment (0.93)
- Transportation (0.68)
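Classification by description, as outlined in the abstract above, can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the 2-dimensional embeddings are made up, the "hen" descriptors are hypothetical, and averaging the per-descriptor similarities into a class score is an assumption about how the evidence is aggregated.

```python
# Toy classification by description: score each class by how well the
# image matches that class's descriptors, and keep the per-descriptor
# similarities as an explanation of the decision.

def dot(u, v):
    """Dot product, standing in for image-text similarity."""
    return sum(a * b for a, b in zip(u, v))

def classify_by_description(image_emb, descriptor_embs):
    """Return (best label, class scores, per-descriptor evidence)."""
    scores = {}
    evidence = {}
    for label, descs in descriptor_embs.items():
        sims = {text: dot(image_emb, emb) for text, emb in descs.items()}
        evidence[label] = sims
        # Assumption: a class score is the mean descriptor similarity.
        scores[label] = sum(sims.values()) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores, evidence

# Hypothetical embeddings; a real system would obtain these from a VLM.
image_emb = [1.0, 0.0]
descriptor_embs = {
    "tiger": {"stripes": [1.0, 0.0], "claws": [0.5, 0.5]},
    "hen": {"feathers": [0.0, 1.0], "beak": [0.5, 0.5]},
}

best, scores, evidence = classify_by_description(image_emb, descriptor_embs)
print(best, scores)
print(evidence[best])
```

Because the evidence dictionary records which descriptors contributed to the winning class, editing or removing a descriptor changes the decision in a traceable way, which is the explainability property the abstract describes.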
Neural-Symbolic Models for Logical Queries on Knowledge Graphs
Zhu, Zhaocheng, Galkin, Mikhail, Zhang, Zuobai, Tang, Jian
Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets, which provides interpretability for intermediate variables. To reason about the missing links, GNN-QE adapts a graph neural network from knowledge graph completion to execute the relation projections, and models the logical operations with product fuzzy logic. Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries. Meanwhile, GNN-QE can predict the number of answers without explicit supervision, and provide visualizations for intermediate variables.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Washington > Spokane County (0.14)
- North America > United States > South Carolina > Greenville County (0.14)
- Research Report (0.70)
- Personal > Honors (0.68)
- Media > Television (1.00)
- Media > Music (1.00)
- Media > Film (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.57)
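The logical operations over fuzzy sets mentioned in the GNN-QE abstract above can be sketched with product fuzzy logic. This is a minimal illustration, not the GNN-QE code: a fuzzy set is represented here as a plain list of membership degrees in [0, 1], one per entity, and the function names and sample values are hypothetical.

```python
# Product fuzzy logic over fuzzy sets of entities.
# Each fuzzy set is a list of membership degrees in [0, 1].

def fuzzy_and(a, b):
    """Conjunction (product t-norm): x * y."""
    return [x * y for x, y in zip(a, b)]

def fuzzy_or(a, b):
    """Disjunction (probabilistic sum): x + y - x * y."""
    return [x + y - x * y for x, y in zip(a, b)]

def fuzzy_not(a):
    """Negation: 1 - x."""
    return [1.0 - x for x in a]

# Hypothetical intermediate results of two relation projections
# over three entities.
a = [0.5, 1.0, 0.0]
b = [0.5, 0.25, 1.0]

print(fuzzy_and(a, b))  # entities satisfying both sub-queries
print(fuzzy_or(a, b))   # entities satisfying either sub-query
print(fuzzy_not(a))     # entities not satisfying the first sub-query
```

Since every intermediate result stays a vector of per-entity membership degrees, each step of a query can be inspected directly, which is the interpretability property the abstract attributes to this decomposition.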
2022 Top Predictions for AI in Finance
It is no secret that AI has played a major role in the ongoing democratization of investing. My prediction for next year and beyond is that the major growth we've seen in retail investing will continue at a rapid pace – and AI will continue to fuel that growth. AI has helped to level the playing field for investors. Today you don't have to be a high-net-worth (HNW) investor to get personalized financial advice; there is a chatbot for that. These AI-driven chatbots will only continue to get smarter. Machine learning can now sift through a user's various financial accounts and profiles and provide a snapshot of recommended to-dos on a dashboard. This will continue to gain traction in the decade ahead. AI has also helped to simplify the client onboarding process while enhancing the customer experience. Going forward, as the retail investing trend continues to grow, expect AI to play a larger role in risk assessment, risk management, and fraud detection. This will enable businesses to scale and keep up with heavy volatility.