mathematical operation
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Gupta, Akash, Storkey, Amos, Lapata, Mirella
Large Multimodal Models (LMMs) often rely on in-context learning (ICL) to perform new tasks with minimal supervision. However, ICL performance, especially in smaller LMMs, is inconsistent and does not always improve monotonically with increasing examples. We hypothesize that this occurs due to the LMM being overwhelmed by additional information present in the image embeddings, which is not required for the downstream task. To address this, we propose a meta-learning approach that provides an alternative for inducing few-shot capabilities in LMMs, using a fixed set of soft prompts that are distilled from task-relevant image features and can be adapted at test time using a few examples. To facilitate this distillation, we introduce an attention-mapper module that can be easily integrated with the popular LLaVA v1.5 architecture and is jointly learned with soft prompts, enabling task adaptation in LMMs under low-data regimes with just a few gradient steps. Evaluation on the VL-ICL Bench shows that our method consistently outperforms ICL and related prompt-tuning approaches, even under image perturbations, improving task induction and reasoning across visual question answering tasks.
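As a rough illustration of the adaptation loop described above, the sketch below tunes a set of meta-learned soft prompts on a handful of support examples with a few gradient steps while the LMM itself stays frozen. This is a minimal sketch, not the paper's implementation; `lmm_forward`, the prompt shape, and the hyperparameters are assumptions.

```python
import torch

# Minimal sketch of test-time soft-prompt adaptation (hypothetical shapes/API).
# `lmm_forward(prompts, batch)` is an assumed wrapper around a frozen LMM that
# prepends the soft-prompt embeddings to the input sequence and returns a loss.

def adapt_soft_prompts(lmm_forward, init_prompts, support_set, steps=5, lr=1e-2):
    """Adapt meta-learned soft prompts on a few labeled support examples."""
    prompts = init_prompts.clone().requires_grad_(True)  # (num_prompts, dim)
    optimizer = torch.optim.SGD([prompts], lr=lr)
    for _ in range(steps):                # just a few gradient steps
        for batch in support_set:         # handful of few-shot examples
            loss = lmm_forward(prompts, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return prompts.detach()               # LMM weights untouched; only prompts move
```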
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Dominican Republic (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Controlling AI's Growing Energy Needs
The huge amount of energy required to train artificial intelligence (AI) is becoming a concern. Training GPT-3, the large language model (LLM) that originally powered ChatGPT, consumed almost 1,300 megawatt-hours of electricity, according to an estimate by researchers from Google and the University of California, Berkeley, roughly what 130 American homes use in a year. Furthermore, an analysis by OpenAI suggests that the amount of computing power needed to train AI models has been growing exponentially since 2012, doubling roughly every 3.4 months as the models become bigger and more sophisticated. Our energy production capacity is not increasing as steeply, however, and expanding it is likely to further contribute to global warming: electricity generation is the single biggest contributor to climate change, since coal, oil, and gas are still used far more widely than cleaner energy sources. "At this rate, we are running into a brick wall in terms of the ability to scale up machine learning networks," said Menachem Stern, a theoretical physicist at the AMOLF research institute in the Netherlands.
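As a back-of-the-envelope check on the "130 homes" comparison, assuming an average US household consumes roughly 10 megawatt-hours of electricity per year (in line with EIA estimates):

```python
# Sanity check of the "130 homes" comparison.
# Assumption: an average US household uses roughly 10 MWh of electricity per
# year (EIA estimates hover around 10-11 MWh/year).
training_energy_mwh = 1_300          # estimated energy to train GPT-3
household_mwh_per_year = 10.0        # assumed average annual household use
print(training_energy_mwh / household_mwh_per_year)  # ~130 home-years
```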
- North America > United States > California > Alameda County > Berkeley (0.25)
- Europe > Netherlands (0.25)
- North America > United States > Pennsylvania (0.05)
- Energy (1.00)
- Information Technology > Hardware (0.30)
Multi-Operational Mathematical Derivations in Latent Space
Valentino, Marco, Meadows, Jordan, Zhang, Lan, Freitas, André
This paper investigates the possibility of approximating multiple mathematical operations in latent space for expression derivation. To this end, we introduce different multi-operational representation paradigms, modelling mathematical operations as explicit geometric transformations. By leveraging a symbolic engine, we construct a large-scale dataset comprising 1.7M derivation steps stemming from 61K premises and 6 operators, analysing the properties of each paradigm when instantiated with state-of-the-art neural encoders. Specifically, we investigate how different encoding mechanisms can approximate equational reasoning in latent space, exploring the trade-off between learning different operators and specialising within single operations, as well as the ability to support multi-step derivations and out-of-distribution generalisation. Our empirical analysis reveals that the multi-operational paradigm is crucial for disentangling different operators, while discriminating the conclusions for a single operation is achievable in the original expression encoder. Moreover, we show that architectural choices can heavily affect the training dynamics, structural organisation, and generalisation of the latent space, resulting in significant variations across paradigms and classes of encoders.
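To make the "operations as explicit geometric transformations" idea concrete, here is a minimal sketch in which each operator owns a learned linear map acting on a shared latent space of expression embeddings. The encoder, dimensions, and training objective are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Sketch: each mathematical operator is a learned linear transformation of the
# latent space, so "applying an operator" to a premise means mapping its
# embedding with that operator's matrix. Dimensions are illustrative only.

class MultiOperationalHead(nn.Module):
    def __init__(self, dim, num_operators=6):
        super().__init__()
        # One transformation per operator (e.g., add, multiply, differentiate...)
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_operators))

    def forward(self, premise_embedding, operator_id):
        # Derive the conclusion representation by applying the operator's map.
        return self.ops[operator_id](premise_embedding)

# Training would pull the transformed premise embedding toward the encoding of
# the true conclusion (e.g., with a contrastive or cosine loss); the choice of
# objective is one of the paradigm-dependent details the paper compares.
```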
- Asia > Middle East > Jordan (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
What the evolution of our own brains can tell us about the future of AI
The explosive growth in artificial intelligence in recent years -- crowned with the meteoric rise of generative AI chatbots like ChatGPT -- has seen the technology take on many tasks that, formerly, only human minds could handle. But despite their increasingly capable linguistic computations, these machine learning systems remain surprisingly inept at making the sorts of cognitive leaps and logical deductions that even the average teenager can consistently get right. In this week's Hitting the Books excerpt, from A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains, AI entrepreneur Max Bennett explores this puzzling gap in computer competency by tracing the development of the organic machine AIs are modeled after: the human brain. Focusing on the five evolutionary "breakthroughs," found amid myriad genetic dead ends and unsuccessful offshoots, that led our species to our modern minds, Bennett also shows that the same advancements that took humanity eons to evolve can be adapted to help guide the development of the AI technologies of tomorrow. In the excerpt below, we take a look at how generative AI systems like GPT-3 are built to mimic the predictive functions of the neocortex, but still can't quite get a grasp on the vagaries of human speech.
- North America > United States > Illinois > Cook County > Chicago (0.06)
- North America > United States > New York (0.05)
- Pacific Ocean (0.04)
Privacy-Preserving Encrypted Low-Dose CT Denoising
Yang, Ziyuan, Huangfu, Huijie, Ran, Maosong, Wang, Zhiwen, Yu, Hui, Zhang, Yi
Deep learning (DL) has made significant advancements in tomographic imaging, particularly in low-dose computed tomography (LDCT) denoising. A recent trend involves servers training powerful models on large amounts of self-collected private data and providing application programming interfaces (APIs) for users, as with ChatGPT. To avoid model leakage, users are required to upload their data to the server, but this raises public concerns about the potential risk of privacy disclosure, especially for medical data. Hence, to alleviate such concerns, in this paper we propose to denoise LDCT directly in the encrypted domain, achieving privacy-preserving cloud services without exposing private data to the server. To this end, we employ homomorphic encryption to encrypt private LDCT data, which is then transferred to the server model trained with plaintext LDCT for further denoising. However, since traditional DL operations, such as convolution and linear transformation, cannot be applied directly in the encrypted domain, we transform the fundamental mathematical operations of the plaintext domain into their counterparts in the encrypted domain. In addition, we present two interactive frameworks, for linear and nonlinear models respectively, both of which achieve lossless operation. In this way, the proposed methods offer two merits: data privacy is well protected, and the server model is free from the risk of model leakage. Moreover, we provide a theoretical proof of the lossless property of our framework. Finally, experiments demonstrate that the transferred contents are well protected and cannot be reconstructed. The code will be released once the paper is accepted.
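The key enabler, evaluating linear operations on ciphertexts, can be illustrated with a toy additively homomorphic scheme. The sketch below uses a didactic Paillier implementation (deliberately tiny, insecure key sizes) to compute an integer-weighted sum, the building block of convolutions and linear layers, on encrypted inputs; it is a conceptual illustration, not the paper's protocol:

```python
import math, random

# Toy Paillier encryption: the server evaluates a linear operation (here an
# integer-weighted sum) on encrypted data without ever decrypting it.
# Didactic sketch only: the primes are insecure, and real systems use large
# keys plus fixed-point encodings for real-valued CT data.

p, q = 293, 433                      # demo primes (insecure key size)
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse used in decryption

def encrypt(m):
    r = random.randrange(1, n)       # coprimality with n assumed for the demo
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Client encrypts its private inputs.
x = [3, 5, 7]
cx = [encrypt(v) for v in x]

# Server computes sum(w_i * x_i) homomorphically: E(m)^w = E(w*m) and
# E(a) * E(b) = E(a+b). The weights stay in plaintext on the server side.
w = [2, 4, 1]
acc = 1                              # E(0), the multiplicative identity
for ci, wi in zip(cx, w):
    acc = (acc * pow(ci, wi, n2)) % n2

print(decrypt(acc), sum(wi * vi for wi, vi in zip(w, x)))  # both print 33
```

Nonlinear operations (activations) do not compose this way, which is why the abstract's nonlinear case requires an interactive client-server framework rather than purely server-side evaluation.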
- Asia > China > Sichuan Province > Chengdu (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Czechia > Prague (0.04)
- Asia > Singapore (0.04)
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Cai, Mu, Huang, Zeyi, Li, Yuheng, Wang, Haohan, Lee, Yong Jae
Recently, large language models (LLMs) have made significant advancements in natural language understanding and generation. However, their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components. Our method facilitates simple image classification, generation, and in-context learning using only LLM capabilities. We demonstrate the promise of our approach across discriminative and generative tasks, highlighting its (i) robustness against distribution shift, (ii) substantial improvements achieved by tapping into the in-context learning abilities of LLMs, and (iii) image understanding and generation capabilities with human guidance. Our code, data, and models can be found here https://github.com/mu-cai/svg-llm.
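The core idea, that an image serialized as SVG markup becomes text an LLM can read directly, can be illustrated with a toy prompt. `query_llm` below is a hypothetical stand-in for any chat-completion call, not the authors' API; see their repository for the actual pipeline:

```python
# Toy illustration of the SVG-as-text idea: the image is XML markup, so a
# text-only LLM can "see" it inside an ordinary prompt.

svg_image = """<svg xmlns="http://www.w3.org/2000/svg" width="28" height="28">
  <rect x="12" y="4" width="4" height="20" fill="black"/>
  <rect x="8" y="20" width="12" height="4" fill="black"/>
</svg>"""  # a crude digit-like glyph built from two rectangles

prompt = (
    "The following SVG markup draws a handwritten-style digit.\n"
    f"{svg_image}\n"
    "Which digit (0-9) does it most resemble? Answer with one digit."
)

# response = query_llm(prompt)   # hypothetical LLM call, assumed here
print(prompt)
```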
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Austria > Vienna (0.04)
APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning
Activation functions introduce non-linearity in deep neural networks. This non-linearity helps the neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions have been developed and are used depending on the type of problem statement. ReLU's variants, SWISH, and MISH are go-to activation functions. The MISH function is considered to have similar or even better performance than SWISH, and much better than ReLU. In this paper, we propose an activation function named APTx which behaves similarly to MISH, but requires fewer mathematical operations to compute. The lower computational requirements of APTx speed up model training, and thus also reduce the hardware requirements for the deep learning model.
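For reference, the APTx paper defines the activation as phi(x) = (alpha + tanh(beta*x)) * gamma*x, with defaults alpha = 1, beta = 1, gamma = 0.5 reported to track MISH closely; the quick comparison below treats those constants as quoted from the paper, not independently verified:

```python
import torch

# APTx: phi(x) = (alpha + tanh(beta*x)) * gamma*x. With the reported defaults
# it needs a single tanh, versus MISH's tanh(softplus(x)) composition
# (one tanh plus an exp/log inside softplus), hence the lower operation count.

def aptx(x, alpha=1.0, beta=1.0, gamma=0.5):
    return (alpha + torch.tanh(beta * x)) * gamma * x

def mish(x):
    return x * torch.tanh(torch.nn.functional.softplus(x))

x = torch.linspace(-4, 4, 9)
print(aptx(x))
print(mish(x))   # similar shape to APTx, especially for x > 0
```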
- Asia > Singapore (0.05)
- Asia > India > Uttar Pradesh (0.05)
Python for Data Science: A Look at the Top Libraries
Python is a popular language for data science due to its powerful libraries and tools for data manipulation, visualization, machine learning, and statistical analysis. In this listicle, we will introduce some of the top Python libraries for data science and provide a quick and cool way to get started with them. NumPy is a library for working with large, multi-dimensional arrays and matrices of numerical data. It provides functions for performing mathematical operations on arrays, such as linear algebra, statistical analysis, and random number generation. Pandas is a library for data manipulation and analysis. It provides functions for reading in data from various sources, cleaning and wrangling data, and performing aggregations and transformations. Matplotlib is a library for creating static, animated, and interactive visualizations in Python.
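A quick taste of the three libraries working together on toy data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy generates the raw numbers, Pandas wraps them for analysis, and
# Matplotlib (via Pandas' plotting API) draws the result.
values = np.random.default_rng(0).normal(size=100)            # random samples
df = pd.DataFrame({"x": np.arange(100), "y": values.cumsum()})  # tidy frame
print(df["y"].describe())                                     # quick statistics

df.plot(x="x", y="y", title="Random walk")                    # visualization
plt.show()
```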
Interviewing a Deep Learning Model trained to predict stocks' overperformance probability
Daniele: Hi, Deep Learning Model; very lovely to meet you. Deep Learning Model: Hi Daniele, I cannot say it is a pleasure -- not sure what that means -- but this interaction is undoubtedly an outlier for me. But please call me 43420a6962c2. Daniele: Oh, ok, interesting name, I guess. Ok, 43420a6962c2, let's get cracking with this interview.
How DeepMind's AlphaTensor AI Devised a Faster Matrix Multiplication
After developing an artificial intelligence that can achieve superhuman mastery of games like chess and Go, along with another AI that can predict how proteins fold themselves in three-dimensional space, the researchers at DeepMind have done it again -- this time using a deep learning AI model to efficiently solve a fundamental mathematics problem, beating a 50-year-old record in the process. In a blog post from earlier this month, the DeepMind team introduces AlphaTensor, an AI system designed to discover new and more efficient algorithms for carrying out important mathematical operations -- in this case, matrix multiplication. Whether it is used to process or compress images and video, recognize spoken commands, or run simulations to predict the weather, matrix multiplication underpins much of modern computing. So it is little wonder that experts and companies all over the world are constantly searching for more efficient algorithms for the mathematical operations behind such tasks. Matrix multiplication is one of the simplest mathematical operations in algebra: individual numbers arranged in grids -- or matrices -- are multiplied together and then added in a specific manner to generate a new matrix.
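The 50-year-old record referenced here is Strassen's 1969 algorithm, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; a small worked example:

```python
# Strassen's 1969 trick: 7 scalar multiplications (m1..m7) instead of the
# naive 8 for a 2x2 matrix product. AlphaTensor searched for analogous
# decompositions with even fewer multiplications for larger matrix sizes.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]], matching the naive product
```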