A Practitioner's Guide to Real-World Continual Multimodal Pretraining

Neural Information Processing Systems

Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications demand adaptation to specific subdomains, tasks or concepts, spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed and offer comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements, constructed over 63 datasets with diverse visual and semantic coverage. Using FoMo-in-Flux, we explore the complex landscape of practical continual pretraining through multiple perspectives: (1) data mixtures and stream orderings that emulate real-world deployment settings, (2) methods ranging from simple fine-tuning and traditional continual learning strategies to parameter-efficient updates and model merging, (3) meta-learning-rate schedules and mechanistic design choices, and (4) model and compute scaling. Together, our insights provide a practitioner's guide to continual multimodal pretraining for real-world deployment.


A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

Neural Information Processing Systems

Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various tasks, the potential of vision and language remains underexplored in face forgery detection, where they hold promise for enhancing explainability by leveraging the intrinsic reasoning capabilities of language to analyse fine-grained manipulation areas. For that reason, a few works have recently started to frame the problem of deepfake detection as a Visual Question Answering (VQA) task, nevertheless omitting the realistic and informative open-ended multi-label setting. With the rapid advances in the field of VLLM, an exponential rise of investigations in that direction is expected.


The Game Designer Playing Through His Own Psyche

The New Yorker

A little more than a decade ago, the video-game designer Davey Wreden experienced a crippling success. In October, 2013, he and a collaborator, William Pugh, released the Stanley Parable HD, a polished and expanded version of a prototype that Wreden had developed in college, and which he had made available, free of charge, two years before. Wreden and Pugh hoped that they might sell fifty thousand or so copies of the new version in the course of its lifetime. They sold that many on the first day. Wreden was twenty-five years old, and he had everything he'd ever wanted: money, success, recognition.


Last Year's Sci-Fi Was More Genre-Bending Than Ever

WIRED

The Best American Science Fiction and Fantasy 2022, which collects 20 of the best fantasy and science fiction stories of the past year, features a wide range of characters and settings. Guest editor Rebecca Roanhorse made the final selections for this year's volume. "This is not your father's science fiction and fantasy collection," Roanhorse says in Episode 538 of the Geek's Guide to the Galaxy podcast. "I'm excited to see what people are writing, and where the genre is going, and what sort of new voices can be discovered, and how far we can push boundaries and still tell universal stories." P. Djèlí Clark's genre-bending "If the Martians Have Magic" features Haitian priests battling the alien invaders from The War of the Worlds. "I always think my stories are too weird," Clark says.


Hot Off the Press: The Chatbot Buyer's Guide for 2023

#artificialintelligence

Chatbots and conversational AI have been gaining acceptance as essential pieces of successful customer service and employee support strategies. If your organisation doesn't have at least one of these solutions already, it's likely you are planning to deploy one soon or are exploring the possibility of adding one to your 2023 strategy. Unfortunately, as adoption of this technology increases, so does the oversaturation of the market with poorly performing chatbot products. Now many live chat, CRM, and contact centre vendors are attempting to jump on the conversational AI bandwagon with their own 'add-on bots'. This is creating both confusion for buyers and a starker divide between vendors selling add-on bots and vendors that are true conversational AI specialists.


Springer has released 65 Machine Learning and Data books for free

#artificialintelligence

Springer has released hundreds of free books to the general public. The list, which includes 408 books in total, covers a wide range of scientific and technological topics. To save you some time, I have created one list of all the books (65 in number) that are relevant to the data and Machine Learning field. Among the books, you will find those dealing with the mathematical side of the domain (Algebra, Statistics, and more), along with books on Deep Learning and other advanced topics. You can also find some good books on various programming languages such as Python, R, MATLAB, etc.


Uncovering Hidden Meaning: A Beginner's Guide to Latent Semantic Analysis

#artificialintelligence

If you have ever worked with text data, you have likely encountered the challenge of dealing with high-dimensional and sparse data. One popular solution to this problem is latent semantic analysis (LSA), also known as latent semantic indexing (LSI). LSA is a technique for extracting latent (hidden) semantics from a collection of documents or text data. It does this by mapping the documents into a lower-dimensional space, where the relationships between the documents and the underlying concepts they represent can be more easily understood. One of the key benefits of LSA is that it can handle large amounts of data efficiently and is robust to noise and sparse data.
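The mapping into a lower-dimensional space described above can be illustrated with a minimal sketch. It assumes a plain term-count matrix and a truncated SVD computed with NumPy; the toy corpus, the choice of k = 2, and the helper name `cosine` are hypothetical and not taken from the article.

```python
import numpy as np

# Toy corpus (hypothetical example documents, not from the article).
docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market trade",
    "market trade price",
]

# Build a term-document count matrix: rows are terms, columns are documents.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[index[word], j] += 1

# Truncated SVD: keep only the k largest singular values ("latent concepts").
k = 2  # assumed number of latent dimensions for this toy case
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_topic = cosine(doc_vectors[0], doc_vectors[1])   # two pet documents
cross_topic = cosine(doc_vectors[0], doc_vectors[2])  # pet vs. finance
print(f"same topic: {same_topic:.2f}, cross topic: {cross_topic:.2f}")
```

In this toy case the two pet documents should end up close together in the reduced space while the pet and finance documents stay far apart, even though documents 0 and 1 do not share all of their words. Real applications typically weight the counts (for example with TF-IDF) before the decomposition.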


The Beginner's Guide to Midjourney: How to Start Creating Your Own AI Art

#artificialintelligence

In this article, I will guide you through the steps you need to take to start creating your own images and share some tips for getting started. If you don't have any experience with Midjourney yet, this is just for you and might save you some trouble along the way. I will go straight to the point so that you can start creating your AI Art right after you finish this article. The most confusing thing is that Midjourney is not an app or a dedicated website. It's a bot on a server on the communication platform named Discord, mainly known to gamers.



Buy Python Machine Learning: A Practical Beginner's Guide to Understanding Machine Learning, Deep Learning and Neural Networks with Python, Scikit-Learn, Tensorflow and Keras Book Online at Low Prices in India

#artificialintelligence

Amazon.in - Buy Python Machine Learning: A Practical Beginner's Guide to Understanding Machine Learning, Deep Learning and Neural Networks with Python, Scikit-Learn, Tensorflow and Keras book online at best prices in India on Amazon.in. Read Python Machine Learning: A Practical Beginner's Guide to Understanding Machine Learning, Deep Learning and Neural Networks with Python, Scikit-Learn, Tensorflow and Keras book reviews & author details and more at Amazon.in. Free delivery on qualified orders.