
Collaborating Authors: dragonfly


Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

Chen, Kezhen, Thapa, Rahul, Chalamala, Rahul, Athiwaratkun, Ben, Song, Shuaiwen Leon, Zou, James

arXiv.org Artificial Intelligence

Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, which is crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) it extends the context length required by the language model, leading to inefficiencies and hitting the model's context limit; 2) it increases the complexity of visual features, necessitating more training data or a more complex architecture. To address these challenges, we introduce Dragonfly, a new LMM architecture that enhances fine-grained visual understanding and reasoning about image regions. Dragonfly employs two key strategies: multi-resolution visual encoding and zoom-in patch selection. These strategies allow the model to process high-resolution images efficiently while maintaining a reasonable context length. Our experiments on eight popular benchmarks demonstrate that Dragonfly achieves competitive or better performance compared to other architectures, highlighting the effectiveness of our design. Additionally, we finetuned Dragonfly on biomedical instructions, achieving state-of-the-art results on multiple biomedical tasks requiring fine-grained visual understanding, including 92.3% accuracy on the Path-VQA dataset (compared to 83.3% for Med-Gemini) and the highest reported results on biomedical image captioning. To support model training, we curated a visual instruction-tuning dataset with 5.5 million image-instruction samples in the general domain and 1.4 million samples in the biomedical domain. We also conducted ablation studies to characterize the impact of various architectural designs and image resolutions, providing insights for future research on visual instruction alignment.
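The zoom-in patch-selection idea can be illustrated with a small sketch. Everything below is an illustrative assumption rather than Dragonfly's actual implementation: the patch size, the top-k budget, and the use of local variance as a stand-in for a learned relevance score.

```python
import numpy as np

def select_zoom_patches(image, patch=32, k=4):
    """Toy zoom-in patch selection: split an image into non-overlapping
    patches, score each by local variance (a stand-in for a learned
    relevance score), and keep only the top-k patches so the language
    model's context stays bounded."""
    h, w = image.shape[:2]
    scores, patches = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = image[y:y + patch, x:x + patch]
            scores.append(p.var())       # proxy for "informativeness"
            patches.append((y, x))
    top = np.argsort(scores)[::-1][:k]   # indices of the k best patches
    return [patches[i] for i in top]

rng = np.random.default_rng(0)
img = np.zeros((128, 128))
img[:32, :32] = rng.normal(size=(32, 32))  # only one patch has detail
print(select_zoom_patches(img, k=1))       # the detailed patch wins
```

The key point is the bounded output: however large the input image, only k high-resolution patches reach the language model, which is what keeps the context length manageable.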


Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning

Shenoy, Jay, Levy, Axel, Poitevin, Frédéric, Wetzstein, Gordon

arXiv.org Artificial Intelligence

X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI) where individual, weakly scattering biomolecules are imaged under near-physiological conditions with the opportunity to access fleeting states that cannot be captured in cryogenic or crystallized conditions. Existing X-ray SPI reconstruction algorithms, which estimate the unknown orientation of a particle in each captured image as well as its shared 3D structure, are inadequate in handling the massive datasets generated by these emerging XFELs. Here, we introduce X-RAI, an online reconstruction framework that estimates the structure of a 3D macromolecule from large X-ray SPI datasets. X-RAI consists of a convolutional encoder, which amortizes pose estimation over large datasets, as well as a physics-based decoder, which employs an implicit neural representation to enable high-quality 3D reconstruction in an end-to-end, self-supervised manner. We demonstrate that X-RAI achieves state-of-the-art performance for small-scale datasets in simulation and challenging experimental settings, and show its unprecedented ability to process large datasets containing millions of diffraction images in an online fashion. These abilities signify a paradigm shift in X-ray SPI towards real-time capture and reconstruction.
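The "online" aspect, refining a single estimate from a stream of minibatches that never has to fit in memory, can be sketched in a few lines. This toy replaces X-RAI's amortized pose encoder and physics-based decoder with a plain exponentially weighted running average; the update rule and learning rate here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def stream_reconstruct(batches, lr=0.1):
    """Toy online reconstruction: maintain a running estimate that is
    refined one minibatch at a time, so the full dataset never has to
    be held in memory (the amortized encoder and physics-based decoder
    are replaced here by a plain exponentially weighted average)."""
    estimate = None
    for batch in batches:                 # batches arrive as a stream
        target = batch.mean(axis=0)       # per-batch summary
        if estimate is None:
            estimate = target
        else:                             # small step toward each batch
            estimate = (1 - lr) * estimate + lr * target
    return estimate

rng = np.random.default_rng(1)
truth = rng.normal(size=(8, 8))
stream = (truth + 0.01 * rng.normal(size=(8, 8)) for _ in range(200))
# wrap each noisy "image" as a batch of one
est = stream_reconstruct((im[None] for im in stream))
print(np.abs(est - truth).max())
```

The estimate converges toward the underlying structure even though each noisy observation is seen exactly once and then discarded, which is the property that lets an online method scale to millions of diffraction images.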


Preliminary Design of the Dragonfly Navigation Filter

Schilling, Ben, McGee, Timothy G., Mitch, Ryan, Watson, Ryan

arXiv.org Artificial Intelligence

Dragonfly is scheduled to begin exploring Titan by 2034 using a series of multi-kilometer surface flights. This paper outlines the preliminary design of the navigation filter for the Dragonfly Mobility subsystem. The software architecture and filter formulation for lidar, visual odometry, pressure sensors, and redundant IMUs are described in detail. Special discussion is given to developments needed to achieve multi-kilometer surface flights, including optimizing sequential image baselines, modeling correlated image-processing errors, and an efficient approximation to the Simultaneous Localization and Mapping (SLAM) problem.
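As a concrete illustration of the kind of multi-sensor filtering described (though not the actual Dragonfly filter formulation: the state vector, noise values, and sensor model below are all toy assumptions), here is one predict/update cycle of a 1-D Kalman filter that propagates altitude with an IMU acceleration and corrects it with a lidar range measurement:

```python
import numpy as np

def kf_step(x, P, accel, z_lidar, dt=0.1, q=0.01, r=0.25):
    """One predict/update cycle of a toy 1-D altitude filter:
    an IMU acceleration propagates [altitude, vertical rate],
    then a lidar range measurement corrects the estimate."""
    F = np.array([[1.0, dt], [0.0, 1.0]])          # constant-velocity model
    x = F @ x + np.array([0.5 * dt**2, dt]) * accel
    P = F @ P @ F.T + q * np.eye(2)                # inflate uncertainty
    H = np.array([[1.0, 0.0]])                     # lidar measures altitude
    y = z_lidar - H @ x                            # innovation
    S = H @ P @ H.T + r                            # innovation covariance
    K = P @ H.T / S                                # Kalman gain
    x = x + (K * y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0]), np.eye(2)
for _ in range(50):                                # hover at 10 m
    x, P = kf_step(x, P, accel=0.0, z_lidar=10.0)
print(round(x[0], 2))
```

The real filter extends this predict/update structure to the full vehicle state and the additional sensors listed above (visual odometry, pressure, redundant IMUs), with the cross-correlations between image-processing errors modeled explicitly.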


Flight Control in the Dragonfly: A Neurobiological Simulation

Neural Information Processing Systems

Neural network simulations of the dragonfly flight neurocontrol system have been developed to understand how this insect uses complex, unsteady aerodynamics. The simulation networks account for the ganglionic spatial distribution of cells as well as the physiologic operating range and the stochastic cellular firing history of each neuron. In addition, the motor neuron firing patterns, "flight command sequences", were utilized. Simulation training was targeted against both the cellular and flight motor neuron firing patterns. The trained networks accurately resynthesized the intraganglionic cellular firing patterns.
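The core fitting idea, adjusting network parameters until simulated outputs match recorded firing patterns, can be shown with a deliberately minimal stand-in: a single logistic unit instead of the paper's ganglionic networks, with a stimulus, target pattern, and training settings invented purely for illustration.

```python
import numpy as np

def fit_firing_pattern(stimulus, target, steps=2000, lr=0.5):
    """Minimal stand-in for training against recorded firing patterns:
    a single logistic unit is fit so its output matches a target 0/1
    firing sequence (the real simulations used full networks with
    ganglionic spatial structure; this only shows the fitting idea)."""
    w, b = np.zeros(stimulus.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(stimulus @ w + b)))  # firing probability
        grad = p - target                               # cross-entropy gradient
        w -= lr * stimulus.T @ grad / len(target)
        b -= lr * grad.mean()
    return (p > 0.5).astype(int)

rng = np.random.default_rng(2)
stim = rng.normal(size=(40, 3))
pattern = (stim[:, 0] > 0).astype(int)        # "fires" when input 0 is positive
resynth = fit_firing_pattern(stim, pattern)
print((resynth == pattern).mean())
```

After training, the unit resynthesizes the target firing sequence from the stimulus, analogous (in miniature) to the trained networks reproducing the intraganglionic firing patterns.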


William W.L. Li on Poe: AI Origins

#artificialintelligence

Dragonfly has a broad range of general knowledge it can draw on for discussions on various topics, and it is aimed more at helping users solve problems, make decisions, and gain new insights. It has a deeper level of domain-specific knowledge, which allows it to reason about topics such as science, engineering, and healthcare. Sage, by contrast, is tailored for simpler back-and-forth conversations. While Sage and Dragonfly have different strengths, they share some similarities as OpenAI-powered assistants offered through Poe, alongside Anthropic's Claude, which is trained to be helpful, harmless, and honest using techniques like Constitutional AI. But they play quite different roles as virtual assistants focused on either conversation or problem-solving.


OpenXLA, Mona's free Data Analysis Tool, 🤖 OpenAI GPT-4

#artificialintelligence

Try Mona's free automated exploratory data analysis tool. Say goodbye to endless manual multivariate data exploration! Mona's new automated exploratory data analysis tool eliminates the need for manual data cleaning, transformation, and visualization. Simply upload a CSV and follow a simple wizard. Mona will automatically surface granular insights on patterns and anomalies in your dataset, alongside possible explanations. Join a global community of analysts using this one-of-a-kind free tool to streamline exploratory analysis and make better data-driven decisions faster.


Quora's Poe is launching subscriptions to let you chat with GPT-4-powered bot

#artificialintelligence

Yesterday, OpenAI unveiled its new GPT-4 model, and competitor Anthropic introduced its own ChatGPT rival, Claude. In parallel, Quora announced that its chatbot app Poe will now have a paid tier that lets you ask questions of bots powered by these models. Poe subscriptions will set you back $19.99 per month or $199.99 per year, and for now they can only be purchased from an iOS device or an Apple Silicon-powered Mac. The company is working on making the paid plan available to purchase on the web. Quora first launched Poe last December as a closed beta and later opened it up to all iOS users last month.


Quora opens its new AI chatbot app Poe to the general public • TechCrunch

#artificialintelligence

Q&A platform Quora has opened up public access to its new AI chatbot app, Poe, which lets users ask questions and get answers from a range of AI chatbots, including those from ChatGPT maker OpenAI and other companies like Anthropic. Beyond allowing users to experiment with new AI technologies, Poe's content will ultimately help to evolve Quora itself, the company says. Quora first announced Poe's mobile app in December, but at the time, it required an invite to try it out. With the public launch on Friday, anyone can now use Poe's app. For now, it's available only to iOS users, but Quora says the service will arrive on other platforms in a few months.


RotorTM: A Flexible Simulator for Aerial Transportation and Manipulation

Li, Guanrui, Liu, Xinyang, Loianno, Giuseppe

arXiv.org Artificial Intelligence

Low-cost autonomous Micro Aerial Vehicles (MAVs) have the potential to help humans by simplifying and speeding up complex tasks that require their interaction with the environment, such as construction, package delivery, and search and rescue. These systems, composed of single or multiple vehicles, can be endowed with passive connection mechanisms such as rigid links or cables to perform transportation and manipulation tasks. However, they are inherently complex since they are often underactuated and evolve in nonlinear manifold configuration spaces. In addition, the complexity of systems with cable-suspended loads is further increased by the hybrid dynamics depending on the cables' varying tension conditions. This paper presents the first aerial transportation and manipulation simulator incorporating different payloads and passive connection mechanisms with full system dynamics, planning, and control algorithms. Furthermore, it includes a novel general model accounting for the transient hybrid dynamics of aerial systems with cable-suspended loads to closely mimic real-world systems. The availability of a flexible and intuitive interface further contributes to its usability and versatility. Comparisons between simulations and real-world experiments with different vehicle configurations show the fidelity of the simulator results with respect to real-world settings and its benefit for rapid prototyping and transitioning of aerial transportation and manipulation systems to real-world deployment.
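The hybrid, tension-dependent dynamics can be illustrated with a deliberately tiny vertical-axis example: a point load on an inextensible cable, with the taut mode reduced to a kinematic constraint. This is a gross simplification of RotorTM's full 3-D transient model, and all numbers below are invented for illustration.

```python
def step_load(quad_z, load_z, load_vz, dt=0.001, l=1.0, g=9.81):
    """Toy hybrid dynamics for a cable-suspended point load hanging
    below a hovering quadrotor (vertical axis only): when the cable is
    taut the load is constrained to hang one cable length below the
    vehicle; when slack, the load is in free fall."""
    stretch = quad_z - load_z          # vertical separation
    if stretch >= l:                   # taut mode: kinematic constraint
        return quad_z - l, 0.0
    load_vz -= g * dt                  # slack mode: free fall
    return load_z + load_vz * dt, load_vz

quad_z, load_z, load_vz = 2.0, 1.5, 0.0   # cable starts slack (0.5 m gap)
for _ in range(1000):                      # simulate 1 s
    load_z, load_vz = step_load(quad_z, load_z, load_vz)
print(load_z)                              # load ends up hanging taut
```

The mode switch at the moment the cable becomes taut is exactly the kind of discrete transition that makes these systems hybrid; a faithful simulator has to model the transient tension through that switch rather than snapping the load into place as this sketch does.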


Airbus tests pilot assist that can automatically divert flights

Engadget

Autonomous transportation assistance isn't limited to cars. Airbus has started testing DragonFly, a pilot-assistance feature that can automatically divert a flight in an emergency. The system not only picks a flight path to the best airport (using factors like airspace rules and weather), but also communicates with air traffic control and the airline's operations center. If the pilots are incapacitated, the aircraft can still land safely.