Pacific Ocean
ReefGlider: A highly maneuverable vectored buoyancy engine based underwater robot
Macauley, Kevin, Cai, Levi, Adamczyk, Peter, Girdhar, Yogesh
There exists a capability gap in the design of currently available autonomous underwater vehicles (AUV). Most AUVs use a set of thrusters, and optionally control surfaces, to control their depth and pose. AUVs utilizing thrusters can be highly maneuverable, making them well-suited to operate in complex environments such as in close-proximity to coral reefs. However, they are inherently power-inefficient and produce significant noise and disturbance. Underwater gliders, on the other hand, use changes in buoyancy and center of mass, in combination with a control surface to move around. They are extremely power efficient but not very maneuverable. Gliders are designed for long-range missions that do not require precision maneuvering. Furthermore, since gliders only activate the buoyancy engine for small time intervals, they do not disturb the environment and can also be used for passive acoustic observations. In this paper we present ReefGlider, a novel AUV that uses only buoyancy for control but is still highly maneuverable from additional buoyancy control devices. ReefGlider bridges the gap between the capabilities of thruster-driven AUVs and gliders. These combined characteristics make ReefGlider ideal for tasks such as long-term visual and acoustic monitoring of coral reefs. We present the overall design and implementation of the system, as well as provide analysis of some of its capabilities.
CloudSense: A Model for Cloud Type Identification using Machine Learning from Radar data
Nizar, Mehzooz, Ambuj, Jha K., Singh, Manmeet, B, Vaisakh S., Pandithurai, G.
The knowledge of type of precipitating cloud is crucial for radar based quantitative estimates of precipitation. We propose a novel model called CloudSense which uses machine learning to accurately identify the type of precipitating clouds over the complex terrain locations in the Western Ghats (WGs) of India. CloudSense uses vertical reflectivity profiles collected during July-August 2018 from an X-band radar to classify clouds into four categories namely stratiform,mixed stratiform-convective,convective and shallow clouds. The machine learning(ML) model used in CloudSense was trained using a dataset balanced by Synthetic Minority Oversampling Technique (SMOTE), with features selected based on physical characteristics relevant to different cloud types. Among various ML models evaluated Light Gradient Boosting Machine (LightGBM) demonstrate superior performance in classifying cloud types with a BAC of 0.8 and F1-Score of 0.82. CloudSense generated results are also compared against conventional radar algorithms and we find that CloudSense performs better than radar algorithms. For 200 samples tested, the radar algorithm achieved a BAC of 0.69 and F1-Score of 0.68, whereas CloudSense achieved a BAC and F1-Score of 0.77. Our results show that ML based approach can provide more accurate cloud detection and classification which would be useful to improve precipitation estimates over the complex terrain of the WG.
Real-Time Pill Identification for the Visually Impaired Using Deep Learning
Dang, Bo, Zhao, Wenchao, Li, Yufeng, Ma, Danqing, Yu, Qixuan, Zhu, Elly Yijun
The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices. The system incorporates Text-to- Speech (TTS) to provide immediate auditory feedback, enhancing usability and independence for visually impaired users. Our study evaluates the application's effectiveness in terms of detection accuracy and user experience, highlighting its potential to improve medication management and safety among the visually impaired community. Keywords-Deep Learning; YOLO Framework; Mobile Application; Visual Impairment; Pill Identification; Healthcare
Unified Locational Differential Privacy Framework
Priyanshu, Aman, Maurya, Yash, Ganesh, Suriya, Tran, Vy
Aggregating statistics over geographical regions is important for many applications, such as analyzing income, election results, and disease spread. However, the sensitive nature of this data necessitates strong privacy protections to safeguard individuals. In this work, we present a unified locational differential privacy (DP) framework to enable private aggregation of various data types, including one-hot encoded, boolean, float, and integer arrays, over geographical regions. Our framework employs local DP mechanisms such as randomized response, the exponential mechanism, and the Gaussian mechanism. We evaluate our approach on four datasets representing significant location data aggregation scenarios. Results demonstrate the utility of our framework in providing formal DP guarantees while enabling geographical data analysis.
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Garg, Roopal, Burns, Andrea, Ayan, Burcu Karagol, Bitton, Yonatan, Montgomery, Ceslee, Onoe, Yasumasa, Bunner, Andrew, Krishna, Ranjay, Baldridge, Jason, Soricut, Radu
Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replete with missing information, visual inconsistencies, and hallucinations. To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness. Our dataset significantly improves across these dimensions compared to recently released datasets (+66%) and GPT-4V outputs (+48%). Furthermore, models fine-tuned with IIW data excel by +31% against prior work along the same human evaluation dimensions. Given our fine-tuned models, we also evaluate text-to-image generation and vision-language reasoning. Our model's descriptions can generate images closest to the original, as judged by both automated and human metrics. We also find our model produces more compositionally rich descriptions, outperforming the best baseline by up to 6% on ARO, SVO-Probes, and Winoground datasets.
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Yu, Xiaomin, Wang, Yezhaohui, Chen, Yanfang, Tao, Zhen, Xi, Dinghao, Song, Shichao, Niu, Simin, Li, Zhiyu
In recent years, generative artificial intelligence models, represented by Large Language Models (LLMs) and Diffusion Models (DMs), have revolutionized content production methods. These artificial intelligence-generated content (AIGC) have become deeply embedded in various aspects of daily life and work. However, these technologies have also led to the emergence of Fake Artificial Intelligence Generated Content (FAIGC), posing new challenges in distinguishing genuine information. It is crucial to recognize that AIGC technology is akin to a double-edged sword; its potent generative capabilities, while beneficial, also pose risks for the creation and dissemination of FAIGC. In this survey, We propose a new taxonomy that provides a more comprehensive breakdown of the space of FAIGC methods today. Next, we explore the modalities and generative technologies of FAIGC. We introduce FAIGC detection methods and summarize the related benchmark from various perspectives. Finally, we discuss outstanding challenges and promising areas for future research.
Chronos: Learning the Language of Time Series
Ansari, Abdul Fatir, Stella, Lorenzo, Turkmen, Caner, Zhang, Xiyuan, Mercado, Pedro, Shen, Huibin, Shchur, Oleksandr, Rangapuram, Syama Sundar, Arango, Sebastian Pineda, Kapoor, Shubham, Zschiegner, Jasper, Maddix, Danielle C., Wang, Hao, Mahoney, Michael W., Torkkola, Kari, Wilson, Andrew Gordon, Bohlke-Schneider, Michael, Wang, Yuyang
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
Evaluating the effectiveness of predicting covariates in LSTM Networks for Time Series Forecasting
Autoregressive Recurrent Neural Networks are widely employed in time-series forecasting tasks, demonstrating effectiveness in univariate and certain multivariate scenarios. However, their inherent structure does not readily accommodate the integration of future, time-dependent covariates. A proposed solution, outlined by Salinas et al 2019, suggests forecasting both covariates and the target variable in a multivariate framework. In this study, we conducted comprehensive tests on publicly available time-series datasets, artificially introducing highly correlated covariates to future time-step values. Our evaluation aimed to assess the performance of an LSTM network when considering these covariates and compare it against a univariate baseline. As part of this study we introduce a novel approach using seasonal time segments in combination with an RNN architecture, which is both simple and extremely effective over long forecast horizons with comparable performance to many state of the art architectures. Our findings from the results of more than 120 models reveal that under certain conditions jointly training covariates with target variables can improve overall performance of the model, but often there exists a significant performance disparity between multivariate and univariate predictions. Surprisingly, even when provided with covariates informing the network about future target values, multivariate predictions exhibited inferior performance. In essence, compelling the network to predict multiple values can prove detrimental to model performance, even in the presence of informative covariates. These results suggest that LSTM architectures may not be suitable for forecasting tasks where predicting covariates would typically be expected to enhance model accuracy.
Is Mamba Effective for Time Series Forecasting?
Wang, Zihan, Kong, Fanheng, Feng, Shi, Wang, Ming, Yang, Xiaocui, Zhao, Han, Wang, Daling, Zhang, Yifei
In the realm of time series forecasting (TSF), it is imperative for models to adeptly discern and distill hidden patterns within historical time series data to forecast future states. Transformer-based models exhibit formidable efficacy in TSF, primarily attributed to their advantage in apprehending these patterns. However, the quadratic complexity of the Transformer leads to low computational efficiency and high costs, which somewhat hinders the deployment of the TSF model in real-world scenarios. Recently, Mamba, a selective state space model, has gained traction due to its ability to process dependencies in sequences while maintaining near-linear complexity. For TSF tasks, these characteristics enable Mamba to comprehend hidden patterns as the Transformer and reduce computational overhead compared to the Transformer. Therefore, we propose a Mamba-based model named Simple-Mamba (S-Mamba) for TSF. Specifically, we tokenize the time points of each variate autonomously via a linear layer. A bidirectional Mamba layer is utilized to extract inter-variate correlations and a Feed-Forward Network is set to learn temporal dependencies. Finally, the generation of forecast outcomes through a linear mapping layer. Experiments on thirteen public datasets prove that S-Mamba maintains low computational overhead and achieves leading performance. Furthermore, we conduct extensive experiments to explore Mamba's potential in TSF tasks. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments
Huang, Kaixuan, Qu, Yuanhao, Cousins, Henry, Johnson, William A., Yin, Di, Shah, Mihir, Zhou, Denny, Altman, Russ, Wang, Mengdi, Cong, Le
The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of CRISPR technology, and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often lack specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We showcase the potential of CRISPR-GPT for assisting non-expert researchers with gene-editing experiments from scratch and validate the agent's effectiveness in a real-world use case. Furthermore, we explore the ethical and regulatory considerations associated with automated gene-editing design, highlighting the need for responsible and transparent use of these tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.