Africa
New leak suggests Grok 3.5 is coming soon from xAI
The Grok 3.5 launch is imminent, according to references to the model spotted in some lines of code. The latest version of Grok doesn't have an official launch date, but that could be coming any day now since mention of Grok 3.5 has been showing up in xAI's latest web build, according to TestingCatalog, which posted screenshots on X. The AI news site also noticed a reference to an API, and said it might also be preparing a screen-sharing feature for Grok's voice mode on iOS. This would be similar to OpenAI's voice mode with ChatGPT and Google's Gemini Live features. In late April, xAI founder Elon Musk posted that an early beta release of Grok 3.5 would be available to SuperGrok subscribers, starting the countdown for a wider release.
Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance
Neal Jean, Sang Michael Xie, Stefano Ermon
Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional data can be expensive or even impossible. We present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. SSDKL combines the hierarchical representation learning of neural networks with the probabilistic modeling capabilities of Gaussian processes. By leveraging unlabeled data, we show improvements on a diverse set of real-world regression tasks over supervised deep kernel learning and semi-supervised methods such as VAT and mean teacher adapted for regression.
Geo-Diverse Safety Alignment Da Yin
Content Warning: This paper may contain examples of harmful contents by nature. In the rapidly evolving field of Large Language Models (LLMs), ensuring safety is a crucial and widely discussed topic. However, existing works often overlook the geo-diversity of cultural and legal standards across the world.
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between the pretraining on RGB datasets and finetuning on SAR datasets in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspective of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating exellent generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection.
Unlocking the Potential of Global Human Expertise Elliot Meyerson 1 Olivier Francon 1 Darren Sargent
Solving societal problems on a global scale requires the collection and processing of ideas and methods from diverse sets of international experts. As the number and diversity of human experts increase, so does the likelihood that elements in this collective knowledge can be combined and refined to discover novel and better solutions. However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. An evolutionary AI framework, termed RHEA, fills this role by distilling knowledge from diverse models created by human experts into equivalent neural networks, which are then recombined and refined in a population-based search. The framework was implemented in a formal synthetic domain, demonstrating that it is transparent and systematic. It was then applied to the results of the XPRIZE Pandemic Response Challenge, in which over 100 teams of experts across 23 countries submitted models based on diverse methodologies to predict COVID-19 cases and suggest non-pharmaceutical intervention policies for 235 nations, states, and regions across the globe. Building upon this expert knowledge, by recombining and refining the 169 resulting policy suggestion models, RHEA discovered a broader and more effective set of policies than either AI or human experts alone, as evaluated based on real-world data. The results thus suggest that AI can play a crucial role in realizing the potential of human expertise in global problem-solving.
A Appendix
A.1 Creation of the Multimodal Web Document Dataset A.1.1 Collecting of a Large Number of HTML Files Our data collection process begins by considering the 25 most recent Common Crawl It contains webpages spanning from February 2020 to January/February 2023. This process yields a total of 41.2 billion documents. Selection of English content To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, e ectively filtering out 63.6% of the documents. Early text deduplication Often, a set of URLs is crawled repeatedly across di erent Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).
Learning Infinitesimal Generators of Continuous Symmetries from Data
Exploiting symmetry inherent in data can significantly improve the sample efficiency of a learning procedure and the generalization of learned models. When data clearly reveals underlying symmetry, leveraging this symmetry can naturally inform the design of model architectures or learning strategies. Yet, in numerous real-world scenarios, identifying the specific symmetry within a given data distribution often proves ambiguous. To tackle this, some existing works learn symmetry in a data-driven manner, parameterizing and learning expected symmetry through data. However, these methods often rely on explicit knowledge, such as pre-defined Lie groups, which are typically restricted to linear or affine transformations.
D: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages, Yi Zhou
Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures.
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes.