Springfield
Monet: Mixture of Monosemantic Experts for Transformers
Park, Jungwoo, Ahn, Young Jin, Kim, Kee-Eung, Kang, Jaewoo
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.
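The square-root scaling claim can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that each expert is identified by a pair of components drawn from two equally sized groups; this pairing and the helper `component_count` are hypothetical and may differ from the paper's actual decomposition.

```python
import math


def component_count(num_experts: int) -> int:
    """Stored components if every expert is a pair (i, j) from two equal groups.

    Illustrative assumption only: the paper's decomposition may differ in detail.
    """
    side = math.isqrt(num_experts)
    if side * side != num_experts:
        raise ValueError("this sketch expects a perfect-square expert count")
    # Two groups of `side` components each can address side * side experts.
    return 2 * side


num_experts = 262_144
print(math.isqrt(num_experts))       # 512, since 512 * 512 == 262,144
print(component_count(num_experts))  # 1024 stored components under this assumption
```

Under this reading, quadrupling the expert count only doubles the number of stored components.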
Outlier-robust Kalman Filtering through Generalised Bayes
Duran-Martin, Gerardo, Altamirano, Matias, Shestopaloff, Alexander Y., Sánchez-Betancourt, Leandro, Knoblauch, Jeremias, Jones, Matt, Briol, François-Xavier, Murphy, Kevin
We derive a novel, provably robust, and closed-form Bayesian update rule for online filtering in state-space models in the presence of outliers and misspecified measurement models. Our method combines generalised Bayesian inference with filtering methods such as the extended and ensemble Kalman filter. We use the former to show robustness and the latter to ensure computational efficiency in the case of nonlinear models. Our method matches or outperforms other robust filtering methods (such as those based on variational Bayes) at a much lower computational cost. We show this empirically on a range of filtering problems with outlier measurements, such as object tracking, state estimation in high-dimensional chaotic systems, and online learning of neural networks.
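To make the idea of a closed-form, outlier-robust update concrete, here is a minimal scalar sketch. It assumes a linear-Gaussian model with a direct observation of the state, and it down-weights each measurement by a residual-dependent factor. The function name `weighted_kalman_update`, the inverse multi-quadratic weight, and the constant `c` are illustrative assumptions and are not taken from the abstract.

```python
def weighted_kalman_update(mean, var, y, obs_var, c=2.0):
    """One scalar measurement update with a residual-dependent weight.

    Assumptions (not from the abstract): the observation is the state plus
    Gaussian noise, and the weight is an inverse multi-quadratic function
    of the residual with soft threshold `c`.
    """
    residual = y - mean
    # Weight lies in (0, 1]: equal to 1 for a zero residual, near 0 for outliers.
    weight = (1.0 + residual**2 / c**2) ** -0.5
    # Down-weighting is applied by inflating the observation variance.
    inflated_var = obs_var / weight**2
    gain = var / (var + inflated_var)
    new_mean = mean + gain * residual
    new_var = (1.0 - gain) * var
    return new_mean, new_var, weight


# A typical measurement moves the estimate; an extreme one barely does.
print(weighted_kalman_update(0.0, 1.0, 1.0, 1.0))
print(weighted_kalman_update(0.0, 1.0, 100.0, 1.0))
```

With a weight of 1 the update reduces to the standard Kalman measurement step, so the extra cost over the usual filter is one scalar weight per observation.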
Trainable Loss Weights in Super-Resolution
Mellatshahi, Arash Chaichi, Kasaei, Shohreh
In recent years, limited research has discussed the loss function in the super-resolution process. The majority of those studies have only used perceptual similarity in a conventional way, even though the development of an appropriate loss can improve the quality of other methods as well. In this article, a new weighting method for pixel-wise loss is proposed. With the help of this method, it is possible to use trainable weights based on the general structure of the image and its perceptual features while maintaining the advantages of pixel-wise loss. Also, a criterion for comparing weights of loss is introduced so that the weights can be estimated directly by a convolutional neural network. In addition, the expectation-maximization method is used for the simultaneous estimation of the super-resolution network and the weighting network. Furthermore, a new activation function, called "FixedSum", is introduced, which keeps the sum of all components of a vector constant while keeping the output components between zero and one. As experimental results show, the loss weighted by the proposed method leads to better results than both the unweighted loss and the uncertainty-based weighted loss, in terms of signal-to-noise ratio as well as perceptual similarity, on state-of-the-art networks. Code is available online.
Large Language Models are Diverse Role-Players for Summarization Evaluation
Wu, Ning, Gong, Ming, Shou, Linjun, Liang, Shining, Jiang, Daxin
Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture these dimensions. In this paper, we propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on a roleplayer prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.
Differentiable Rendering for Synthetic Aperture Radar Imagery
Wilmanski, Michael, Tamir, Jonathan
There is rising interest in differentiable rendering, which allows explicitly modeling geometric priors and constraints in optimization pipelines using first-order methods such as backpropagation. Incorporating such domain knowledge can lead to deep neural networks that are trained more robustly and with limited data, as well as the capability to solve ill-posed inverse problems. Existing efforts in differentiable rendering have focused on imagery from electro-optical sensors, particularly conventional RGB-imagery. In this work, we propose an approach for differentiable rendering of Synthetic Aperture Radar (SAR) imagery, which combines methods from 3D computer graphics with neural rendering. We demonstrate the approach on the inverse graphics problem of 3D Object Reconstruction from limited SAR imagery using high-fidelity simulated SAR data.
Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface
Lai, Wenqiang, Yang, Qihan, Mao, Ye, Sun, Endong, Ye, Jiangnan
Voice disorders affect millions of people worldwide. [...] Our findings also shed light on an end-to-end system for portable, practical equipment. [...] Normal communication is not always possible. According [...] Diseases that lead to language impairments include brain injuries (e.g., aphasia, apraxia, and dysarthria) and voice disorders, where there are [...] Most recently, deep learning-based methods have thrived and significantly improved over [...] AlterEgo, utilising CNN, proposed a product that did not require users to explicitly mouth their speech with pronounced, apparent facial movements [10]. [...] method to classify the International Radiotelephony Spelling Alphabet with a commercially off-the-shelf (COTS) device.
Acoustic Beamforming for Object-relative Distance Estimation and Control in Unmanned Air Vehicles using Propulsion System Noise
Sharma, Alisha, Geder, Jason, Lingevitch, Joseph, Martin, Theodore, Lofaro, Daniel, Sofge, Donald
Unmanned air vehicles often produce significant noise from their propulsion systems. Using this broadband signal as "acoustic illumination" for an auxiliary sensing system could make vehicles more robust at a minimal cost. We present an acoustic beamforming-based algorithm that estimates object-relative distance with a small two-microphone array using the generated propulsion system noise of a vehicle. We demonstrate this approach in several closed-loop distance feedback control tests with a mounted quad-rotor vehicle in a noisy environment and show accurate object-relative distance estimates more than 2x further than the baseline channel-based approach. We conclude that this approach is robust to several practical vehicle and noise situations and shows promise for use in more complex operating environments.
Pentagon goes on AI hiring spree to bring machine learning capabilities to the battlefield
The Pentagon is hiring data scientists, technologists and engineers as part of its effort to incorporate artificial intelligence into the machinery used to wage war. The Defense Department has posted several AI jobs on USAjobs.gov over the last few weeks, including many with salaries well into six figures. One of the higher paying jobs advertised in the last few weeks is for a senior technologist for "cognitive and decision science" at the U.S. Navy's Point Loma Complex in San Diego. That job starts at $170,000 and could pay as much as $212,000 a year for someone who can help insert "cutting-edge technology" into Navy weaponry and equipment.
Toward Defining a Domain Complexity Measure Across Domains
Doctor, Katarina, Task, Christine, Kildebeck, Eric, Kejriwal, Mayank, Holder, Lawrence, Leong, Russell
Artificial Intelligence (AI) systems planned for deployment in real-world applications are frequently researched and developed either in closed simulation environments, where all variables are controlled and known to the simulator, or on labeled benchmark datasets. Transition from these simulators, testbeds, and benchmark datasets to more open-world domains poses significant challenges to AI systems, including significant increases in the complexity of the domain and the inclusion of real-world novelties; the open-world environment contains numerous out-of-distribution elements that are not part of the AI systems' training set. Here, we propose a path to a general, domain-independent measure of domain complexity level. We distinguish two aspects of domain complexity: intrinsic and extrinsic. The intrinsic domain complexity is the complexity that exists by itself without any action or interaction from an AI agent performing a task on that domain. This is an agent-independent aspect of the domain complexity. The extrinsic domain complexity is agent- and task-dependent. Intrinsic and extrinsic elements combined capture the overall complexity of the domain. We frame the components that define and impact domain complexity levels in a domain-independent light. Domain-independent measures of complexity could enable quantitative predictions of the difficulty posed to AI systems when transitioning from one testbed or environment to another, when facing out-of-distribution data in open-world tasks, and when navigating the rapidly expanding solution and search spaces encountered in open-world domains.
Data Scientist at Novetta - Springfield, Virginia
Accenture Federal Services delivers a range of innovative, tech-enabled services for the U.S. Federal Government to address the complex, sensitive challenges of national security and intelligence missions. Refer a qualified candidate and earn up to $20K. Accenture Federal Services is seeking a Data Scientist to analyze, design, code and test multiple components of application code across one or more clients. Compensation for roles at Accenture Federal Services varies depending on a wide array of factors including but not limited to the specific office location, role, skill set and level of experience. As required by local law, Accenture Federal Services provides a reasonable range of compensation for roles that may be hired in California, Colorado, New York City or Washington as set forth below and information on benefits offered is here.