salina
Explainability-Driven Dimensionality Reduction for Hyperspectral Imaging
Hyperspectral imaging (HSI) provides rich spectral information for precise material classification and analysis; however, its high dimensionality introduces a computational burden and redundancy, making dimensionality reduction essential. We present an exploratory study into the application of post-hoc explainability methods in a model-driven framework for band selection, which reduces the spectral dimension while preserving predictive performance. A trained classifier is probed with explanations to quantify each band's contribution to its decisions. We then perform deletion-insertion evaluations, recording confidence changes as ranked bands are removed or reintroduced, and aggregate these signals into influence scores. Selecting the highest-influence bands yields compact spectral subsets that maintain accuracy and improve efficiency. Experiments on two public benchmarks (Pavia University and Salinas) demonstrate that classifiers trained on as few as 30 selected bands match or exceed full-spectrum baselines while reducing computational requirements. The resulting subsets align with physically meaningful, highly discriminative wavelength regions, indicating that model-aligned, explanation-guided band selection is a principled route to effective dimensionality reduction for HSI.
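The deletion step described in this abstract can be illustrated with a minimal sketch. It assumes a single-band occlusion scheme (replace one band with a baseline value and record the confidence drop), which is only one possible reading; the abstract does not specify the explanation method or how the signals are aggregated. The names `deletion_influence`, `select_bands`, `toy_confidence`, and `WEIGHTS` are hypothetical and stand in for a trained classifier.

```python
from typing import Callable, List, Sequence


def deletion_influence(
    confidence: Callable[[Sequence[float]], float],
    spectrum: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Score each band by the confidence lost when that band is replaced by a baseline."""
    full = confidence(spectrum)
    scores = []
    for b in range(len(spectrum)):
        masked = list(spectrum)
        masked[b] = baseline  # "delete" band b (assumed occlusion scheme)
        scores.append(full - confidence(masked))
    return scores


def select_bands(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-influence bands, in ascending band order."""
    ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:k])


# Hypothetical stand-in for a trained classifier's confidence in one class.
WEIGHTS = [0.0, 0.5, 0.0, 0.3, 0.2]


def toy_confidence(spectrum: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(WEIGHTS, spectrum))


scores = deletion_influence(toy_confidence, [1.0] * 5)
top2 = select_bands(scores, 2)
print(top2)
```

In practice the scores would be averaged over many pixels and classes, and a classifier would then be retrained on the selected bands only.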
SALINA: Towards Sustainable Live Sonar Analytics in Wild Ecosystems
Xu, Chi, Qian, Rongsheng, Fang, Hao, Ma, Xiaoqiang, Atlas, William I., Liu, Jiangchuan, Spoljaric, Mark A.
Sonar radar captures visual representations of underwater objects and structures using sound wave reflections, making it essential for exploration, mapping, and continuous surveillance in wild ecosystems. Real-time analysis of sonar data is crucial for time-sensitive applications, including environmental anomaly detection and in-season fishery management, where rapid decision-making is needed. However, the lack of both relevant datasets and pre-trained DNN models, coupled with resource limitations in wild environments, hinders the effective deployment and continuous operation of live sonar analytics. We present SALINA, a sustainable live sonar analytics system designed to address these challenges. SALINA enables real-time processing of acoustic sonar data with spatial and temporal adaptations, and features energy-efficient operation through a robust energy management module. Deployed for six months at two inland rivers in British Columbia, Canada, SALINA provided continuous 24/7 underwater monitoring, supporting fishery stewardship and wildlife restoration efforts. Through extensive real-world testing, SALINA demonstrated an up to 9.5% improvement in average precision and a 10.1% increase in tracking metrics. The energy management module successfully handled extreme weather, preventing outages and reducing contingency costs. These results offer valuable insights for long-term deployment of acoustic data systems in the wild.
Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering
Cui, Kangning, Li, Ruoning, Polk, Sam L., Lin, Yinyi, Zhang, Hongsheng, Murphy, James M., Plemmons, Robert J., Chan, Raymond H.
Hyperspectral images (HSIs) provide exceptional spatial and spectral resolution of a scene, crucial for various remote sensing applications. However, the high dimensionality, presence of noise and outliers, and the need for precise labels of HSIs present significant challenges to HSI analysis, motivating the development of performant HSI clustering algorithms. This paper introduces a novel unsupervised HSI clustering algorithm, Superpixel-based and Spatially-regularized Diffusion Learning (S2DL), which addresses these challenges by incorporating rich spatial information encoded in HSIs into diffusion geometry-based clustering. S2DL employs the Entropy Rate Superpixel (ERS) segmentation technique to partition an image into superpixels, then constructs a spatially-regularized diffusion graph using the most representative high-density pixels. This approach reduces computational burden while preserving accuracy. Cluster modes, serving as exemplars for underlying cluster structure, are identified as the highest-density pixels farthest in diffusion distance from other highest-density pixels. These modes guide the labeling of the remaining representative pixels from ERS superpixels. Finally, majority voting is applied to the labels assigned within each superpixel to propagate labels to the rest of the image. This spatial-spectral approach simultaneously simplifies graph construction, reduces computational cost, and improves clustering performance. S2DL's performance is illustrated with extensive experiments on three publicly available, real-world HSIs: Indian Pines, Salinas, and Salinas A. Additionally, we apply S2DL to landscape-scale, unsupervised mangrove species mapping in the Mai Po Nature Reserve, Hong Kong, using a Gaofen-5 HSI. The success of S2DL in these diverse numerical experiments indicates its efficacy on a wide range of important unsupervised remote sensing analysis tasks.
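The final majority-voting step in this abstract can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes each pixel carries a superpixel id, that representative pixels already have labels while the rest are `None`, and that every pixel in a superpixel receives that superpixel's majority label. The function name `propagate_by_majority` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def propagate_by_majority(
    superpixel_of: Sequence[int],
    labels: Sequence[Optional[Hashable]],
) -> List[Optional[Hashable]]:
    """Give every pixel the most common label among labeled pixels in its superpixel."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for sp, lab in zip(superpixel_of, labels):
        if lab is not None:
            votes[sp][lab] += 1

    out: List[Optional[Hashable]] = []
    for sp in superpixel_of:
        if votes[sp]:
            out.append(votes[sp].most_common(1)[0][0])
        else:
            out.append(None)  # superpixel with no labeled pixel stays unlabeled
    return out


superpixel_of = [0, 0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", None, None, "c", None]
propagated = propagate_by_majority(superpixel_of, labels)
print(propagated)
```

Ties are not handled specially here; a real pipeline would need an explicit tie-breaking rule.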
Open-Source NLP is a Gift from God for Tech Start-ups
Natural language processing (NLP) is a subfield of linguistics, computer science, and AI concerned with the interactions between computers and human language. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. An NLP system can then accurately extract the information and insights contained in the documents, as well as categorize and organize the documents themselves. Take, for instance, Megatron 530B, which was created and released jointly by Microsoft and Nvidia. Microsoft and Nvidia say that they observed between 113 and 126 teraflops per second per GPU while training Megatron 530B, which would put the training cost in the millions of dollars. Inference, that is, actually running the trained model, is another challenge.
Open source NLP is fueling a new wave of startups
Large language models capable of writing poems, summaries, and computer code are driving the demand for "natural language processing (NLP) as a service." As these models become more capable -- and accessible, relatively speaking -- appetite in the enterprise for them is growing. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third -- 33% -- said that their spending climbed by more than 30%. Well-resourced providers like OpenAI, Cohere, and AI21 Labs are reaping the benefits.
SaLinA: Sequential Learning of Agents
Denoyer, Ludovic, de la Fuente, Alfredo, Duong, Song, Gaya, Jean-Baptiste, Kamienny, Pierre-Alexandre, Thompson, Daniel H.
SaLinA is a simple library that makes implementing complex sequential learning models easy, including reinforcement learning algorithms. It is built as an extension of PyTorch: algorithms coded with SaLinA can be understood in a few minutes by PyTorch users and modified easily. Moreover, SaLinA naturally works with multiple CPUs and GPUs at train and test time, making it a good fit for large-scale training use cases. In comparison to existing RL libraries, SaLinA has a very low adoption cost and captures a large variety of settings (model-based RL, batch RL, hierarchical RL, multi-agent RL, etc.). SaLinA does not target only RL practitioners, however; it aims to provide sequential learning capabilities to any deep learning programmer.
NLPCloud.io helps devs add language processing smarts to their apps – TechCrunch
While visual 'no code' tools are helping businesses get more out of computing without the need for armies of in-house techies to configure software on behalf of other staff, access to the most powerful tech tools -- at the 'deep tech' AI coal face -- still requires some expert help (and/or costly in-house expertise). This is where bootstrapping French startup, NLPCloud.io, is plying a trade in MLOps/AIOps -- or 'compute platform as a service' (being as it runs the queries on its own servers) -- with a focus on natural language processing (NLP), as its name suggests. Developments in artificial intelligence have, in recent years, led to impressive advances in the field of NLP -- a technology that can help businesses scale their capacity to intelligently grapple with all sorts of communications by automating tasks like Named Entity Recognition, sentiment analysis, text classification, summarization, question answering, and Part-Of-Speech tagging, freeing up (human) staff to focus on more complex/nuanced work. Production-ready (pre-trained) NLP models for English are readily available 'out of the box'. There are also dedicated open source frameworks offering help with training models.
Spatially regularized active diffusion learning for high-dimensional images
An active learning algorithm for the classification of high-dimensional images is proposed in which spatially-regularized nonlinear diffusion geometry is used to characterize cluster cores. The proposed method samples from estimated cluster cores in order to generate a small but potent set of training labels which propagate to the remainder of the dataset via the underlying diffusion process. By spatially regularizing the rich, high-dimensional spectral information of the image to efficiently estimate the most significant and influential points in the data, our approach avoids redundancy in the training dataset. This allows it to produce high-accuracy labelings with a very small number of training labels. The proposed algorithm admits an efficient numerical implementation that scales essentially linearly in the number of data points under a suitable data model and enjoys state-of-the-art performance on real hyperspectral images.
Deep Clustering With Intra-class Distance Constraint for Hyperspectral Images
Sun, Jinguang, Wang, Wanli, Wei, Xian, Fang, Li, Tang, Xiaoliang, Xu, Yusheng, Yu, Hui, Yao, Wei
The high dimensionality of hyperspectral images often degrades clustering performance. Owing to their powerful deep feature extraction and non-linear feature representation, clustering algorithms based on deep learning have become a hot research topic in the field of hyperspectral remote sensing. However, most deep clustering algorithms for hyperspectral images use deep neural networks as feature extractors without considering prior knowledge constraints that are suitable for clustering. To solve this problem, we propose an intra-class distance constrained deep clustering algorithm for high-dimensional hyperspectral images. The proposed algorithm constrains the feature mapping procedure of the auto-encoder network by intra-class distance so that raw images are transformed from the original high-dimensional space to a low-dimensional feature space that is more conducive to clustering. Furthermore, the related learning process is treated as a joint optimization problem of deep feature extraction and clustering. Experimental results demonstrate that the proposed algorithm is highly competitive with state-of-the-art clustering methods for hyperspectral images.
Hyperspectral Image Classification with Deep Metric Learning and Conditional Random Field
Liang, Yi, Zhao, Xin, Guo, Alan J. X., Zhu, Fei
To improve classification performance in hyperspectral image processing, many works have been developed based on two common strategies, namely spatial-spectral information integration and the use of neural networks. However, both strategies typically require more training data than classical algorithms, aggravating the shortage of labeled samples. In this paper, we propose a novel framework that organically combines an existing spectrum-based deep metric learning model and the conditional random field algorithm. The deep metric learning model is supervised by center loss, and is used to produce spectrum-based features that gather more tightly within classes in Euclidean space. The conditional random field with Gaussian edge potentials, which was first proposed for image segmentation, is used to jointly account for both the geometric distance between two pixels and the Euclidean distance between their corresponding features extracted by the deep metric learning model. The final predictions are given by the conditional random field. In general, the proposed framework is trained on pixel spectra at the deep metric learning stage, and uses partly handcrafted spatial features at the conditional random field stage. This arrangement alleviates the shortage of training data to some extent. Experiments on two real hyperspectral images demonstrate the advantages of the proposed method in terms of both classification accuracy and computation cost.
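The center loss mentioned in this abstract penalizes the squared Euclidean distance between each feature vector and the center of its class, which is what pulls same-class features together. The sketch below computes it for a small batch, assuming fixed centers and averaging over the batch (a common implementation choice; the abstract does not state the authors' exact formulation or how the centers are updated). The function name `center_loss` is hypothetical.

```python
from typing import Dict, Sequence


def center_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Dict[int, Sequence[float]],
) -> float:
    """Half the mean squared distance between each feature and its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total / len(features)


features = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1]
centers = {0: [1.0, 0.0], 1: [1.0, 1.0]}  # assumed fixed for this illustration
loss = center_loss(features, labels, centers)
print(loss)
```

During training the centers would normally be learned or updated from the features, and this term is typically combined with a classification loss.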