ground-truth data
Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization
Estimating the homography between two images is crucial for mid-or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs.To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. As a result, we demonstrate that our method, AltO, can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators.The source code can be found at: https://github.com/songsang7/AltO
Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization
Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs.To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap.
GAN-enhanced Simulation-driven DNN Testing in Absence of Ground Truth
Attaoui, Mohammed, Pastore, Fabrizio
The generation of synthetic inputs via simulators driven by search algorithms is essential for cost-effective testing of Deep Neural Network (DNN) components for safety-critical systems. However, in many applications, simulators are unable to produce the ground-truth data needed for automated test oracles and to guide the search process. To tackle this issue, we propose an approach for the generation of inputs for computer vision DNNs that integrates a generative network to ensure simulator fidelity and employs heuristic-based search fitnesses that leverage transformation consistency, noise resistance, surprise adequacy, and uncertainty estimation. We compare the performance of our fitnesses with that of a traditional fitness function leveraging ground truth; further, we assess how the integration of a GAN not leveraging the ground truth impacts on test and retraining effectiveness. Our results suggest that leveraging transformation consistency is the best option to generate inputs for both DNN testing and retraining; it maximizes input diversity, spots the inputs leading to worse DNN performance, and leads to best DNN performance after retraining. Besides enabling simulator-based testing in the absence of ground truth, our findings pave the way for testing solutions that replace costly simulators with diffusion and large language models, which might be more affordable than simulators, but cannot generate ground-truth data.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada (0.04)
- Europe > Poland (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Robotics & Automation (0.68)
- Automobiles & Trucks (0.68)
- Transportation > Ground > Road (0.46)
- Transportation > Air (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
Nguyen, Tuan Nam, Akti, Seymanur, Pham, Ngoc Quan, Waibel, Alexander
Previous approaches on accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it difficult for listeners to understand them. Hence, we developed a new AC approach that not only focuses on accent conversion but also improves pronunciation of non-native accented speaker. By providing the non-native audio and the corresponding transcript, we generate the ideal ground-truth audio with native-like pronunciation with original duration and prosody. This ground-truth data aids the model in learning a direct mapping between accented and native speech. We utilize the end-to-end VITS framework to achieve high-quality waveform reconstruction for the AC task. As a result, our system not only produces audio that closely resembles native accents and while retaining the original speaker's identity but also improve pronunciation, as demonstrated by evaluation results.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
A Scientific Machine Learning Approach for Predicting and Forecasting Battery Degradation in Electric Vehicles
Murgai, Sharv, Bhagwat, Hrishikesh, Dandekar, Raj Abhijit, Dandekar, Rajat, Panat, Sreedath
Carbon emissions are rising at an alarming rate, posing a significant threat to global efforts to mitigate climate change. Electric vehicles have emerged as a promising solution, but their reliance on lithium-ion batteries introduces the critical challenge of battery degradation. Accurate prediction and forecasting of battery degradation over both short and long time spans are essential for optimizing performance, extending battery life, and ensuring effective long-term energy management. This directly influences the reliability, safety, and sustainability of EVs, supporting their widespread adoption and aligning with key UN SDGs. In this paper, we present a novel approach to the prediction and long-term forecasting of battery degradation using Scientific Machine Learning framework which integrates domain knowledge with neural networks, offering more interpretable and scientifically grounded solutions for both predicting short-term battery health and forecasting degradation over extended periods. This hybrid approach captures both known and unknown degradation dynamics, improving predictive accuracy while reducing data requirements. We incorporate ground-truth data to inform our models, ensuring that both the predictions and forecasts reflect practical conditions. The model achieved MSE of 9.90 with the UDE and 11.55 with the NeuralODE, in experimental data, a loss of 1.6986 with the UDE, and a MSE of 2.49 in the NeuralODE, demonstrating the enhanced precision of our approach. This integration of data-driven insights with SciML's strengths in interpretability and scalability allows for robust battery management. By enhancing battery longevity and minimizing waste, our approach contributes to the sustainability of energy systems and accelerates the global transition toward cleaner, more responsible energy solutions, aligning with the UN's SDG agenda.
- Asia > China (0.04)
- North America > United States > California > Santa Clara County > Cupertino (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
- Energy > Energy Storage (1.00)
- (2 more...)
Fine-grained large-scale content recommendations for MSX sellers
Singh, Manpreet, Pasricha, Ravdeep, Kondapalli, Ravi Prasad, R, Kiran, Singh, Nitish, Agarwalla, Akshita, R, Manoj, Prabhakar, Manish, Boué, Laurent
One of the most critical tasks of Microsoft sellers is to meticulously track and nurture potential business opportunities through proactive engagement and tailored solutions. Recommender systems play a central role to help sellers achieve their goals. In this paper, we present a content recommendation model which surfaces various types of content (technical documentation, comparison with competitor products, customer success stories etc.) that sellers can share with their customers or use for their own self-learning. The model operates at the opportunity level which is the lowest possible granularity and the most relevant one for sellers. It is based on semantic matching between metadata from the contents and carefully selected attributes of the opportunities. Considering the volume of seller-managed opportunities in organizations such as Microsoft, we show how to perform efficient semantic matching over a very large number of opportunity-content combinations. The main challenge is to ensure that the top-5 relevant contents for each opportunity are recommended out of a total of $\approx 40,000$ published contents. We achieve this target through an extensive comparison of different model architectures and feature selection. Finally, we further examine the quality of the recommendations in a quantitative manner using a combination of human domain experts as well as by using the recently proposed "LLM as a judge" framework.
Learning IMM Filter Parameters from Measurements using Gradient Descent
Brandenburger, André, Hoffmann, Folker, Charlish, Alexander
The performance of data fusion and tracking algorithms often depends on parameters that not only describe the sensor system, but can also be task-specific. While for the sensor system tuning these variables is time-consuming and mostly requires expert knowledge, intrinsic parameters of targets under track can even be completely unobservable until the system is deployed. With state-of-the-art sensor systems growing more and more complex, the number of parameters naturally increases, necessitating the automatic optimization of the model variables. In this paper, the parameters of an interacting multiple model (IMM) filter are optimized solely using measurements, thus without necessity for any ground-truth data. The resulting method is evaluated through an ablation study on simulated data, where the trained model manages to match the performance of a filter parametrized with ground-truth values.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)
Interpreting wealth distribution via poverty map inference using multimodal data
Espín-Noboa, Lisette, Kertész, János, Karsai, Márton
Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions to all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonous correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs to make informed decisions based on data availability, urbanization level, and poverty thresholds.
- Africa > Uganda (0.25)
- Africa > Sierra Leone (0.25)
- North America > United States > Texas > Travis County > Austin (0.05)
- (7 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Banking & Finance (1.00)
- Information Technology > Services (0.93)
- Government (0.93)
Enhancing Product Safety in E-Commerce with NLP
Halder, Kishaloy, Krapac, Josip, Goryunov, Dmitry, Brew, Anthony, Lyra, Matti, Dizdari, Alsida, Gillett, William, Renahy, Adrien, Tang, Sinan
Ensuring safety of the products offered to the customers is of paramount importance to any e- commerce platform. Despite stringent quality and safety checking of products listed on these platforms, occasionally customers might receive a product that can pose a safety issue arising out of its use. In this paper, we present an innovative mechanism of how a large scale multinational e-commerce platform, Zalando, uses Natural Language Processing techniques to assist timely investigation of the potentially unsafe products mined directly from customer written claims in unstructured plain text. We systematically describe the types of safety issues that concern Zalando customers. We demonstrate how we map this core business problem into a supervised text classification problem with highly imbalanced, noisy, multilingual data in a AI-in-the-loop setup with a focus on Key Performance Indicator (KPI) driven evaluation. Finally, we present detailed ablation studies to show a comprehensive comparison between different classification techniques. We conclude the work with how this NLP model was deployed.
- Europe > Switzerland (0.05)
- Europe > Belgium (0.05)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- (5 more...)
Cooperate or Compete: A New Perspective on Training of Generative Networks
Babu, Ch. Sobhan, Guravannavar, Ravindra, Hulgeri, Arvind
GANs have two competing modules: the generator module is trained to generate new examples, and the discriminator module is trained to discriminate real examples from generated examples. The training procedure of GAN is modeled as a finitely repeated simultaneous game. Each module tries to increase its performance at every repetition of the base game (at every batch of training data) in a non-cooperative manner. We observed that each module can perform better and learn faster if training is modeled as an infinitely repeated simultaneous game. At every repetition of the base game (at every batch of training data) the stronger module (whose performance is increased or remains the same compared to the previous batch of training data) cooperates with the weaker module (whose performance is decreased compared to the previous batch of training data) and only the weaker module is allowed to increase its performance.
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > India > Telangana > Hyderabad (0.04)
- Asia > India > Maharashtra > Pune (0.04)