Pacific Ocean
Data Analyst at Honor - Remote
Honor exists to expand the world's capacity to care. We're combining high tech with high-touch to deliver better home care for aging adults, better jobs for Care Professionals, and entirely new offerings to support the aging journey, at scale. Founded in 2014, and now a Series E funded "Unicorn" valued at over $1B, Honor leads the world's largest home care network with the most advanced care platform. Our August 2021 acquisition of Home Instead has created a global company that's revolutionizing how society cares for older adults, their families, and Care Professionals. The Honor Care Platform combines local care and the most advanced technology to bring the highest quality care to more aging adults.
Automatic detection of aerial survey ground control points based on YOLOv5-OBB
Chuanxiang, Cheng, Jia, Yang, Chao, Wang, Zhi, Zheng, Xiaopeng, Li, Di, Dong, Mengxia, Chang, Zhiheng, Zhuang
The use of ground control points (GCPs) for georeferencing is the most common strategy in unmanned aerial vehicle (UAV) photogrammetry, but their collection is also the most time-consuming and expensive part of UAV campaigns. Recently, deep learning has developed rapidly in the field of small object detection. In this letter, to automatically extract the coordinate information of GCPs by detecting GCP-markers in UAV images, we propose a solution that uses a deep learning-based architecture, YOLOv5-OBB, combined with a confidence threshold filtering algorithm and an optimal ranking algorithm. We applied our proposed method to a dataset collected by a DJI Phantom 4 Pro drone and obtained good detection performance, with a mean Average Precision (AP) of 0.832 and a highest AP of 0.982 for the cross-type GCP-markers. The proposed method can be a promising tool for future implementation of the end-to-end aerial triangulation process.
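The abstract names a confidence threshold filtering step and a ranking step without describing them. As a minimal sketch only, assuming each detection is a record with a label, a confidence score, and a marker center in pixels, and assuming that ranking means keeping the most confident candidates, the post-processing might look like the following. The `Detection` type, the `filter_and_rank` helper, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One candidate GCP-marker detection (hypothetical record layout)."""
    label: str
    confidence: float
    cx: float  # marker center, pixel x
    cy: float  # marker center, pixel y


def filter_and_rank(detections, threshold=0.5, top_k=1):
    """Drop detections below `threshold`, then return the `top_k` most confident.

    Generic post-processing illustration, not the paper's algorithm.
    """
    kept = [d for d in detections if d.confidence >= threshold]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kept[:top_k]


# Made-up candidates for a single image, for illustration only.
candidates = [
    Detection("cross", 0.31, 120.0, 88.5),
    Detection("cross", 0.93, 402.2, 310.7),
    Detection("cross", 0.76, 399.8, 312.1),
]
best = filter_and_rank(candidates, threshold=0.5, top_k=1)
```

Here `best` holds the single surviving detection with the highest confidence; its center could then be passed on as the marker's image coordinates.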
Data Games: A Game-Theoretic Approach to Swarm Robotic Data Collection
Akcin, Oguzhan, Li, Po-han, Agarwal, Shubhankar, Chinchali, Sandeep
Fleets of networked autonomous vehicles (AVs) collect terabytes of sensory data, which is often transmitted to central servers (the "cloud") for training machine learning (ML) models. Ideally, these fleets should upload all their data, especially from rare operating contexts, in order to train robust ML models. However, this is infeasible due to prohibitive network bandwidth and data labeling costs. Instead, we propose a cooperative data sampling strategy where geo-distributed AVs collaborate to collect a diverse ML training dataset in the cloud. Since the AVs have a shared objective but minimal information about each other's local data distribution and perception model, we can naturally cast cooperative data collection as an $N$-player mathematical game. We show that our cooperative sampling strategy uses minimal information to converge to a centralized oracle policy with complete information about all AVs. Moreover, we theoretically characterize the performance benefits of our game-theoretic strategy compared to greedy sampling. Finally, we experimentally demonstrate that our method outperforms standard benchmarks by up to $21.9\%$ on 4 perception datasets, including for autonomous driving in adverse weather conditions. Crucially, our experimental results on real-world datasets closely align with our theoretical guarantees.
CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification
Kim, Seungone, Joo, Se June, Jang, Yul, Chae, Hyungjoo, Yeo, Jinyoung
Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exist only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available.
Figure 1: Example of Explanation Verification and Answer Verification of GPT-3's output. Explanation Verification requires additional knowledge, which makes it hard for annotators to intuitively write a revised explanation and answer.
Eigenvector University 2023 - Eigenvector
Eigenvector Research, Inc. is pleased to announce our 17th annual Eigenvector University. EigenU 2023 includes 16 short courses in chemical data science, i.e. chemometrics. This includes mathematical, statistical, machine learning and artificial intelligence methods as applied to problems in the analysis of data from chemistry and the life sciences. The courses are held in Seattle, USA at the Washington Athletic Club. EigenU also includes a Workshop Dinner, and a PowerUser Tips, Tricks & Poster Session.
TrafFormer: A Transformer Model for Predicting Long-term Traffic
Tedjopurnomo, David Alexander, Choudhury, Farhana M., Qin, A. K.
Traffic prediction is a flourishing research field due to its importance in human mobility in the urban space. Despite this, existing studies focus only on short-term prediction of up to a few hours in advance, with most covering up to one hour only. Long-term traffic prediction can enable more comprehensive, informed, and proactive measures against traffic congestion and is therefore an important task to explore. In this paper, we explore the task of long-term traffic prediction, where we predict traffic up to 24 hours in advance. We note the weaknesses of existing models, which are based on recurrent structures, for long-term traffic prediction and propose a modified Transformer model, "TrafFormer". Experiments comparing our model with existing hybrid neural network models show the superiority of our model.
Evaluation of DRAIN, a deep-learning approach to rain retrieval from GPM passive microwave radiometer
Viltard, Nicolas, Sambath, Vibolroth, Lepetit, Pierre, Martini, Audrey, Barthès, Laurent, Mallet, Cécile
LATMOS-IPSL, Université Paris-Saclay, UVSQ, CNRS, 78280, Guyancourt, France *Météo-France, Avenue Coriolis, Toulouse
Abstract: Retrieval of rain from Passive Microwave radiometers data has been a challenge ever since the launch of the first Defense Meteorological Satellite [...]. Enormous progress has been made since the launch of the Tropical Rainfall Measuring Mission (TRMM) in 1997 but until recently the data were processed pixel-by-pixel or [...]. Deep learning has obtained remarkable improvement in the computer vision field, and offers a whole new way to tackle the rain retrieval problem. The Global Precipitation Measurement (GPM) Core satellite carries similarly to TRMM, a passive microwave [...]
[...] from about 52,000 images to about 103,000 allowing us to build a training database of 70,000 images for training and 33,000 images for validation. [...] years 2014 to 2018 and a few months from 2020 and 2021 are used but the whole year 2019 was kept separate for the performance assessment (test) and most results presented hereafter are computed for that year. [...] large database is meant to dampen the effects of seasonal and interannual variability of rain. Second, DRAIN retrieves now a set of 99 quantiles instead of a simple averaged rain rate as in [1]. These quantiles represent the probability that the rain rate is below a certain threshold.
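To make the idea of a 99-quantile output concrete, here is a small sketch, assuming the 99 quantiles are the 1% to 99% cut points of a rain-rate distribution. It uses Python's standard `statistics.quantiles` on made-up sample values (not data from the paper), and the helper `prob_below` is hypothetical; it only illustrates how such cut points relate to the probability that the rain rate is below a threshold, not how DRAIN computes them.

```python
import bisect
import statistics

# Made-up rain rates in mm/h, for illustration only (not data from the paper).
samples = [0.1 * i for i in range(1, 201)]  # 0.1, 0.2, ..., 20.0

# n=100 yields 99 cut points: the 1%, 2%, ..., 99% quantiles.
quantiles = statistics.quantiles(samples, n=100)


def prob_below(threshold, cut_points):
    """Approximate P(rain rate <= threshold) from sorted percentile cut points.

    Counts the cut points at or below the threshold, so the result is
    resolved in steps of 1 / (len(cut_points) + 1).
    """
    return bisect.bisect_right(cut_points, threshold) / (len(cut_points) + 1)


# Approximate probability that the rain rate is at most 5 mm/h.
p_light = prob_below(5.0, quantiles)
```

Returning the full set of cut points rather than one averaged rate lets a user read off a probability for any threshold of interest, at the cost of a coarser resolution near the tails.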
LightCTS: A Lightweight Framework for Correlated Time Series Forecasting
Lai, Zhichen, Zhang, Dalin, Li, Huan, Jensen, Christian S., Lu, Hua, Zhao, Yan
Correlated time series (CTS) forecasting plays an essential role in many practical applications, such as traffic management and server load control. Many deep learning models have been proposed to improve the accuracy of CTS forecasting. However, while models have become increasingly complex and computationally intensive, they struggle to improve accuracy. Pursuing a different direction, this study aims instead to enable much more efficient, lightweight models that preserve accuracy while remaining deployable on resource-constrained devices. To achieve this goal, we characterize popular CTS forecasting models and derive two observations that indicate directions for lightweight CTS forecasting. On this basis, we propose the LightCTS framework, which adopts plain stacking of temporal and spatial operators instead of the much more computationally expensive alternate stacking. Moreover, LightCTS features light temporal and spatial operator modules, called L-TCN and GL-Former, that offer improved computational efficiency without compromising their feature extraction capabilities. LightCTS also encompasses a last-shot compression scheme to reduce redundant temporal features and speed up subsequent computations. Experiments with single-step and multi-step forecasting benchmark datasets show that LightCTS is capable of nearly state-of-the-art accuracy at much reduced computational and storage overheads.
The ROOTS Search Tool: Data Transparency for LLMs
Piktus, Aleksandra, Akiki, Christopher, Villegas, Paulo, Laurençon, Hugo, Dupont, Gérard, Luccioni, Alexandra Sasha, Jernite, Yacine, Rogers, Anna
ROOTS is a 1.6TB multilingual text corpus developed for the training of BLOOM, currently the largest language model explicitly accompanied by commensurate data governance efforts. In continuation of these efforts, we present the ROOTS Search Tool: a search engine over the entire ROOTS corpus offering both fuzzy and exact search capabilities. ROOTS is the largest corpus to date that can be investigated this way. The ROOTS Search Tool is open-sourced and available on Hugging Face Spaces. We describe our implementation and the possible use cases of our tool.
Topic-Selective Graph Network for Topic-Focused Summarization
Due to the success of the pre-trained language model (PLM), existing PLM-based summarization models show powerful generative capability. However, these models are trained on general-purpose summarization datasets, so the generated summaries fail to satisfy the needs of different readers. To generate summaries with topics, many efforts have been made on topic-focused summarization. However, these works generate a summary guided only by a prompt comprising topic words. Despite their success, these methods still ignore the disturbance of sentences with non-relevant topics and only conduct cross-interaction between tokens through the attention module. To address this issue, we propose a topic-arc recognition objective and a topic-selective graph network. First, the topic-arc recognition objective is used in model training, which endows the model with the capability to discriminate topics. Moreover, the topic-selective graph network can conduct topic-guided cross-interaction on sentences based on the results of topic-arc recognition. In the experiments, we conduct extensive evaluations on the NEWTS and COVIDET datasets. Results show that our methods achieve state-of-the-art performance.