Collaborating Authors

 Narita


WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks

arXiv.org Artificial Intelligence

High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models or fine-tuning existing models for scientific applications such as weather and climate analysis. Unfortunately, despite the growing development of new deep learning models for weather and climate, there is a scarcity of curated, pre-processed machine learning (ML)-ready datasets. Curating such high-quality datasets for developing new models is challenging particularly because the modality of the input data varies significantly for different downstream tasks addressing different atmospheric scales (spatial and temporal). Here we introduce WxC-Bench (Weather and Climate Bench), a multi-modal dataset designed to support the development of generalizable AI models for downstream use-cases in weather and climate research. WxC-Bench is designed as a dataset of datasets for developing ML-models for a complex weather and climate system, addressing selected downstream tasks as machine learning phenomenon. WxC-Bench encompasses several atmospheric processes from meso-$\beta$ (20 - 200 km) scale to synoptic scales (2500 km), such as aviation turbulence, hurricane intensity and track monitoring, weather analog search, gravity wave parameterization, and natural language report generation. We provide a comprehensive description of the dataset and also present a technical validation for baseline analysis. The dataset and code to prepare the ML-ready data have been made publicly available on Hugging Face -- https://huggingface.co/datasets/nasa-impact/WxC-Bench


Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

arXiv.org Artificial Intelligence

Offering a promising solution to the scalability challenges associated with human evaluation, the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models (LLMs). However, there are still many open questions about the strengths and weaknesses of this paradigm, and what potential biases it may hold. In this paper, we present a comprehensive study of the performance of various LLMs acting as judges. We leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of LLMs and evaluate them alongside human annotations which we found to have a high inter-annotator agreement. Our study includes 9 judge models and 9 exam taker models -- both base and instruction-tuned. We assess the judge model's alignment across different model sizes, families, and judge prompts. Among other results, our research rediscovers the importance of using Cohen's kappa as a metric of alignment as opposed to simple percent agreement, showing that judges with high percent agreement can still assign vastly different scores. We find that both Llama-3 70B and GPT-4 Turbo have an excellent alignment with humans, but in terms of ranking exam taker models, they are outperformed by both JudgeLM-7B and the lexical judge Contains, which have up to 34 points lower human alignment. Through error analysis and various other studies, including the effects of instruction length and leniency bias, we hope to provide valuable lessons for using LLMs as judges in the future.
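The contrast between percent agreement and Cohen's kappa mentioned above can be illustrated with a minimal sketch. The labels and counts below are hypothetical and are not taken from the paper; the function names `percent_agreement` and `cohens_kappa` are illustrative helpers written with the Python standard library only.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label
    # if each labelled independently according to their own label frequencies.
    p_e = sum(counts_a[l] * counts_b[l] for l in set(counts_a) | set(counts_b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical human annotations: 90 answers correct, 10 incorrect.
human = ["correct"] * 90 + ["incorrect"] * 10

# Hypothetical lenient judge: marks every answer as correct.
lenient_judge = ["correct"] * 100

# Hypothetical mixed judge: also disagrees with humans on 10 items,
# but its errors go in both directions.
mixed_judge = (["correct"] * 85 + ["incorrect"] * 5      # human said "correct"
               + ["incorrect"] * 5 + ["correct"] * 5)    # human said "incorrect"

for name, judge in [("lenient", lenient_judge), ("mixed", mixed_judge)]:
    print(name, percent_agreement(human, judge), round(cohens_kappa(human, judge), 3))
```

In this toy setup both judges agree with the human labels on 90% of items, yet the lenient judge's kappa is zero because its agreement is no better than what its label frequencies would produce by chance, while the mixed judge scores about 0.44.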


'A portal to a different world': a gamer's guide to visiting Japan

The Guardian

The experience of travelling in Japan is simultaneously overwhelming and freeing. The world feels bigger out there, gilded by how mainstream video game culture is in comparison with the west. It doesn't feel like a subculture; it is ordinary. For example, I walked into a FamilyMart for a snack one afternoon, and found a Legend of Zelda: Tears of the Kingdom promotional mushroom tart (which was delicious). The little bright-green payphones along the streets are the very same as those used in the Resident Services in Animal Crossing.


Practical Commercial 5G Standalone (SA) Uplink Throughput Prediction

arXiv.org Artificial Intelligence

The introduction of the 5G New Radio (NR) network brings a huge uplift in uplink throughput over the legacy 4G Long-Term Evolution (LTE) network, with the promised target peak uplink throughput of 10 Gbps [1], which should be sufficient for any multimedia application for many years to come. However, that kind of throughput is highly experimental and only achievable in the laboratory with a controlled environment. One of the latest 5G RF Modem to be announced was the Qualcomm Snapdragon X75, which will be used by smartphone manufacturers from late-2023 onward, only supports the peak uplink throughput of 3.5 Gbps [2]. The latest showcase by Advanced Info Service (AIS), the Thailand's largest Mobile Network Operator (MNO), and ZTE, one of the largest Radio Access Network (RAN) manufacturers, ...

... optimal for commercial 5G SA environment or has some impracticality for real-world implementation. Mainly there are two main approaches to tackle this problem; application layer approach, where packet loss and delay are used for the prediction [5][6][7], and physical layer approach, where RF and low-level parameters are used for the prediction. Some examples of the physical layer approaches are [8][9][10], where the neural network had been used to accurately predict the uplink throughput. However, many of the parameters such as Resource Block Allocation (RB) and Transmission Power (Tx Power) can't be accessed without modifying the smartphone and installing specialized software, which renders them impractical. Furthermore, frequency bands and duplex schemes are not taken into account.


Is GPT-3 a Good Data Annotator?

arXiv.org Artificial Intelligence

Evaluations show that GPT-3 has gained through pretraining a surprisingly wide range of knowledge, which can be transferred to downstream tasks through knowledge distillation (Kim et al., 2022). We present some examples in Appendix A.12. Due to the model architecture and pretraining tasks designed for auto-regressive generation, GPT-3 is capable of generating human-like text and performing a broad array of NLP tasks, such as machine translation, summarization, and question-answering.

The democratization of artificial intelligence (AI) (Garvey, 2018; Rubeis et al., 2022) aims to provide access to AI technologies to all members of society, including individuals, small- and medium-sized enterprises (SMEs), academic research labs, and nonprofit organizations. Achieving this goal is crucial for the promotion of innovation, economic growth, and fairness and equality. As typical AI models are usually data-hungry, one significant obstacle of AI democratization is the preparation of ...


Airlines scramble to rejig schedules amid U.S. 5G rollout concerns

The Japan Times

Major international airlines rushed on Tuesday to rejig or cancel flights to the United States on the eve of a 5G wireless rollout that triggered safety concerns, despite two wireless carriers saying they will delay parts of the deployment. The Federal Aviation Administration has warned that potential 5G interference could affect height readings that play a key role in bad-weather landings on some jets and airlines say the Boeing 777 is among models initially in the spotlight. Despite an announcement by AT&T and Verizon that they would delay turning on some 5G towers near airports, several airlines still canceled flights. Others said more cancellations were likely unless the FAA issued new formal guidance in the wake of the wireless announcements. The world's largest operator of the Boeing 777, Dubai's Emirates, said it would suspend flights to nine U.S. destinations from Jan. 19, the planned date for the deployment of 5G wireless services.


Narita and Haneda airports start wider use of facial recognition

The Japan Times

Chiba – Japan's Narita and Haneda airports on Monday started the full-scale use of facial recognition, allowing international travelers to check in baggage and pass security checkpoints without showing passports or flight tickets. With the "Face Express" system aimed at speeding up the boarding process and providing a touchless experience for passengers, travelers need to have their photos taken at check-in when they register their passports and boarding passes upon arriving at the airports. After registering necessary data with special terminals, cameras at baggage check-in, security checkpoint entrances and boarding gates will automatically verify passengers' identity and allow them to pass through, Narita International Airport Corp. said. "The procedure (for boarding) ended quickly and the gate opened smoothly," said company employee Susumu Hayakawa, 29, before traveling on a Japan Airlines flight to Chicago from Narita Airport near Tokyo. The system fully came into service after Narita Airport started trialing the use of facial recognition in April, only involving airport staff and not actual travelers. It will also lead to reduced physical contact between travelers, machines, and airport and flight staff, helping to prevent the spread of virus infections, the airport operator has said.


What to know about the EU's facial recognition regulation

#artificialintelligence

The European Commission's (EC) proposed Artificial Intelligence (AI) regulation – a much-awaited piece of legislation – is out. While this text must still go through consultations within the EU before its adoption, the proposal already provides a good sense of how the EU considers the development of AI within the years to come: by following a risk-based approach to regulation. Other use-cases such as FRT for authentication processes are not part of the list of high-level risks and thus should require a lighter level of regulation. While technology providers have to maintain the highest level of performance and accuracy of their systems, this necessary step isn't the most critical to prevent harm. The EC doesn't detail any threshold of accuracy to meet, but rather requires a robust and documented risk-mitigation process designed to prevent harm.


AI-equipped guide panels make Tokyo area train station debuts

The Japan Times

Electronic panels equipped with artificial intelligence debuted Tuesday at major train stations in the Tokyo area to provide tourist and transfer information for a trial period, with the railway operator aiming to use them to make up for a future labor shortage. East Japan Railway Co. set up 30 panels at six stations in Tokyo and neighboring Chiba Prefecture for the demonstration, which lasts through late January. As a measure against the coronavirus, users do not have to touch the panels directly to operate them and some can automatically measure a passenger's temperature. Available in Japanese, English, Chinese and Korean, the displays can respond to voice questions and finger movements. They are installed at Shinjuku, Shinagawa, Ikebukuro and Takanawa Gateway stations in Tokyo as well as at two locations in Chiba, Kaihinmakuhari and the Airport Terminal 2 station at Narita Airport.


CES 2020: A smart city oasis

Robohub

Like the city that hosts the Consumer Electronics Show (CES), there is a lot of noise on the show floor. Sifting through the lights, sounds and people can be an arduous task even for the most experienced CES attendees. Hidden past the North Hall of the Las Vegas Convention Center (LVCC) is a walkway to a tech oasis housed in the Westgate Hotel. This new area hosting SmartCity/IoT innovations is reminiscent of the old Eureka Park, complete with folding tables and ballroom carpeting. The fact that such enterprises require their own area separate from the main halls of the LVCC and the startup pavilions of the Sands Hotel is an indication of how urbanization is being redefined by artificial intelligence.