
Collaborating Authors

 Wang, Hongyu


The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

arXiv.org Artificial Intelligence

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
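
The abstract leaves out implementation details, but the ternary weight scheme it describes can be illustrated with a short sketch. The absmean scaling below is an assumption based on the description of {-1, 0, 1} weights, not the authors' released code.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, 1} with a per-tensor scale.

    Sketch of an absmean-style scheme: scale by the mean absolute value,
    then round and clip to the ternary value set.
    """
    scale = w.abs().mean().clamp(min=eps)    # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)   # ternary values in {-1, 0, 1}
    return w_q, scale                        # dequantize as w_q * scale

# Example: quantize a random weight matrix and inspect the value set
w = torch.randn(4, 4)
w_q, scale = ternary_quantize(w)
print(w_q.unique())   # subset of tensor([-1., 0., 1.])
```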


PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization

arXiv.org Artificial Intelligence

Visual-inertial SLAM is crucial in various fields, such as aerial vehicles, industrial robots, and autonomous driving. The fusion of a camera and an inertial measurement unit (IMU) makes up for the shortcomings of a single sensor, which significantly improves the accuracy and robustness of localization in challenging environments. This article presents PLE-SLAM, an accurate and real-time visual-inertial SLAM algorithm based on point-line features and efficient IMU initialization. First, we use parallel computing methods to extract features and compute descriptors to ensure real-time performance. Adjacent short line segments are merged into long line segments, and isolated short line segments are directly deleted. Second, a rotation-translation-decoupled initialization method is extended to use both points and lines. Gyroscope bias is optimized by tightly coupling IMU measurements and image observations. Accelerometer bias and gravity direction are solved with an analytical method for efficiency. To improve the system's ability to handle complex environments, a scheme that leverages semantic information and geometric constraints to eliminate dynamic features, and a CNN- and GNN-based solution for loop detection and loop-closure frame pose estimation, are integrated into the system. All networks are accelerated to ensure real-time performance. Experimental results on public datasets illustrate that PLE-SLAM is one of the state-of-the-art visual-inertial SLAM systems.
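
As an illustration of the line-segment preprocessing mentioned above (merging adjacent short segments into longer ones), the sketch below merges two nearly collinear 2D segments. The collinearity test and thresholds are illustrative assumptions, not PLE-SLAM's actual criteria.

```python
import numpy as np

def try_merge(seg_a, seg_b, angle_thresh_deg=3.0, gap_thresh=5.0):
    """Merge two 2D line segments (each a pair of endpoints) if they are
    nearly collinear and their closest endpoints lie within gap_thresh pixels.
    Returns the merged segment or None."""
    a0, a1 = np.asarray(seg_a, float)
    b0, b1 = np.asarray(seg_b, float)
    da, db = a1 - a0, b1 - b0
    cos = abs(np.dot(da, db)) / (np.linalg.norm(da) * np.linalg.norm(db) + 1e-9)
    if np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))) > angle_thresh_deg:
        return None                                   # not parallel enough
    gap = min(np.linalg.norm(p - q) for p in (a0, a1) for q in (b0, b1))
    if gap > gap_thresh:
        return None                                   # too far apart
    pts = np.stack([a0, a1, b0, b1])
    d = da / np.linalg.norm(da)                       # direction of segment a
    t = pts @ d                                       # project endpoints onto d
    return pts[t.argmin()], pts[t.argmax()]           # extreme points span the merge

# Two short, roughly collinear segments are merged into one long segment
print(try_merge(((0, 0), (10, 0)), ((12, 0.2), (25, 0.3))))
```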


DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM with Joint Semantic Encoding

arXiv.org Artificial Intelligence

We propose DDN-SLAM, a real-time dense neural implicit semantic SLAM system designed for dynamic scenes. While existing neural implicit SLAM systems perform well in static scenes, they often encounter challenges in real-world environments with dynamic interference, leading to ineffective tracking and mapping. DDN-SLAM utilizes the priors provided by the deep semantic system, combined with conditional probability fields, for segmentation. By constructing depth-guided static masks and employing joint multi-resolution hashing encoding, we ensure fast hole filling and high-quality mapping while mitigating the effects of dynamic information interference. To enhance tracking ...

These SLAM systems outperform traditional SLAM methods in terms of texture details, memory consumption, noise handling, and outlier processing. Although current neural implicit SLAM systems have achieved good reconstruction results in static scenes [8, 24, 40, 78], many real-world environments are often affected by dynamic objects, especially in applications such as robotics or autonomous driving, which involve complex physical environments and may also have low-texture areas or significant changes in lighting and viewing angles. Current neural implicit SLAM systems are unable to achieve effective tracking and reliable reconstruction in such environments.
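
To make the notion of a depth-guided static mask concrete, here is a minimal sketch combining a semantic dynamic-object mask with a depth-consistency check. The thresholding rule is an assumption based on the abstract, not the paper's formulation.

```python
import numpy as np

def static_mask(semantic_dynamic, depth_obs, depth_rendered, depth_tol=0.05):
    """Illustrative depth-guided static mask.

    semantic_dynamic: boolean map, True where segmentation flags a pixel
                      as a potentially dynamic object.
    depth_obs:        observed sensor depth (metres).
    depth_rendered:   depth rendered from the current implicit map (metres).
    A pixel is kept as static if it is not flagged as dynamic AND its
    observed depth agrees with the rendered depth within depth_tol.
    """
    depth_consistent = np.abs(depth_obs - depth_rendered) < depth_tol * depth_obs
    return (~semantic_dynamic) & depth_consistent

# Example on a tiny 2x2 "image"
sem = np.array([[True, False], [False, False]])
obs = np.array([[2.0, 1.0], [3.0, 4.0]])
ren = np.array([[2.0, 1.2], [3.01, 4.0]])
print(static_mask(sem, obs, ren))
```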


End-User Puppeteering of Expressive Movements

arXiv.org Artificial Intelligence

The end-user programming of social robot behavior is usually limited to a predefined set of movements. We propose a puppeteering robotic interface that provides a more intuitive method of programming expressive robot movements. As the user manipulates the puppet of a robot, the actual robot replicates the movements, providing real-time visual feedback. Through this proposed interface, even with limited training, a novice user can design and program expressive movements efficiently. We present our preliminary user study results in this extended abstract.


Feature Extractor Stacking for Cross-domain Few-shot Learning

arXiv.org Artificial Intelligence

Cross-domain few-shot learning (CDFSL) addresses learning problems where knowledge needs to be transferred from one or more source domains into an instance-scarce target domain with an explicitly different distribution. Recently published CDFSL methods generally construct a universal model that combines knowledge of multiple source domains into one feature extractor. This enables efficient inference but necessitates re-computation of the extractor whenever a new source domain is added. Some of these methods are also incompatible with heterogeneous source domain extractor architectures. We propose feature extractor stacking (FES), a new CDFSL method for combining information from a collection of extractors, that can utilise heterogeneous pretrained extractors out of the box and does not maintain a universal model that needs to be re-computed when its extractor collection is updated. We present the basic FES algorithm, which is inspired by the classic stacked generalisation approach, and also introduce two variants: convolutional FES (ConFES) and regularised FES (ReFES). Given a target-domain task, these algorithms fine-tune each extractor independently, use cross-validation to extract training data for stacked generalisation from the support set, and learn a simple linear stacking classifier from this data. We evaluate our FES methods on the well-known Meta-Dataset benchmark, targeting image classification with convolutional neural networks, and show that they can achieve state-of-the-art performance.
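
The stacking step can be sketched schematically: out-of-fold predictions from a linear head on each extractor's support-set features, obtained by cross-validation, become training data for a simple linear stacking classifier. Extractor fine-tuning is abstracted away, and the functions below are illustrative placeholders rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def fit_stacker(extractor_features, y_support, n_folds=5):
    """Schematic FES-style stacking on a few-shot support set.

    extractor_features: list of (n_support, d_i) arrays, one per pretrained
                        extractor, holding features of the support images.
    y_support:          (n_support,) array of class labels.
    """
    heads, meta_features = [], []
    for feats in extractor_features:
        head = LogisticRegression(max_iter=1000)
        # Out-of-fold class probabilities serve as stacking training data.
        probs = cross_val_predict(head, feats, y_support,
                                  cv=n_folds, method="predict_proba")
        meta_features.append(probs)
        heads.append(head.fit(feats, y_support))  # refit on the full support set
    stacker = LogisticRegression(max_iter=1000)
    stacker.fit(np.hstack(meta_features), y_support)
    return heads, stacker

def predict(heads, stacker, extractor_features_query):
    """Combine per-extractor probabilities with the learned linear stacker."""
    meta = np.hstack([h.predict_proba(f)
                      for h, f in zip(heads, extractor_features_query)])
    return stacker.predict(meta)
```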


BitNet: Scaling 1-bit Transformers for Large Language Models

arXiv.org Artificial Intelligence

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits.
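
A minimal sketch of a BitLinear-style layer is shown below, assuming sign binarization with an absmean scale and a straight-through estimator; the activation quantization and normalization used in the actual BitNet layer are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Minimal sketch of a 1-bit linear layer trained from scratch.

    Weights are binarized with sign() in the forward pass; gradients flow
    to the latent full-precision weights via a straight-through estimator.
    The paper's BitLinear also normalizes and quantizes activations, which
    this sketch leaves out.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()               # per-tensor absmean scale
        w_bin = torch.sign(w) * scale        # binarized weights, rescaled
        w_ste = w + (w_bin - w).detach()     # straight-through estimator
        return F.linear(x, w_ste)

# Drop-in usage in place of nn.Linear
layer = BitLinearSketch(16, 8)
out = layer(torch.randn(2, 16))
out.sum().backward()                         # gradients reach the latent weights
print(layer.weight.grad.shape)
```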


PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

arXiv.org Artificial Intelligence

As an effective tool for eliciting the power of Large Language Models (LLMs), prompting has recently demonstrated unprecedented abilities across a variety of complex tasks. To further improve performance, prompt ensembling has attracted substantial interest for tackling the hallucination and instability of LLMs. However, existing methods usually adopt a two-stage paradigm, which requires a pre-prepared set of prompts built with substantial manual effort, and is unable to perform directed optimization for different weak learners. In this paper, we propose a simple, universal, and automatic method named PREFER (Prompt Ensemble learning via Feedback-Reflect-Refine) to address the stated limitations. Specifically, given that weak learners are supposed to focus on hard examples during boosting, PREFER builds a feedback mechanism for reflecting on the inadequacies of existing weak learners. Based on this, the LLM is required to automatically synthesize new prompts for iterative refinement. Moreover, to enhance the stability of prompt evaluation, we propose a novel prompt bagging method involving forward and backward thinking, which is superior to majority voting and is beneficial for both feedback and weight calculation in boosting. Extensive experiments demonstrate that PREFER achieves state-of-the-art performance on multiple types of tasks by a significant margin. We have made our code publicly available.
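
The feedback-reflect-refine loop can be sketched as an AdaBoost-style procedure over prompts. The llm() callable and the prompt strings below are hypothetical placeholders, not the paper's API or prompts; only the overall loop structure follows the abstract.

```python
import numpy as np

def prefer_boosting(llm, examples, labels, n_rounds=4):
    """Schematic boosting loop over prompts (PREFER-like sketch).

    llm(prompt, x) -> predicted label (or free text when x is None).
    Both llm and the prompt texts are illustrative placeholders.
    """
    weights = np.ones(len(examples)) / len(examples)
    prompt = "Answer the task."                    # initial (weak) prompt
    prompts, alphas = [], []
    for _ in range(n_rounds):
        preds = np.array([llm(prompt, x) for x in examples])
        wrong = preds != np.array(labels)
        err = float(weights[wrong].sum()) + 1e-9
        alpha = 0.5 * np.log((1 - err) / err)      # weak-learner weight
        prompts.append(prompt)
        alphas.append(alpha)
        weights *= np.exp(alpha * wrong)           # focus on hard examples
        weights /= weights.sum()
        # Feedback-Reflect-Refine: critique the failures, then rewrite the prompt.
        hard = [examples[i] for i in np.argsort(-weights)[:3]]
        feedback = llm("Reflect on why the prompt below fails on these examples "
                       f"and list its inadequacies:\n{prompt}\n{hard}", None)
        prompt = llm(f"Write an improved prompt given this reflection:\n{feedback}",
                     None)
    return prompts, alphas
```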


TorchScale: Transformers at Scale

arXiv.org Artificial Intelligence

Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several modeling techniques, which can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
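
A typical entry point looks like the following, based on the project's README; treat the exact imports and config fields as assumptions to verify against the repository.

```python
# Assumed usage following the TorchScale README (torchscale.architecture);
# check https://aka.ms/torchscale for the current API.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)  # BERT-style encoder configuration
model = Encoder(config)
print(model)
```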


The Graph-Based Behavior-Aware Recommendation for Interactive News

arXiv.org Machine Learning

Interactive news recommendation has been launched and has attracted much attention recently. In this scenario, user behavior evolves from single click behavior to multiple behaviors, including likes, comments, and shares. However, most existing methods still use click behavior as the sole criterion for judging users' preferences. Further, although heterogeneous graphs have been applied in different areas, a proper way to construct a heterogeneous graph for interactive news data, with an appropriate learning mechanism on it, is still desired. To address these concerns, we propose a graph-based behavior-aware network that simultaneously considers six different types of behaviors as well as users' demand for news diversity. Our approach has three main steps. First, we build an interaction behavior graph for multi-level and multi-category data. Second, we apply DeepWalk on the behavior graph to obtain entity semantics, then build a graph-based convolutional neural network called G-CNN to learn news representations and an attention-based LSTM to learn behavior sequence representations. Third, we introduce core and coritivity features for the behavior graph, which measure the concentration of a user's interests. These features govern the trade-off between the accuracy and diversity of our personalized recommendation system, allowing it to recommend news to different users according to their levels of interest concentration.
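
The DeepWalk step can be illustrated on a toy behavior graph: uniform random walks over the graph are fed to a skip-gram model to obtain entity embeddings. The graph construction, the hyperparameters, and the omission of edge (behavior) types are simplifying assumptions, not the paper's setup.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def deepwalk_embeddings(G, walk_len=10, walks_per_node=20, dim=64):
    """Minimal DeepWalk: uniform random walks plus skip-gram on the walk corpus.

    Edge types (like, comment, share, ...) are ignored in this sketch.
    """
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk, node = [str(start)], start
            for _ in range(walk_len - 1):
                nbrs = list(G.neighbors(node))
                if not nbrs:
                    break
                node = random.choice(nbrs)
                walk.append(str(node))
            walks.append(walk)
    model = Word2Vec(walks, vector_size=dim, window=5, min_count=0, sg=1)
    return {n: model.wv[str(n)] for n in G.nodes()}

# Toy interaction behavior graph: users connected to news items they acted on
G = nx.Graph()
G.add_edges_from([("user_1", "news_a"), ("user_1", "news_b"),
                  ("user_2", "news_b"), ("user_2", "news_c")])
emb = deepwalk_embeddings(G)
print(emb["news_a"].shape)   # (64,)
```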


Classification of Smoking and Calling using Deep Learning

arXiv.org Artificial Intelligence

Since 2014, very deep convolutional neural networks have been proposed and have become the must-have weapon for champions in all kinds of competitions. In this report, a pipeline is introduced to perform classification of smoking and calling by modifying a pretrained Inception V3. Deep-learning-based brightness enhancement is implemented to improve performance on this classification task, along with other useful training tricks. Based on the qualitative and quantitative results, it can be concluded that this pipeline, even with small, biased samples, is practical and achieves high accuracy.
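
Modifying a pretrained Inception V3 for this task can be sketched as follows; the number of classes and the torchvision weights API used here are assumptions for illustration, not details taken from the report.

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumed torchvision >= 0.13 weights API; adjust for older versions.
NUM_CLASSES = 3   # e.g. smoking / calling / normal (class set is an assumption)

model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)             # main head
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features,
                               NUM_CLASSES)                          # auxiliary head

model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 299, 299))   # Inception V3 expects 299x299 input
print(logits.shape)   # torch.Size([1, 3])
```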