Overview
Neural Optimal Control using Learned System Dynamics
We study the problem of generating control laws for systems with unknown dynamics. Our approach is to represent the controller and the value function with neural networks, and to train them using loss functions adapted from the Hamilton-Jacobi-Bellman (HJB) equations. In the absence of a known dynamics model, our method first learns the state transitions from data collected by interacting with the system in an offline process. The learned transition function is then integrated into the HJB equations and used to forward simulate the control signals produced by our controller in a feedback loop. In contrast to trajectory optimization methods that optimize the controller for a single initial state, our controller can generate near-optimal control signals for initial states from a large portion of the state space. Compared to recent model-based reinforcement learning algorithms, we show that our method is more sample efficient and trains faster by an order of magnitude. We demonstrate our method in a number of tasks, including the control of a quadrotor with 12 state variables.
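For orientation, a standard finite-horizon, continuous-time form of the HJB equation with a learned dynamics model substituted in is written out below. The notation is assumed here and is not taken from the abstract: $V_\theta$ is the value network, $\pi_\phi$ the controller network, $\hat{f}$ the learned transition function, $\ell$ the running cost, and $g$ the terminal cost. The paper's exact formulation and loss functions may differ.

$$
-\frac{\partial V_\theta}{\partial t}(x,t) = \min_{u}\Big[\ell(x,u) + \nabla_x V_\theta(x,t)^{\top}\hat{f}(x,u)\Big], \qquad V_\theta(x,T) = g(x)
$$

$$
\mathcal{L}_{\mathrm{HJB}}(\theta,\phi) = \mathbb{E}_{(x,t)}\Big[\Big(\frac{\partial V_\theta}{\partial t}(x,t) + \ell\big(x,\pi_\phi(x,t)\big) + \nabla_x V_\theta(x,t)^{\top}\hat{f}\big(x,\pi_\phi(x,t)\big)\Big)^{2}\Big]
$$

Under these assumptions, driving the squared residual toward zero over sampled states and times is one common way to turn the HJB equation into a training loss, with the controller standing in for the minimizing control.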
Unsupervised Deep Learning for IoT Time Series
Liu, Ya, Zhou, Yingjie, Yang, Kai, Wang, Xin
IoT time series analysis has found numerous applications in a wide variety of areas, ranging from health informatics to network security. Nevertheless, the complex spatio-temporal dynamics and high dimensionality of IoT time series make the analysis increasingly challenging. In recent years, the powerful feature extraction and representation learning capabilities of deep learning (DL) have provided an effective means for IoT time series analysis. However, few existing surveys on time series have systematically discussed unsupervised DL-based methods. To fill this void, we investigate unsupervised deep learning for IoT time series, i.e., unsupervised anomaly detection and clustering, under a unified framework. We also discuss the application scenarios, public datasets, existing challenges, and future research directions in this area.
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Liang, Paul Pu, Zadeh, Amir, Morency, Louis-Philippe
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining three key principles of modality heterogeneity, connections, and interactions that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.
CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models
Kuhn, Lorenz, Gal, Yarin, Farquhar, Sebastian
Users often ask dialogue systems ambiguous questions that require clarification. We show that current language models rarely ask users to clarify ambiguous questions and instead provide incorrect answers. To address this, we introduce CLAM: a framework for getting language models to selectively ask for clarification about ambiguous user questions. In particular, we show that we can prompt language models to detect whether a given question is ambiguous, generate an appropriate clarifying question to ask the user, and give a final answer after receiving clarification. We also show that we can simulate users by providing language models with privileged information. This lets us automatically evaluate multi-turn clarification dialogues. Finally, CLAM significantly improves language models' accuracy on mixed ambiguous and unambiguous questions relative to SotA.
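The three prompted steps described above (detect ambiguity, ask a clarifying question, answer after clarification) can be sketched as a small control flow. This is a minimal illustration, not the authors' implementation: the `LanguageModel` and `User` callables, the function name `clam_answer`, and all prompt wordings are hypothetical placeholders.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not from the paper):
# a language model maps a prompt string to a completion string,
# and a user maps a clarifying question to a reply.
LanguageModel = Callable[[str], str]
User = Callable[[str], str]


def clam_answer(question: str, lm: LanguageModel, ask_user: User) -> str:
    """Answer a question, asking for clarification only if it looks ambiguous."""
    # Step 1: prompt the model to classify the question as ambiguous or not.
    verdict = lm(
        "Is the following question ambiguous? Answer yes or no.\n"
        f"Question: {question}\nAnswer:"
    )
    if verdict.strip().lower().startswith("yes"):
        # Step 2: prompt the model to generate a clarifying question.
        clarifying = lm(
            "Ask one clarifying question for the following question.\n"
            f"Question: {question}\nClarifying question:"
        ).strip()
        reply = ask_user(clarifying)
        # Step 3: answer with the clarification included in the context.
        return lm(
            f"Question: {question}\n"
            f"Clarifying question: {clarifying}\n"
            f"User reply: {reply}\nAnswer:"
        ).strip()
    # Unambiguous questions are answered directly, with no extra turn.
    return lm(f"Question: {question}\nAnswer:").strip()
```

Passing the user as a callable means the same function can be driven by a person or by a simulated user, such as a second language model given privileged information about the intended meaning, which is the evaluation setup the abstract describes.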
Emerging Trends with AI in Radiology in 2023
One of the ongoing trends with the development of artificial intelligence (AI) modalities in radiology is an increasing emphasis on improving efficiencies with non-interpretative tasks. In a recent interview, Sonia Gupta, MD said the combination of AI and natural language processing can help retrieve and analyze key historical points from a patient's electronic medical record (EMR). Dr. Gupta emphasized that this technology could help streamline patient triage, radiology reporting and the management of incidental finding follow-up. "(With) incidental finding follow-up, AI reads the report and detects the findings that need follow-up … and can also insert the follow-up recommendations for you into the report. Again, this is all non-interpretative and natural language processing-based type of AI. I think that is a really great opportunity we can all utilize," maintained Dr. Gupta, an abdominal radiologist, a board member of the American Board of Artificial Intelligence in Medicine, and the chief medical officer of Change Healthcare.
Seeing the Fruit for the Leaves: Towards Automated Apple Fruitlet Thinning
Qureshi, Ans, Loh, Neville, Kwon, Young Min, Smith, David, Gee, Trevor, Bachelor, Oliver, McCulloch, Josh, Nejati, Mahla, Lim, JongYoon, Green, Richard, Ahn, Ho Seok, MacDonald, Bruce, Williams, Henry
Following a global trend, the lack of reliable access to skilled labour is causing critical issues for the effective management of apple orchards. One of the primary challenges is maintaining skilled human operators capable of making precise fruitlet thinning decisions. Thinning requires accurately measuring the true crop load for individual apple trees to provide optimal thinning decisions on an individual basis, a challenging task due to the dense foliage obscuring the fruitlets within the tree structure. This paper presents the initial design, implementation, and evaluation details of the vision system for an automatic apple fruitlet thinning robot to meet this need. The platform consists of a UR5 robotic arm and stereo cameras which enable it to look around the leaves to map the precise number and size of the fruitlets on the apple branches. We show that this platform can measure the fruitlet load on the apple tree with 84% accuracy in a real-world commercial apple orchard while being 87% precise.
Dense RGB SLAM with Neural Implicit Maps
Li, Heng, Gu, Xiaodong, Yuan, Weihao, Yang, Luwei, Dong, Zilong, Tan, Ping
There is an emerging trend of using neural implicit functions for map representation in Simultaneous Localization and Mapping (SLAM). Some pioneer works have achieved encouraging results on RGB-D SLAM. In this paper, we present a dense RGB SLAM method with neural implicit map representation. To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitate the implicit map decoder. This design effectively fuses shape cues across different scales to facilitate map reconstruction. Our method simultaneously solves the camera motion and the neural implicit map by matching the rendered and input video frames. To facilitate optimization, we further propose a photometric warping loss in the spirit of multi-view stereo to better constrain the camera pose and scene geometry. We evaluate our method on commonly used benchmarks and compare it with modern RGB and RGB-D SLAM systems. Our method achieves favorable results compared with previous methods and even surpasses some recent RGB-D SLAM methods. The code is at poptree.github.io/DIM-SLAM/.
Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games
Yongacoglu, Bora, Arslan, Gürdal, Yüksel, Serdar
In multi-agent reinforcement learning (MARL), independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For $\epsilon \geq 0$, an $\epsilon$-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is $\epsilon$-best-responding to the policies of the remaining players; $\epsilon$-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some $\epsilon$-satisficing policy update rule to select its next policy. We establish structural results on the existence of $\epsilon$-satisficing paths into $\epsilon$-equilibrium in both symmetric $N$-player games and general stochastic games with two players. We then present an independent learning algorithm for $N$-player symmetric games and give high probability guarantees of convergence to $\epsilon$-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of $\epsilon$-satisficing paths.
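The definition of an $\epsilon$-satisficing update rule can be illustrated with a short sketch. It makes strong simplifying assumptions that are not from the paper: a stateless two-player matrix game stands in for a stochastic game, each policy is a single pure action, and an unsatisfied agent simply resamples uniformly at random. All function names are hypothetical, and this is not the paper's learning algorithm.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (own action, other player's action) -> own payoff
Payoff = Callable[[int, int], float]


def is_eps_best_response(payoff: Payoff, own: int, other: int,
                         actions: Sequence[int], eps: float) -> bool:
    """True if `own` is within eps of the best payoff against `other`."""
    best = max(payoff(a, other) for a in actions)
    return payoff(own, other) >= best - eps


def satisficing_update(payoff: Payoff, own: int, other: int,
                       actions: Sequence[int], eps: float,
                       rng: random.Random) -> int:
    """Keep the policy when eps-best-responding; otherwise the rule is free.

    Uniform resampling is just one admissible choice for the unsatisfied case.
    """
    if is_eps_best_response(payoff, own, other, actions, eps):
        return own
    return rng.choice(list(actions))


def satisficing_path(payoffs: Tuple[Payoff, Payoff], start: Tuple[int, int],
                     actions: Sequence[int], eps: float, rng: random.Random,
                     max_steps: int = 100) -> List[Tuple[int, int]]:
    """Generate joint policies until an eps-equilibrium or max_steps is reached."""
    path = [start]
    for _ in range(max_steps):
        a0, a1 = path[-1]
        if (is_eps_best_response(payoffs[0], a0, a1, actions, eps)
                and is_eps_best_response(payoffs[1], a1, a0, actions, eps)):
            break  # every player is satisfied: an eps-equilibrium
        path.append((
            satisficing_update(payoffs[0], a0, a1, actions, eps, rng),
            satisficing_update(payoffs[1], a1, a0, actions, eps, rng),
        ))
    return path
```

Each agent's update uses only its own payoff comparison and never the other player's payoff function. The sketch does pass in the other player's current action to evaluate best responses exactly, which a truly independent learner would have to estimate from its own observed rewards.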
What is Microsoft's Approach to AI?
At Microsoft, we believe artificial intelligence (AI) is the defining technology of our time. We have been at the forefront of cutting-edge research in AI and of integrating these powerful, innovative AI technologies into our products and services to help customers do more. Microsoft AI, powered by Azure, provides billions of intelligent experiences every day in Windows, Xbox, Microsoft 365, Teams, Azure AI, Power Platform, Dynamics 365 and Microsoft Defender. Our AI tools and technologies are designed to benefit everyone at every level in every organization. They are used in workplaces, home offices, academic institutions, research labs and manufacturing facilities around the world, and they are helping everyone from scientists and salespeople to farmers, software developers and security practitioners.
AI in HCI Design and User Experience
The use of AI/ML capabilities for improving HCI/UX work and delivering better UX in solutions is becoming a trend (Abbas et al., 2022; Wu et al., 2019; Nikiforova et al., 2021) and creates many new opportunities for HCI/UX professionals (Holmquist, 2017; Yang et al., 2020). Some even speculate "AI/ML is the new UX" (Yang et al., 2018). Researchers proposed that AI can perform as an assistant, collaborator, researcher, or facilitator (Bertão & Joo, 2021; Main & Grierson, 2020). AI technology will change the role of designers in the design process and generate an opportunity for creative collaboration between AI and designers (McCormack et al., 2020). Also, companies are moving fast to adopt AI for improving customer experience (CX).