 chatting


Robot Talk Episode 125 – Chatting with robots, with Gabriel Skantze

Robohub

Gabriel Skantze is a Professor of Speech Communication and Technology at KTH Royal Institute of Technology. He specializes in conversational systems and leads several research projects on conversational AI and human-robot interaction. His work focuses on computational models of spoken interaction, integrating both verbal and non-verbal aspects such as prosody, turn-taking, feedback, and joint attention. In 2014, he co-founded Furhat Robotics, where he continues to serve part-time as Chief Scientist.


ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Neural Information Processing Systems

Visual object tracking aims to locate a targeted object in a video sequence based on an initial bounding box. Recently, Vision-Language (VL) trackers have proposed to utilize additional natural language descriptions to enhance versatility in various applications. However, VL trackers are still inferior to State-of-The-Art (SoTA) visual trackers in terms of tracking performance. We found that this inferiority primarily results from their heavy reliance on manual textual annotations, which include the frequent provision of ambiguous language descriptions. In this paper, we propose ChatTracker to leverage the wealth of world knowledge in the Multimodal Large Language Model (MLLM) to generate high-quality language descriptions and enhance tracking performance.


Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)

Zeer, Ahmed, Dogan, Eren, Erdem, Yusuf, Ince, Elif, Shbib, Osama, Uzun, M. Egemen, Uz, Atahan, Yuce, M. Kaan, Kesgin, H. Toprak, Amasyali, M. Fatih

arXiv.org Artificial Intelligence

In this study, a Turkish visual instruction model was developed, and various model architectures and dataset combinations were analysed in depth to improve its performance. The Cosmos-LLaVA model, which is built by combining different large language models and image encoders, is designed to address deficiencies in Turkish-language support. In the experiments, the effects of fine-tuning with various datasets on model performance are analysed in detail. The results show that model architecture and dataset selection have a significant impact on performance.


When You Call a Restaurant, You Might Be Chatting With an AI Host

WIRED

A pleasant female voice greets me over the phone. "Hi, I'm an assistant named Jasmine for Bodega," the voice says. "Do you have patio seating?" I ask. Jasmine sounds a little sad as she tells me that unfortunately, the San Francisco–based Vietnamese restaurant doesn't have outdoor seating. Rather, her tone is a feature, a setting.


Top Things We Should Never Do While Chatting With ChatGPT

#artificialintelligence

Auto-GPT, also known as Automatic Generative Pre-training Transformer, is a state-of-the-art technology that can generate high-quality, human-like text based on a given prompt or input. Setting up Auto-GPT can be a complex process, but with the right tools and guidance, anyone can get started with this powerful technology. In this article, we will walk you through the steps of setting up Auto-GPT, from choosing a model to selecting a cloud service provider, to fine-tuning your model for optimal results. Step 1: Choose Your Auto-GPT Model. The first step in setting up Auto-GPT is to choose a model that best fits your needs. There are several different models available, each with different capabilities and tradeoffs.


Chatting with the Future: Predictions for AI in the Next Decade - KDnuggets

#artificialintelligence

This one is a no-brainer. We've had ChatGPT, Google Bard and god knows what else has come out of the woodwork in the past month. So what is Natural Language Processing (NLP) and why did I mention ChatGPT and Google Bard? NLP is the process of helping computers understand text data. Learning a language is already difficult for us humans, so you can imagine how difficult it is to teach a computer to understand text data.



Chatting With Chat GPT About The Band – Yolo 69 420

#artificialintelligence

The steamiest, moist-est, stankiest thing to do on the interweb right now is to talk with Open AI's chatbot, Chat GPT. I figured I should say hi. Quick disclaimer: I have no insightful comments or opinions regarding this technology. Remember, I am a dummy. However, I will comment anyways, because that's my right as an Americuhhhnn and becuz, freeeedommmm.


How to Tell if You're Chatting with a Bot

#artificialintelligence

Artificial intelligence (AI) is invading every aspect of our lives. So much is ruled by algorithms you could make the case that the robot uprising has already occurred and we lost, and lost badly. Robots decide what song plays next, robots recommend TV shows, and robots are even getting pretty good at writing and creating music. Most of us are fine with that. AI tends to automate tasks we want automated, like the aforementioned streaming recommendations, and on some level we're all aware of these bots in our lives, so our interactions with them are more or less voluntary.


Chatting with Your Voice Assistant - Connected World

#artificialintelligence

I have a vision that voice assistants are evolving so quickly they are going to connect us to more than just hailing a cab or ordering some food. Do you have the same vision? In February I wrote a blog discussing how many of us hate, literally hate, the concept of Big Brother hovering over our lives and listening to our every word. Oh, how times have changed in half a year. Now we have become a society that talks less about who's listening and more about how fast we can order something by voice using voice assistants.