Collaborating Authors

 sagar


Vision-Based Localization and LLM-based Navigation for Indoor Environments

arXiv.org Artificial Intelligence

Indoor navigation remains a complex challenge due to the absence of reliable GPS signals and the architectural intricacies of large enclosed environments. This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The localization system utilizes a ResNet-50 convolutional neural network fine-tuned through a two-stage process to identify the user's position using smartphone camera input. To complement localization, the navigation module employs an LLM, guided by a carefully crafted system prompt, to interpret preprocessed floor plan images and generate step-by-step directions. Experimental evaluation was conducted in a realistic office corridor with repetitive features and limited visibility to test localization robustness. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions and short-duration queries. Navigation tests using ChatGPT on real building floor maps yielded an average instruction accuracy of 75%, with observed limitations in zero-shot reasoning and inference time. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans, particularly in resource-constrained settings like hospitals, airports, and educational institutions.


Dating burnout: meet the people who ditched the apps – and found love offline

The Guardian

When Georgie Thorogood's date made a sleazy joke about "horsey girls carrying whips", she knew it was time to make a hasty exit. After meeting Tom through a dating app in the summer of 2021, she had been hoping for some polite conversation over a few drinks, maybe some romantic chemistry if she was lucky. What she got was a two-hour rant about his ex-wife and some creepy innuendo. "I knew straight away he wasn't for me. I politely told him I didn't want to see him again, but he took the rejection really badly. I work in music communications and at the time I was setting up a festival. He started getting aggressive and telling me that I was destined to fail," she says.


YouTube's new documentary demystifying artificial intelligence features Robert Downey Jr. and an AI baby

#artificialintelligence

YouTube has launched a new free-to-watch documentary series about artificial intelligence fronted by "Iron Man" star Robert Downey Jr. The YouTube Original series debuted on the platform on Wednesday, and is called "The Age of AI." Its stated aim is to demystify misconceptions around AI. One of the main focuses of the first episode is a New Zealand-based company called Soul Machines, which specialises in making digital avatars. Its founder Mark Sagar is an award-winning visual effects artist who's worked on films like "Rise of the Planet of the Apes" and "Avatar." Sagar is working on a project he calls "Baby X," in which he is using AI to simulate a human baby, modelled after his own daughter.


YouTube's series on AI with Robert Downey Jr. is finally available

#artificialintelligence

In the debut episode "How Far is Too Far?," the actor introduces viewers to Soul Machines CEO Mark Sagar. Sagar, a special effects artist who worked on films like King Kong and Avatar, is using his expertise in animating faces to create an AI-animated digital avatar for the Black Eyed Peas' will.i.am. In the second episode, meanwhile, we learn about Project Euphonia, a speech assistance tool Google showed off at I/O 2019. This episode features former NFL linebacker Tim Shaw, who has ALS and difficulty moving and speaking as a result of his condition. We see a prototype of Euphonia in action, and Shaw helps Google record voice samples for the AI to try and interpret his speech.


Robert Downey Jr launches YouTube doc with AI baby

#artificialintelligence

YouTube has premiered a Robert Downey Jr-fronted series that seeks to demystify artificial intelligence. The documentary - produced by the actor in partnership with his wife Susan - is one of the platform's highest profile and biggest-budgeted factual commissions to date. The Avengers star is expected to give the Age of AI mass appeal. One AI expert said there was "lots of eye candy for viewers with short attention spans". Calum Chace, author of four books on the subject, added that artificial intelligence is a "large, complex, and important" subject. And he noted that YouTube - whose parent company Google is a huge investor and user of related technologies - had engaged with some of its controversies.


Indian Data Scientist Comes Up with Deep Learning Method of Predicting Bitcoin Prices in Real Time

#artificialintelligence

The cryptocurrency industry has a reputation for being volatile, unpredictable, and ever-changing. Predicting the way that the market would move could easily give an advantage to the everyday investor, and one data scientist believes that he's figured it out. Abinhav Sagar of the prestigious Vellore Institute of Technology recently stated that it is possible to use a Long Short-Term Memory (LSTM) neural network to predict these prices with real-world accuracy. Sagar published a blog about this exact method on December 2nd, showing the four steps he can take with the technology to create predictions in a "relatively unpredictable" market. The demonstration started with a comment from Sagar that the application of this machine learning tech has been relatively limited in the cryptocurrency sector, even though it has had some success in the stock market.


Data Scientist Uses Deep Learning to Predict BTC Price in Real-Time

#artificialintelligence

A data scientist at India's prestigious Vellore Institute of Technology has outlined a method that purportedly predicts crypto prices in real time using a Long Short-Term Memory (LSTM) neural network. In a blog post published on Dec. 2, researcher Abinhav Sagar demonstrated a four-step process for using machine learning technology to forecast prices in a sector he described as "relatively unpredictable" compared with traditional markets. Sagar prefaced his demonstration by noting that while machine learning has achieved some success in predicting stock market prices, its application in the cryptocurrency field has been restricted. In support of this claim, he argued that cryptocurrency prices fluctuate in accordance with fast-paced technological developments, as well as economic, security and political factors. Sagar's proposed four-step method involves 1) collecting real-time cryptocurrency data; 2) preparing the data for neural network training; 3) testing the prediction using the LSTM neural network; and 4) visualizing the results of the prediction.


Batch Norm Patent Granted To Google: Is AI Ownership The Gold Rush Of 21st Century?

#artificialintelligence

The machine learning community has witnessed a surge in releases of frameworks, libraries and software. Tech pioneers like Google, Amazon, Microsoft and others have emphasized their intentions behind open-sourcing their technology. However, there has been a growing trend of these tech giants claiming ownership of their innovations. According to a National Bureau of Economic Research study, there were 145 US patent filings that mentioned machine learning in 2010, compared to 594 in 2016. Google, especially, filed patents related to machine learning and neural networks 99 times in 2016 alone.


DeepMind Generates High Fidelity Speech With GAN-TTS

#artificialintelligence

Generative adversarial networks (GANs) have achieved state-of-the-art results in image and video generation, and have been successfully applied to unsupervised feature learning, among many other applications. GANs have seen rapid development in recent years; however, their audio generation prowess has largely gone unnoticed. In an attempt to explore the audio generation abilities of GANs, a team of DeepMind researchers published a paper introducing a new model called GAN-TTS. Text-to-Speech (TTS) is a process for converting text into a humanlike voice output. Many audio generation models operate in the waveform domain.


AI Genius Created a Virtual Baby Who Can Laugh, Cry and Play the Piano

#artificialintelligence

Mark Sagar, the artificial intelligence genius, has created a virtual reality baby who can read words from a book, laugh, cry, and even play the piano. Sagar's company, Soul Machines Ltd., is trying to humanize AI, writes Bloomberg. He thinks one key to making humans feel more connected to AI is to make the virtual beings more lifelike, reports Bloomberg. That's why Soul Machines' creations have human voices and can wince and grin. Eventually, Sagar would like to produce the first wave of "likable, believable virtual assistants that work as customer service agents and breathe life into hunks of plastic."