Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism

Zhao, Yinjie, Zhao, Heng, Wen, Bihan, Zhou, Joey Tianyi

arXiv.org Artificial Intelligence

With the development of AI-generated content (AIGC), multi-modal Large Language Models (LLMs) struggle to distinguish generated visual inputs from real ones. This shortcoming causes vulnerability to visual deceptions, where the models are deceived by generated content and the reliability of their reasoning processes is jeopardized. Therefore, facing rapidly emerging generative models and diverse data distributions, it is of vital importance to improve LLMs' generalizable reasoning for verifying the authenticity of visual inputs against potential deceptions. Inspired by human cognitive processes, we discovered that LLMs exhibit a tendency to over-trust visual inputs, while injecting skepticism can significantly improve the models' visual cognitive capability against visual deceptions. Based on this discovery, we propose Inception, a fully reasoning-based agentic framework that conducts generalizable authenticity verification by injecting skepticism, where LLMs' reasoning logic is iteratively enhanced between External Skeptic and Internal Skeptic agents. To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions. Our approach achieves a large margin of improvement over the strongest existing LLM baselines and state-of-the-art performance on the AEGIS benchmark.
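The iterative exchange between the two skeptic agents described in the abstract could be sketched roughly as follows. This is a minimal, hypothetical illustration with deterministic stand-ins for the LLM agents; the function names, prompts, and convergence criterion are assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch of a skepticism-injection loop: an Internal Skeptic
# drafts an authenticity verdict, an External Skeptic challenges it, and
# the verdict is revised until the reasoning stabilizes. All names and
# prompts here are illustrative, not the authors' API.
def verify_authenticity(description, internal, external, max_rounds=3):
    """Iteratively refine an authenticity verdict for a visual input."""
    # Initial, deliberately skeptical verdict from the Internal Skeptic.
    verdict = internal(f"Skeptically assess whether this is AI-generated: {description}")
    for _ in range(max_rounds):
        # The External Skeptic challenges the current reasoning.
        challenge = external(f"Challenge this verdict: {verdict}")
        revised = internal(f"Revise verdict '{verdict}' given critique: {challenge}")
        if revised == verdict:  # reasoning has stabilized
            break
        verdict = revised
    return verdict

# Deterministic stand-ins for real LLM agents, for demonstration only.
def internal_stub(prompt):
    return "real" if prompt.startswith("Skeptically") else "fake"

def external_stub(prompt):
    return "lighting and texture inconsistencies"
```

With the stubs above, the loop revises the naive initial verdict ("real") to "fake" after one challenge and then converges.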


A Modified VGG19-Based Framework for Accurate and Interpretable Real-Time Bone Fracture Detection

Haque, Md. Ehsanul, Fahim, Abrar, Dey, Shamik, Jahan, Syoda Anamika, Islam, S. M. Jahidul, Rokoni, Sakib, Morshed, Md Sakib

arXiv.org Artificial Intelligence

Early and accurate detection of bone fractures is paramount to initiating treatment as early as possible and avoiding delays in patient care and outcomes. Interpreting X-ray images is a time-consuming and error-prone task, especially where radiology expertise is scarce. Additionally, current deep learning approaches typically suffer from misclassifications and lack interpretable explanations for clinical use. To overcome these challenges, we propose an automated bone fracture detection framework based on a VGG-19 model modified to our needs. It incorporates sophisticated preprocessing techniques, including Contrast Limited Adaptive Histogram Equalization (CLAHE), Otsu's thresholding, and Canny edge detection, to enhance image clarity and facilitate feature extraction. In addition, we use Grad-CAM, an explainable AI method that generates visual heatmaps of the model's decision-making process, so that clinicians can understand how the model reaches its predictions. This encourages trust and helps in further clinical validation. The framework is deployed in a real-time web application, where healthcare professionals can upload X-ray images and receive diagnostic feedback within 0.5 seconds. Our modified VGG-19 model attains 99.78% classification accuracy and an AUC score of 1.00. The framework provides a reliable, fast, and interpretable solution for bone fracture detection, supporting more efficient diagnoses and better patient care.
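Of the preprocessing steps named in the abstract, Otsu's thresholding is simple enough to sketch in pure Python. This is a generic illustration of the algorithm, not the paper's implementation; in practice CLAHE, Otsu, and Canny would all be applied via a library such as OpenCV.

```python
# Pure-Python sketch of Otsu's thresholding: pick the intensity that
# maximizes the between-class variance of background vs. foreground.
def otsu_threshold(pixels, levels=256):
    """Return the threshold t; pixels > t are treated as foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0          # running intensity sum of the background class
    weight_bg = 0         # running pixel count of the background class
    best_thresh, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, t
    return best_thresh
```

On a bimodal image (e.g. dark bone vs. bright fracture edges), the returned threshold cleanly separates the two intensity clusters.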


Memento: Note-Taking for Your Future Self

Wan, Chao, Gong, Albert, Mishra, Mihir, Henneking, Carl-Leander, Beger, Claas, Weinberger, Kilian Q.

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at reasoning-only tasks, but struggle when reasoning must be tightly coupled with retrieval, as in multi-hop question answering. To overcome these limitations, we introduce a prompting strategy that first decomposes a complex question into smaller steps, then dynamically constructs a database of facts using LLMs, and finally pieces these facts together to solve the question. We show how this three-stage strategy, which we call Memento, can boost the performance of existing prompting strategies across diverse settings. On the 9-step PhantomWiki benchmark, Memento doubles the performance of chain-of-thought (CoT) when all information is provided in context. On the open-domain version of 2WikiMultiHopQA, CoT-RAG with Memento improves over vanilla CoT-RAG by more than 20 F1 percentage points and over the multi-hop RAG baseline, IRCoT, by more than 13 F1 percentage points. On the challenging MuSiQue dataset, Memento improves ReAct by more than 3 F1 percentage points, demonstrating its utility in agentic settings.
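The three-stage strategy described in the abstract could be sketched as follows. This is a hypothetical outline with a stub in place of a real LLM and retriever; the function names (`decompose`, `build_fact_db`, `compose_answer`) and prompt formats are illustrative, not the authors' API.

```python
# Sketch of the three Memento stages: decompose a multi-hop question,
# build a database of facts (notes) step by step, then compose an answer.
def decompose(question, llm):
    """Stage 1: break a multi-hop question into single-hop steps."""
    return llm(f"Decompose into steps: {question}")

def build_fact_db(steps, llm, retrieve):
    """Stage 2: answer each step, storing the result as a note (fact)."""
    facts = {}
    for step in steps:
        context = retrieve(step)
        facts[step] = llm(f"Given: {context}\nAnswer: {step}")
    return facts

def compose_answer(question, facts, llm):
    """Stage 3: piece the stored facts together to answer the question."""
    notes = "\n".join(f"{q} -> {a}" for q, a in facts.items())
    return llm(f"Facts:\n{notes}\nAnswer: {question}")

# Deterministic stub standing in for an LLM, for demonstration only.
def stub_llm(prompt):
    if prompt.startswith("Decompose"):
        return ["Who directed Inception?", "When was Nolan born?"]
    if prompt.startswith("Given:"):
        return "Christopher Nolan" if "directed" in prompt else "1970"
    if prompt.startswith("Facts:"):
        return "1970"
    return ""
```

The design choice the abstract highlights is that the fact database is built dynamically per question, so later stages reason only over retrieved, grounded notes rather than over the whole context at once.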


Reliable Conversational Agents under ASP Control that Understand Natural Language

Zeng, Yankai

arXiv.org Artificial Intelligence

Conversational agents are designed to understand dialogs and generate meaningful responses to communicate with humans. After the popularity of ChatGPT, with its surprising performance and powerful conversational ability, commercial Large Language Models (LLMs) for general NLP tasks, such as GPT-4 [1], sprang up and brought generative AI into public view as a solution. These LLMs work quite well in content generation tasks, but their deficiency in fact- and knowledge-oriented tasks is well-established by now [13]. The models themselves cannot tell whether the text they generate is based on facts or made-up stories, and they cannot always follow the given data and rules strictly, sometimes even modifying the data at will, a phenomenon known as hallucination. The reasoning that these LLMs appear to perform is also at a very shallow level.


MIT maps how the brain experiences movies

Popular Science

Our brains have to do a lot of work when we watch a movie. There are plots to follow, dialogue to interpret, visuals to take in, and more. Now, scientists have created a detailed map of how the human brain functions during the process. Using data from functional magnetic resonance imaging (fMRI), a team from the Massachusetts Institute of Technology mapped how different brain networks activate when subjects watch clips from a range of movies. They also saw how different executive networks in the brain are prioritized when watching easy versus difficult scenes.


Using deep learning to help distinguish dark matter from cosmic noise

AIHub

Gravity makes dark matter clump into dense halos, indicated by bright patches, where galaxies form. In this simulation, a halo like the one that hosts the Milky Way forms and a smaller halo resembling the Large Magellanic Cloud falls toward it. SLAC and Stanford researchers, working with collaborators from the Dark Energy Survey, have used simulations like these to better understand the connection between dark matter and galaxy formation. Dark matter is the invisible force holding the universe together – or so we think. It makes up around 85% of all matter and around 27% of the universe's contents, but since we can't see it directly, we have to study its gravitational effects on galaxies and other cosmic structures.


Lucid dreaming: The bizarre ability to control your DREAMS - and the three tricks that could allow you to try it

Daily Mail - Science & tech

The idea of controlling your dreams might sound like the plot of the latest science fiction blockbuster. But this mysterious gift is a reality for around 20 per cent of people, who are able to go on exciting trips in impossible worlds. Depicted in films such as 'Inception', lucid dreaming could provide a useful link between the real world and the dream world. Scientists are trying to tap into the potential of lucid dreaming, helping people complete tasks like turning on lights or even driving virtual cars while asleep. Here are three tricks that could allow you to try it for yourself.


AI ranks EVERY Christopher Nolan movie - after director took home first-ever Oscar for Oppenheimer... so do YOU agree with ChatGPT?

Daily Mail - Science & tech

'Oppenheimer' swept away the competition at the 2024 Oscars, receiving seven awards, including earning renowned director Christopher Nolan his first golden man statuette. While this is the filmmaker's first major award-winning film, he has been producing movies since 1998, when he made 'Following', and has made 10 more since. We asked ChatGPT to rank his other 11 films, dating back 26 years from 'Following' and his 2010 film 'Inception' up through his 2012 film 'The Dark Knight Rises' and his 2020 film 'Tenet'. Renowned director Christopher Nolan took home his first Oscar for his critically acclaimed film 'Oppenheimer'. The historic film starred Cillian Murphy as J Robert Oppenheimer, the director of the Los Alamos lab that designed and built the world's first atomic bomb during World War II; he is often known as the 'father of the atomic bomb'. Oppenheimer swept the box office when it was released on July 21, 2023, reeling in a whopping $82.4 million in its opening weekend, and won Nolan Best Picture and Best Director during Sunday's award show.


Why creating an international body for AI is a bad idea

FOX News

Jessica Melugin, Competitive Enterprise Institute Director of Center for Technology and Innovation, discusses Twitter accusing Meta of stealing trade secrets and a New York City law requiring businesses to audit A.I. hiring tools. Former Google CEO Eric Schmidt recently re-upped his calls for a global body, akin to the Intergovernmental Panel on Climate Change (IPCC), to advise member nations on regulating artificial intelligence (AI). Schmidt first made his case for an "International Panel on AI Safety" – an "IPCC for AI," if you will – in an October 2023 op-ed in the Financial Times. He writes of the AI panel's potential to be "an independent, expert-led body empowered to objectively inform governments about the current state of AI capabilities and make evidence-based predictions." He claims that AI policymakers "are looking for impartial, technically reliable and timely assessments about its speed of progress and impact."