AITopics

2410.21521

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia (0.04)
Asia > China (0.04)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Education (0.66)
Leisure & Entertainment (0.55)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Artificial Intelligence, Dec-2-2024

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

Zou, Heqing, Luo, Tianze, Xie, Guiyang, Zhang, Victor, Lv, Fengmao, Wang, Guangcong, Chen, Junyang, Wang, Zhuochen, Zhang, Hansheng, Zhang, Huaijian

The integration of Large Language Models (LLMs) with visual encoders has recently shown promising performance in visual understanding tasks, leveraging their inherent capability to comprehend and generate human-like text for visual reasoning. Given the diverse nature of visual data, MultiModal Large Language Models (MM-LLMs) exhibit variations in model design and training for understanding images, short videos, and long videos. Our paper focuses on the substantial differences and unique challenges posed by long video understanding compared to static image and short video understanding. Unlike static images, short videos encompass sequential frames with both spatial and within-event temporal information, while long videos consist of multiple events with between-event and long-term temporal information. In this survey, we aim to trace and summarize the advancements of MM-LLMs from image understanding to long video understanding. We review the differences among various visual understanding tasks and highlight the challenges in long video understanding, including more fine-grained spatiotemporal details, dynamic events, and long-term dependencies. We then provide a detailed summary of the advancements in MM-LLMs in terms of model design and training methodologies for understanding long videos. Finally, we compare the performance of existing MM-LLMs on video understanding benchmarks of various lengths and discuss potential future directions for MM-LLMs in long video understanding.

arxiv preprint arxiv, video, zhang

2409.18938

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Dec-2-2024

MEP-Net: Generating Solutions to Scientific Problems with Limited Knowledge by Maximum Entropy Principle

Yang, Wuyue, Peng, Liangrong, Li, Guojie, Hong, Liu

The maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions when faced with incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, the MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification for its applicability to non-equilibrium systems based on the large deviations principle. Through numerical experiments, we demonstrate that the MEP-Net can be particularly useful in modeling the evolution of probability distributions in biochemical reaction networks and in generating complex distributions from data.

constraint, mep-net, probability distribution

arXiv.org Machine Learning

2412.0209

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Fujian Province > Fuzhou (0.04)

Genre:

Overview (0.86)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Kunhoth, Suchithra, Al-Maadeed, Somaya, Akbari, Younes, Al Saady, Rafif

Computational Methods for Breast Cancer Molecular Profiling through Routine Histopathology: A Review

Precision medicine has become a central focus in breast cancer management, advancing beyond conventional methods to deliver more precise and individualized therapies. Traditionally, histopathology images have been used primarily for diagnostic purposes; however, they are now recognized for their potential in molecular profiling, which provides deeper insights into cancer prognosis and treatment response. Recent advancements in artificial intelligence (AI) have enabled digital pathology to analyze histopathologic images for both targeted molecular and broader omic biomarkers, marking a pivotal step in personalized cancer care. These technologies offer the capability to extract various biomarkers such as genomic, transcriptomic, proteomic, and metabolomic markers directly from the routine hematoxylin and eosin (H&E) stained images, which can support treatment decisions without the need for costly molecular assays. In this work, we provide a comprehensive review of AI-driven techniques for biomarker detection, with a focus on diverse omic biomarkers that allow novel biomarker discovery. Additionally, we analyze the major challenges faced in this field for robust algorithm development. These challenges highlight areas where further research is essential to bridge the gap between AI research and clinical application.

bioinformatics, machine learning, prediction

2412.10392

Country:

Oceania > Australia (0.04)
North America > United States (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Anand, Avinash, Jaiswal, Raj, Dharmadhikari, Abhishek, Marathe, Atharva, Popat, Harsh Parimal, Mital, Harshil, Prasad, Kritarth, Shah, Rajiv Ratn, Zimmermann, Roger

Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

This paper presents GPSM4K, a comprehensive geometry multimodal dataset tailored to augment the problem-solving capabilities of Large Vision Language Models (LVLMs). GPSM4K encompasses 2157 multimodal question-answer pairs manually extracted from mathematics textbooks spanning grades 7-12 and is further augmented to 5340 problems, consisting of both numerical and theorem-proving questions. In contrast to PGPS9k, Geometry3K, and Geo170K, which feature only objective-type questions, GPSM4K offers detailed step-by-step solutions in a consistent format, facilitating a comprehensive evaluation of problem-solving approaches. This dataset serves as an excellent benchmark for assessing the geometric reasoning capabilities of LVLMs. Evaluation on our test set shows that open-source language models have room for improvement in geometry problem-solving. Finetuning on our training set increases the geometry problem-solving capabilities of models. We also evaluate the effect of techniques such as image captioning and Retrieval-Augmented Generation (RAG) on model performance. We use an LLM to automate final-answer evaluation by providing it with the ground-truth and predicted solutions. This research will help to assess and improve the geometric reasoning capabilities of LVLMs.

large language model, machine learning, natural language

2412.00846

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.67)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fazli, Mojtaba S., Quinn, Shannon

Object Tracking in a $360^\circ$ View: A Novel Perspective on Bridging the Gap to Biomedical Advancements

Object tracking is a fundamental tool in modern innovation, with applications in defense systems, autonomous vehicles, and biomedical research. It enables precise identification, monitoring, and spatiotemporal analysis of objects across sequential frames, providing insights into dynamic behaviors. In cell biology, object tracking is vital for uncovering cellular mechanisms, such as migration, interactions, and responses to drugs or pathogens. These insights drive breakthroughs in understanding disease progression and therapeutic interventions. Over time, object tracking methods have evolved from traditional feature-based approaches to advanced machine learning and deep learning frameworks. While classical methods are reliable in controlled settings, they struggle in complex environments with occlusions, variable lighting, and high object density. Deep learning models address these challenges by delivering greater accuracy, adaptability, and robustness. This review categorizes object tracking techniques into traditional, statistical, feature-based, and machine learning paradigms, with a focus on biomedical applications. These methods are essential for tracking cells and subcellular structures, advancing our understanding of health and disease. Key performance metrics, including accuracy, efficiency, and adaptability, are discussed. The paper explores limitations of current methods and highlights emerging trends to guide the development of next-generation tracking systems for biomedical research and broader scientific domains.

artificial intelligence, biomedical advancement, machine learning

2412.01119

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.13)
North America > United States > California > San Francisco County > San Francisco (0.13)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

The Advancement of Personalized Learning Potentially Accelerated by Generative AI

Wei, Yuang, Jiang, Yuan-Hao, Liu, Jiayi, Qi, Changyong, Jia, Rui

The rapid development of Generative AI (GAI) has sparked revolutionary changes across various aspects of education. Personalized learning, a focal point and challenge in educational research, has also been influenced by the development of GAI. To explore GAI's extensive impact on personalized learning, this study investigates its potential to enhance various facets of personalized learning through a thorough analysis of existing research. The research comprehensively examines GAI's influence on personalized learning by analyzing its application across different methodologies and contexts, including learning strategies, paths, materials, environments, and specific analyses within the teaching and learning processes. Through this in-depth investigation, we find that GAI demonstrates exceptional capabilities in providing adaptive learning experiences tailored to individual preferences and needs. Utilizing different forms of GAI across various subjects yields superior learning outcomes. The article concludes by summarizing scenarios where GAI is applicable in educational processes and discussing strategies for leveraging GAI to enhance personalized learning, aiming to guide educators and learners in effectively utilizing GAI to achieve superior learning objectives.

large language model, machine learning, natural language

2412.00691

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Middle East > Jordan (0.04)
South America > Suriname > Marowijne District > Albina (0.04)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.93)
Research Report > Promising Solution (0.68)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.86)

Kohlbrenner, Carson, Escobedo, Caleb, Nechyporenko, Nataliya, Roncone, Alessandro

A Sensor Position Localization Method for Flexible, Non-Uniform Capacitive Tactile Sensor Arrays

Tactile sensing is used in robotics to obtain real-time feedback during physical interactions. Fine object manipulation is a robotic application that benefits from a high density of sensors to accurately estimate object pose, whereas a low sensing resolution is sufficient for collision detection. Introducing variable sensing resolution into a single tactile sensing array can increase the range of tactile use cases, but also invokes challenges in localizing internal sensor positions. In this work, we present a mutual capacitance sensor array with variable sensor density, VARSkin, along with a localization method that determines the position of each sensor in the non-uniform array. When tested on two distinct artificial skin patches with concealed sensor layouts, our method achieves a localization accuracy within $\pm 2$ mm. We also provide a comprehensive error analysis, offering strategies for further precision improvement.

artificial skin, localization method, sensor

2412.00672

Country: North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

A Survey on Human-Centric LLMs

Wang, Jing Yi, Sukiennik, Nicholas, Li, Tong, Su, Weikang, Hao, Qianyue, Xu, Jingbo, Huang, Zihan, Xu, Fengli, Li, Yong

The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior have given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.

arxiv preprint arxiv, llm, reasoning

2411.14491

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Hemmat, Arshia, Vadaei, Kianoosh, Heydari, Mohammad Hassan, Fatemi, Afsaneh

Leveraging Retrieval-Augmented Generation for Persian University Knowledge Retrieval

This paper introduces an innovative approach using Retrieval-Augmented Generation (RAG) pipelines with Large Language Models (LLMs) to enhance information retrieval and query response systems for university-related question answering. By systematically extracting data from the university's official webpage and employing advanced prompt engineering techniques, we generate accurate, contextually relevant responses to user queries. We developed a comprehensive university benchmark, UniversityQuestionBench (UQB), to rigorously evaluate our system's performance, based on common key metrics in the field of RAG pipelines, assessing accuracy and reliability through various metrics and real-world scenarios. Our experimental results demonstrate significant improvements in the precision and relevance of generated responses, enhancing user experience and reducing the time required to obtain relevant answers. In summary, this paper presents a novel application of RAG pipelines and LLMs, supported by a meticulously prepared university benchmark, offering valuable insights into advanced AI techniques for academic data retrieval and setting the stage for future research in this domain.

dataset, information, student

2411.06237

Country:

Asia > Middle East > Iran > Isfahan Province > Isfahan (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre:

Research Report > New Finding (0.88)
Overview > Innovation (0.54)

Industry: Education > Educational Setting > Higher Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)