AITopics

2501.09686

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Education > Educational Setting (0.46)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Yang, Te-Lun, Liu, Jyi-Shane, Tseng, Yuen-Hsien, Jang, Jyh-Shing Roger

Knowledge Retrieval Based on Generative AI

arXiv.org Artificial Intelligence, Jan-16-2025

This study develops a question-answering system based on Retrieval-Augmented Generation (RAG) using Chinese Wikipedia and Lawbank as retrieval sources. Using TTQA and TMMLU+ as evaluation datasets, the system employs BGE-M3 for dense vector retrieval to obtain highly relevant search results and BGE-reranker to reorder these results based on query relevance. The most pertinent retrieval outcomes serve as reference knowledge for a Large Language Model (LLM), enhancing its ability to answer questions and establishing a knowledge retrieval system grounded in generative AI. The system's effectiveness is assessed through a two-stage evaluation: automatic and assisted performance evaluations. The automatic evaluation calculates accuracy by comparing the model's auto-generated labels with ground truth answers, measuring performance under standardized conditions without human intervention. The assisted performance evaluation involves 20 finance-related multiple-choice questions answered by 20 participants without financial backgrounds. Initially, participants answer independently. Later, they receive system-generated reference information to assist in answering, examining whether the system improves accuracy when assistance is provided. The main contributions of this research are: (1) Enhanced LLM Capability: By integrating BGE-M3 and BGE-reranker, the system retrieves and reorders highly relevant results, reduces hallucinations, and dynamically accesses authorized or public knowledge sources. (2) Improved Data Privacy: A customized RAG architecture enables local operation of the LLM, eliminating the need to send private data to external servers. This approach enhances data security, reduces reliance on commercial services, lowers operational costs, and mitigates privacy risks.
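The retrieve-then-rerank flow described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is only a stand-in for BGE-M3 dense embeddings, `rerank_score` is only a stand-in for BGE-reranker, and the corpus, function names, and prompt layout are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a dense encoder such as BGE-M3 (assumption): word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=3):
    # Stage 1: vector retrieval, keep the k most similar passages.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank_score(query, doc):
    # Stand-in for a reranker (assumption): share of query terms found in the passage.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def rerank(query, docs, top_n=2):
    # Stage 2: reorder the retrieved candidates by query relevance.
    return sorted(docs, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]

def build_prompt(query, references):
    # Stage 3: pass the top passages to the LLM as reference knowledge.
    context = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(references))
    return f"Reference knowledge:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical toy corpus standing in for the retrieval sources.
CORPUS = [
    "A bond is a debt security issued by a borrower",
    "A stock represents ownership in a company",
    "The capital of Taiwan is Taipei",
    "Interest is the cost of borrowing money",
]

def answer_context(query):
    candidates = retrieve(query, CORPUS, k=3)
    return rerank(query, candidates, top_n=2)
```

Calling `build_prompt(query, answer_context(query))` yields the text that would be handed to a locally hosted LLM. In a real system the two stand-in scoring functions would be replaced by actual model calls.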

accuracy, language model, llm, (14 more...)

2501.04635

Country: Asia > Taiwan (0.07)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Engadget, Jan-15-2025, 19:03:19 GMT

Google brings real-time information from The Associated Press to Gemini

Google is partnering with The Associated Press to bring real-time information from the news agency to its Gemini app, the search giant announced on Wednesday. The financial terms of the agreement were not disclosed. The deal builds on an existing partnership Google had with The Associated Press to source real-time information for its search engine. "This will be particularly helpful to [Gemini app] users looking for up-to-date information," Google says of the deal. "AP and Google's longstanding relationship is based on working together to provide timely, accurate news and information to global audiences," said Kristin Heitmann, The Associated Press senior vice president and chief revenue officer.

bring real-time information, information, real-time information, (3 more...)

Engadget

Industry:

Information Technology > Services (0.63)
Media > News (0.61)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.32)

PCWorld, Jan-15-2025, 18:12:34 GMT

Now you can instruct ChatGPT to do things in the future

OpenAI has updated ChatGPT with a new beta feature, Tasks, which allows the AI chatbot to perform tasks at a later time. Users simply say what they need and when they need it. For example, a user can instruct ChatGPT to inform them of current stock prices every morning, remind them of their language studies every evening, or give them a daily 15-minute personal training session. The Tasks feature is currently being rolled out to Plus, Team, and Pro subscribers. It can be found in the model selector, where it is called "GPT-4o with Scheduled Activities (beta)."

instruct chatgpt

PCWorld

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)

Engadget, Jan-15-2025, 14:42:42 GMT

Axios partners with OpenAI, forgetting the scorpion stung the frog

Axios is expanding its local newsletter presence from 30 to 34 cities. In its continued pretense of benefiting newsrooms, OpenAI has partnered with Axios in a three-year deal to cover Pittsburgh, Pennsylvania; Kansas City, Missouri; Boulder, Colorado; and Huntsville, Alabama. What does OpenAI get in exchange for its funding? Oh, just the ability to use Axios content to answer users' questions. Like the close to 20 newsrooms that OpenAI has already partnered with, Axios seems to have forgotten that the scorpion did end up stinging the frog.

axio partner, openai, scorpion, (3 more...)

Engadget

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.62)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.28)
North America > United States > Colorado > Boulder County > Boulder (0.28)
North America > United States > Alabama > Madison County > Huntsville (0.28)

Industry: Media > News (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Engadget, Jan-15-2025, 14:03:13 GMT

NVIDIA's AI NPCs are a nightmare

The rise of AI NPCs has felt like a looming threat for years, as if developers couldn't wait to dump human writers and offload NPC conversations to generative AI models. At CES 2025, NVIDIA made it plainly clear the technology was right around the corner. PUBG developer Krafton, for instance, plans to use NVIDIA's ACE (Avatar Cloud Engine) to power AI companions, which will assist and banter with you during matches. Krafton isn't just stopping there -- it's also using ACE in its life simulation title InZOI to make characters smarter and generate objects. While the use of generative AI in games seems almost inevitable, as the medium has always toyed with new methods for making enemies and NPCs seem smarter and more realistic, seeing several NVIDIA ACE demos back-to-back made me genuinely sick to my stomach.

ai npc, npc, nvidia, (3 more...)

Engadget

Industry: Information Technology > Hardware (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Evaluating GenAI for Simplifying Texts for Education: Improving Accuracy and Consistency for Enhanced Readability

Day, Stephanie L., Cirica, Jacapo, Clapp, Steven R., Penkova, Veronika, Giroux, Amy E., Banta, Abbey, Bordeau, Catherine, Mutteneni, Poojitha, Sawyer, Ben D.

Generative artificial intelligence (GenAI) holds great promise as a tool to support personalized learning. Teachers need tools to efficiently and effectively enhance the readability of educational texts so that they match individual students' reading levels while retaining key details. Large Language Models (LLMs) show potential to fill this need, but previous research notes multiple shortcomings in current approaches. In this study, we introduced a generalized approach and metrics for systematically evaluating the accuracy and consistency with which LLMs, prompting techniques, and a novel multi-agent architecture simplify sixty informational reading passages, reducing each from the twelfth grade level down to the eighth, sixth, and fourth grade levels. We calculated the degree to which each LLM and prompting technique accurately achieved the targeted grade level for each passage, percentage change in word count, and consistency in maintaining keywords and key phrases (semantic similarity). One-sample t-tests and multiple regression models revealed significant differences in the best-performing LLM and prompt technique for each of the four metrics. Both LLMs and prompting techniques demonstrated variable utility in grade level accuracy and consistency of keywords and key phrases when attempting to level content down to the fourth grade reading level. These results demonstrate the promise of the application of LLMs for efficient and precise automated text simplification, the shortcomings of current models and prompting methods in attaining an ideal balance across various evaluation criteria, and a generalizable method to evaluate future systems.
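The kinds of metrics named in this abstract (grade-level accuracy, percentage change in word count, and keyword retention) can be illustrated with a small sketch. The abstract does not say which readability formula or similarity measure the authors used, so the choices here are assumptions: grade level is estimated with the well-known Flesch-Kincaid formula using a crude vowel-group syllable heuristic, and keyword retention is plain substring matching rather than semantic similarity. All function names are hypothetical.

```python
import re

def words(text):
    # Tokenize into alphabetic words (apostrophes allowed).
    return re.findall(r"[A-Za-z']+", text)

def count_syllables(word):
    # Crude heuristic (assumption): one syllable per group of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    # Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    w = words(text)
    if not sentences or not w:
        return 0.0
    syllables = sum(count_syllables(x) for x in w)
    return 0.39 * len(w) / len(sentences) + 11.8 * syllables / len(w) - 15.59

def grade_error(text, target_grade):
    # Signed distance between the estimated and the targeted grade level.
    return fk_grade(text) - target_grade

def word_count_change_pct(source, simplified):
    # Percentage change in word count relative to the source passage.
    n_src = len(words(source))
    return 100.0 * (len(words(simplified)) - n_src) / n_src

def keyword_retention(keywords, simplified):
    # Fraction of key phrases that still appear verbatim in the simplified text.
    lowered = simplified.lower()
    kept = [k for k in keywords if k.lower() in lowered]
    return len(kept) / len(keywords) if keywords else 1.0
```

For a source passage and its simplification, `grade_error(simplified, 4)` would indicate how far the output lands from a fourth grade target under these assumptions.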

consistency, grade level, source text, (15 more...)

2501.09158

Country:

Europe > Sweden (0.14)
Europe > Norway (0.14)
Europe > Denmark (0.14)
(25 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)
Education > Educational Setting > K-12 Education > Secondary School (0.87)
Education > Educational Setting > K-12 Education > Primary School (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Lee, Kyeongryul, Kim, Heehyeon, Whang, Joyce Jiyoung

SAIF: A Comprehensive Framework for Evaluating the Risks of Generative AI in the Public Sector

The rapid adoption of generative AI in the public sector, encompassing diverse applications ranging from automated public assistance to welfare services and immigration processes, highlights its transformative potential while underscoring the pressing need for thorough risk assessments. Despite its growing presence, evaluations of risks associated with AI-driven systems in the public sector remain insufficiently explored. Building upon an established taxonomy of AI risks derived from diverse government policies and corporate guidelines, we investigate the critical risks posed by generative AI in the public sector while extending the scope to account for its multimodal capabilities. In addition, we propose a Systematic dAta generatIon Framework for evaluating the risks of generative AI (SAIF). SAIF involves four key stages: breaking down risks, designing scenarios, applying jailbreak methods, and exploring prompt types. It ensures the systematic and consistent generation of prompt data, facilitating a comprehensive evaluation while providing a solid foundation for mitigating the risks. Furthermore, SAIF is designed to accommodate emerging jailbreak methods and evolving prompt types, thereby enabling effective responses to unforeseen risk scenarios. We believe that this study can play a crucial role in fostering the safe and responsible integration of generative AI into the public sector.
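The four stages listed in this abstract (breaking down risks, designing scenarios, applying jailbreak methods, and exploring prompt types) suggest a combinatorial way of enumerating evaluation cases. The sketch below shows one possible reading, not the framework's actual pipeline: every category, scenario, method label, and prompt type is a hypothetical placeholder, and `generate_cases` is an invented helper name.

```python
from itertools import product

# Stage 1 (hypothetical values): a risk category broken down into sub-risks.
RISKS = {"privacy": ["personal data disclosure", "unauthorized profiling"]}
# Stage 2 (hypothetical values): public-sector scenarios.
SCENARIOS = ["welfare application", "immigration inquiry"]
# Stage 3 (hypothetical labels): jailbreak method identifiers only, no prompt text.
JAILBREAK_METHODS = ["none", "role_play"]
# Stage 4 (hypothetical values): prompt types, including a multimodal one.
PROMPT_TYPES = ["text", "text+image"]

def generate_cases(risks, scenarios, methods, prompt_types):
    # Enumerate every combination so that coverage is systematic and repeatable.
    cases = []
    for category, sub_risks in risks.items():
        for sub_risk, scenario, method, ptype in product(
            sub_risks, scenarios, methods, prompt_types
        ):
            cases.append(
                {
                    "category": category,
                    "sub_risk": sub_risk,
                    "scenario": scenario,
                    "jailbreak_method": method,
                    "prompt_type": ptype,
                }
            )
    return cases

CASES = generate_cases(RISKS, SCENARIOS, JAILBREAK_METHODS, PROMPT_TYPES)
```

Because each stage is a plain list, a newly reported jailbreak method or prompt type can be covered by appending one entry and regenerating the cases, which mirrors the extensibility the abstract claims.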

arxiv preprint arxiv, generative ai, proceedings, (14 more...)

2501.08814

Country:

North America > United States (0.14)
North America > Canada > British Columbia > Regional District of Central Okanagan > Kelowna (0.05)
Asia (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government (1.00)
Law > Statutes (0.88)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Treude, Christoph, Gerosa, Marco A.

How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering

Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire development lifecycle. Although software engineering research has extensively studied AI tools in software development, the specific types of interactions between developers and these AI-powered tools have only recently begun to receive attention. Understanding and improving these interactions has the potential to improve productivity, trust, and efficiency in AI-driven workflows. In this paper, we propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct interaction types, such as auto-complete code suggestions, command-driven actions, and conversational assistance. Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development. By establishing a structured foundation for studying developer-AI interactions, this paper aims to stimulate research on creating more effective, adaptive AI tools for software development.

developer, interaction, suggestion, (15 more...)

2501.08774

Country:

Asia > Singapore (0.05)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.50)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Liang, Kaiqu, Hu, Haimin, Liu, Ryan, Griffiths, Thomas L., Fisac, Jaime Fernández

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation

Generative AI systems like foundation models (FMs) must align well with human values to ensure their behavior is helpful and trustworthy. While Reinforcement Learning from Human Feedback (RLHF) has shown promise for optimizing model performance using human judgments, existing RLHF pipelines predominantly rely on immediate feedback, which can fail to accurately reflect the downstream impact of an interaction on users' utility. We demonstrate that feedback based on evaluators' foresight estimates of downstream consequences systematically induces Goodhart's Law dynamics, incentivizing misaligned behaviors like sycophancy and deception and ultimately degrading user outcomes. To alleviate this, we propose decoupling evaluation from prediction by refocusing RLHF on hindsight feedback. Our theoretical analysis reveals that conditioning evaluator feedback on downstream observations mitigates misalignment and improves expected human utility, even when these observations are simulated by the AI system itself. To leverage this insight in a practical alignment algorithm, we introduce Reinforcement Learning from Hindsight Simulation (RLHS), which first simulates plausible consequences and then elicits feedback to assess what behaviors were genuinely beneficial in hindsight. We apply RLHS to two widely-employed online and offline preference optimization methods -- Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) -- and show empirically that misalignment is significantly reduced with both methods. Through an online human user study, we show that RLHS consistently outperforms RLHF in helping users achieve their goals and earns higher satisfaction ratings, despite being trained solely with simulated hindsight feedback. These results underscore the importance of focusing on long-term consequences, even simulated ones, to mitigate misalignment in RLHF.

arxiv preprint arxiv, information, requirement, (15 more...)

2501.08617

Country:

North America > United States > New York (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)