AITopics

2410.18881

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

arXiv.org Artificial Intelligence Oct-24-2024

The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI

Li, Fulu

In this paper, we give an in-depth analysis of the mathematical problem formulations and the probabilistic optimization explorations for some of the key components of the Transformer model [33] in the field of generative AI. We explore and discuss potential further enhancements to current state-of-the-art methods for some key underlying technologies of generative AI models from an algorithmic and probabilistic optimization perspective. In particular, we present an optimal solution for sub-word encoding (SWE) based on initial settings similar to those of the byte-pair encoding (BPE) algorithm in [9], with objectives similar to those of the WordPiece approach in [28, 31], to maximize the likelihood of the training data. We also present a cross-entropy optimization method to optimize hyperparameters for the word2vec model [17]. In addition, we propose a factored combination of rotary positional encoding (RoPE) [32] and attention with linear biases (ALiBi) [23] with a harmonic series. We also present a probabilistic FlashAttention [6, 7] (PrFlashAttention) method with a probability distribution over block distances in the matrix to decide which block is likely to participate in a given round of attention computation, while maintaining the lower triangle shape of the tensor for autoregressive language models by re-shaping the tensors. Finally, we present staircase adaptive quantization (SAQ) of the key-value (KV) cache for multi-query attention (MQA), based on the framework presented in [16], to have gradual quantization degradation while achieving reasonable model quality and cost savings.
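For readers unfamiliar with the BPE starting point the abstract mentions, the following is a minimal sketch of the standard greedy BPE merge loop on a toy word-frequency table. It is not the paper's proposed SWE solution; the function names (`count_pairs`, `merge_pair`, `learn_bpe`), the toy corpus, and the tie-breaking rule are assumptions made purely for illustration.

```python
from collections import Counter


def count_pairs(corpus):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merges from a word-frequency table."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        # Assumed tie-break: highest count first, then the larger pair
        # in lexicographic order, so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        corpus = merge_pair(corpus, best)
        merges.append(best)
    return merges, corpus


# Hypothetical toy corpus: word -> frequency.
toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, corpus = learn_bpe(toy, 3)
print(merges)
```

Each merge is chosen by raw pair frequency alone. According to the abstract, the paper keeps a similar initial setting but instead targets the likelihood of the training data, as WordPiece does.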

machine learning, natural language, training data

2410.18441

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)

New Scientist Oct-23-2024, 16:00:15 GMT

Google tool makes AI-generated writing easily detectable

Google has been using artificial intelligence watermarking to automatically identify text generated by the company's Gemini chatbot, making it easier to distinguish AI-generated content from human-written posts. That watermark system could help prevent misuse of the AI chatbots for misinformation and disinformation – not to mention cheating in school and business settings. Now, the tech company is making an open-source version of its technique available so that other generative AI developers can similarly watermark the output from their own large language models, says Pushmeet Kohli at Google DeepMind, the company's AI research team, which combines the former Google Brain and DeepMind labs. "While SynthID isn't a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools," he says. Independent researchers voiced similar optimism.

google deepmind, large language model, machine learning

New Scientist

Country:

North America > United States > Texas > Travis County > Austin (0.06)
North America > United States > Maryland (0.06)

Industry:

Information Technology (1.00)
Media > News (0.58)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.38)

Structure Language Models for Protein Conformation Generation

Lu, Jiarui, Chen, Xiaoyin, Lu, Stephen Zhewen, Shi, Chence, Guo, Hongyu, Bengio, Yoshua, Tang, Jian

Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with sampling equilibrium conformations and are computationally expensive. Recently, deep generative models have shown promise in generating protein conformations as a more efficient alternative. However, these methods predominantly rely on the diffusion process within a 3D geometric space, which typically centers around the vicinity of metastable states and is often inefficient in terms of runtime. In this paper, we introduce Structure Language Modeling (SLM) as a novel framework for efficient protein conformation generation. Specifically, the protein structures are first encoded into a compact latent space using a discrete variational auto-encoder, followed by conditional language modeling that effectively captures sequence-specific conformation distributions. This enables a more efficient and interpretable exploration of diverse ensemble modes compared to existing methods. Based on this general framework, we instantiate SLM with various popular LM architectures and propose ESMDiff, a novel BERT-like structure language model fine-tuned from ESM3 with masked diffusion. We verify our approach in various scenarios, including the equilibrium dynamics of BPTI, conformational change pairs, and intrinsically disordered proteins. SLM provides a highly efficient solution, offering a 20-100x speedup over existing methods in generating diverse conformations, shedding light on promising avenues for future research. Protein structure dynamics are fundamental to understanding the biological functions of proteins. The ability of proteins to adopt multiple conformations is crucial for their function in influencing interactions with other biomolecules and the environment. Traditional computational methods, such as molecular dynamics (MD) simulations, have long been used to explore these dynamics. However, these methods are computationally expensive and time-consuming.

artificial intelligence, machine learning, natural language

2410.18403

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models

Liu, Fengchen, Jung, Jordan, Feinstein, Wei, D'Ambrogia, Jeff, Jung, Gary

This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems, focusing on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain. Utilizing a rich dataset derived from the ScienceIT documentation, our study embarks on a detailed comparison of two fine-tuned large language models and five retrieval-augmented generation (RAG) models. Through data processing techniques, we transform the documentation into structured context-question-answer triples, leveraging the latest Large Language Models (AWS Bedrock, GCP PaLM2, Meta LLaMA2, OpenAI GPT-4, Google Gemini-Pro) for data-driven insights. Additionally, we introduce the Aggregated Knowledge Model (AKM), which synthesizes responses from the seven models mentioned above using K-means clustering to select the most representative answers. The evaluation of these models across multiple metrics offers a comprehensive look into their effectiveness and suitability for the LBL ScienceIT environment. The results demonstrate the potential benefits of integrating fine-tuning and retrieval-augmented strategies, highlighting significant performance improvements achieved with the AKM. The insights gained from this study can be applied to develop specialized QA systems tailored to specific domains.
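As a rough illustration of how clustering can pick a representative answer from several model outputs, the sketch below embeds candidate answers as bag-of-words vectors, runs a small K-means, and returns the answer closest to the centroid of the largest cluster. The abstract does not specify the embedding, the number of clusters, or the selection rule, so all of these (along with the function names and the sample answers) are assumptions for illustration, not the authors' AKM implementation.

```python
import math
import random


def bag_of_words(text, vocab):
    """Embed text as term counts over a fixed vocabulary (assumed embedding)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Recompute centroids; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def select_representative(answers, k=2):
    """Return the answer nearest the centroid of the largest cluster."""
    vocab = sorted({w for a in answers for w in a.lower().split()})
    vectors = [bag_of_words(a, vocab) for a in answers]
    labels, centroids = kmeans(vectors, k)
    biggest = max(range(k), key=labels.count)
    members = [i for i, lab in enumerate(labels) if lab == biggest]
    best = min(members, key=lambda i: dist(vectors[i], centroids[biggest]))
    return answers[best]


# Hypothetical answers from five models to the same question.
answers = [
    "submit jobs with sbatch",
    "submit jobs with sbatch",
    "submit jobs with sbatch today",
    "reboot the login node",
    "email the help desk",
]
chosen = select_representative(answers)
print(chosen)
```

A real system would more plausibly use sentence embeddings from a language model than raw term counts; the bag-of-words vectors here only keep the sketch self-contained.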

large language model, machine learning, natural language

2410.18344

Country:

North America > United States > California > Alameda County > Berkeley (0.15)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.06)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Contra Costa County > Walnut Creek (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)
Overview > Innovation (0.34)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Bhan, Nirav, Gupta, Shival, Manaswini, Sai, Baba, Ritik, Yadav, Narun, Desai, Hillori, Choudhary, Yash, Pawar, Aman, Shrivastava, Sarthak, Biswas, Sudipta

Large Language Models (LLMs) have shown remarkable capabilities in various domains, yet their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhances LLMs' function calling abilities. We develop a comprehensive benchmark focused on HubSpot CRM operations to evaluate ThorV2 against leading models from OpenAI and Anthropic. Our results demonstrate that ThorV2 outperforms existing models in accuracy, reliability, latency, and cost efficiency for both single and multi-API calling tasks. We also show that ThorV2 is far more reliable and scales better to multistep tasks compared to traditional models. Our work offers the tantalizing possibility of more accurate function-calling compared to today's best-performing models using significantly smaller LLMs. These advancements have significant implications for the development of more capable AI assistants and the broader application of LLMs in real-world scenarios.

large language model, machine learning, natural language

2410.1795

Country: Asia > India > West Bengal > Kharagpur (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Countering Autonomous Cyber Threats

Heckel, Kade M., Weller, Adrian

With the capability to write convincing and fluent natural language and generate code, Foundation Models (FMs) present dual-use concerns broadly and within the cyber domain specifically. Generative AI has already begun to impact cyberspace through a broad illicit marketplace for assisting malware development and social engineering attacks through hundreds of malicious-AI-as-a-service tools. More alarming is that recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations. However, these previous investigations primarily focused on the threats posed by proprietary models, owing to the lack until recently of strong open-weight models, and they leave the impacts of network defenses and potential countermeasures unexplored. Critically, understanding the aptitude of downloadable models to function as offensive cyber agents is vital, given that they are far more difficult to govern and their misuse is far harder to prevent. As such, this work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks. Using target machines from a commercial provider, the most recently released downloadable models are found to be on par with a leading proprietary model at conducting simple cyber attacks with common hacking tools against known vulnerabilities. To mitigate such LLM-powered threats, defensive prompt injection (DPI) payloads for disrupting the malicious cyber agent's workflow are demonstrated to be effective. From these results, the implications for AI safety and governance with respect to cybersecurity are analyzed.

large language model, machine learning, natural language

2410.18312

Country:

North America > United States (1.00)
Asia > Russia (0.27)
Europe > Ukraine (0.04)

Genre:

Workflow (0.88)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies

Zhang, Chaofeng, Hou, Jia, Tan, Xueting, Li, Gaolei, Chen, Caijuan

The advancement of large language model (LLM) based artificial intelligence technologies has been a game-changer, particularly in sentiment analysis. This progress has enabled a shift from highly specialized research environments to practical, widespread applications within the industry. However, integrating diverse AI models for processing complex multimodal data and the associated high costs of feature extraction present significant challenges. Motivated by marketing-oriented software development needs, our study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems to address these issues. Initially, we elucidate the key solutions derived from our development process, highlighting the role of generative AI models like ChatGPT and Google Gemini in simplifying intricate sentiment analysis tasks into manageable, phased objectives. Furthermore, we present a detailed case study utilizing our collaborative AI system in edge and cloud, showcasing its effectiveness in analyzing sentiments across diverse online media channels.

large language model, machine learning, sentiment analysis

2410.13247

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Services (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Engadget Oct-22-2024, 19:30:42 GMT

OpenAI and Microsoft are funding $10 million in grants for AI-powered journalism

OpenAI and Microsoft are funding projects to bring more AI tools into the newsroom. The duo will give grants of up to $10 million to Chicago Public Media, the Minnesota Star Tribune, Newsday (in Long Island, NY), The Philadelphia Inquirer and The Seattle Times. Each of the publications will hire a two-year AI fellow to develop projects for implementing the technology and improving business sustainability. Three more outlets are expected to receive fellowship grants in a second round. OpenAI and Microsoft are each contributing $2.5 million in direct funding as well as $2.5 million in software and enterprise credits.

ai-powered journalism, openai and microsoft, publication

Engadget

Country:

North America > United States > Minnesota (0.28)
North America > United States > Illinois > Cook County > Chicago (0.28)

Industry:

Media > News (0.99)
Government > Regional Government > North America Government > United States Government (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

The Guardian Oct-22-2024, 15:00:43 GMT

Thom Yorke and Julianne Moore join thousands of creatives in AI warning

Abba's Björn Ulvaeus, the actor Julianne Moore and the Radiohead singer Thom Yorke are among 10,500 signatories of a statement from the creative industries warning artificial intelligence companies that unlicensed use of their work is a "major, unjust threat" to artists' livelihoods. "The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted," reads the statement. Thousands of creative professionals from the worlds of literature, music, film, theatre and television have given their backing to the statement, with authors including Kazuo Ishiguro, Ann Patchett and Kate Mosse, musicians including the Cure's Robert Smith as well as the composer Max Richter, and actors including Kevin Bacon, Rosario Dawson and F Murray Abraham. The organiser of the letter, the British composer and former AI executive Ed Newton-Rex, said people who make a living from creative work are "very worried" about the situation. "There are three key resources that generative AI companies need to build AI models: people, compute, and data. They spend vast sums on the first two – sometimes a million dollars per engineer, and up to a billion dollars per model. But they expect to take the third – training data – for free," he said.

julianne moore join thousand, newton-rex, thom yorke

The Guardian

Country:

Europe > United Kingdom (0.17)
North America > United States (0.06)

Industry: Law > Intellectual Property & Technology Law (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.58)