 Large Language Model


An Over-parameterized Exponential Regression

arXiv.org Artificial Intelligence

Over the past few years, a significant amount of research has focused on the ReLU activation function, with the aim of achieving neural network convergence through over-parameterization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function. We are given a set of labeled data points $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$, where $n$ denotes the number of data points. Here $F(W(t),x)$ can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r(t), x \rangle)$, where $m$ represents the number of neurons and $w_r(t)$ are the weights at time $t$. As is standard in the literature, the $a_r$ are fixed weights that never change during training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$ with random Gaussian distributions, such that $w_r(0) \sim \mathcal{N}(0, I_d)$, and initialize $a_r$ from the random sign distribution for each $r \in [m]$. Using the gradient descent algorithm, we can find a weight $W(T)$ such that $\| F(W(T), X) - y \|_2 \leq \epsilon$ holds with probability $1-\delta$, where $\epsilon \in (0,0.1)$ and $m = \Omega(n^{2+o(1)}\log(n/\delta))$. To optimize the over-parameterization bound $m$, we employ several tight analysis techniques from previous studies [Song and Yang arXiv 2019, Munteanu, Omlor, Song and Woodruff ICML 2022].
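The setup in this abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the squared loss, the step size, the problem sizes, and the unit-norm inputs are all assumptions made for the example, and the function names are hypothetical. It shows the model $F(W,x) = \sum_r a_r \exp(\langle w_r, x \rangle)$, the Gaussian and random-sign initialization, and plain gradient descent that updates only $W$.

```python
import numpy as np

def init_params(d, m, rng):
    """Columns of W are w_r(0) ~ N(0, I_d); a_r are random signs, kept fixed."""
    W = rng.standard_normal((d, m))
    a = rng.choice(np.array([-1.0, 1.0]), size=m)
    return W, a

def predict(W, a, X):
    """F(W, x_i) = sum_r a_r * exp(<w_r, x_i>) for every row x_i of X."""
    return np.exp(X @ W) @ a

def loss(W, a, X, y):
    """Assumed objective: 0.5 * ||F(W, X) - y||_2^2."""
    r = predict(W, a, X) - y
    return 0.5 * float(r @ r)

def grad(W, a, X, y):
    """Gradient of the assumed loss with respect to W, shape (d, m)."""
    E = np.exp(X @ W)                # E[i, r] = exp(<w_r, x_i>)
    resid = E @ a - y                # shape (n,)
    # d loss / d w_r = sum_i resid_i * a_r * E[i, r] * x_i
    return X.T @ (resid[:, None] * E * a[None, :])

def train(X, y, m, lr, steps, seed=0):
    """Plain gradient descent on W only; returns W, a, and the loss history."""
    rng = np.random.default_rng(seed)
    W, a = init_params(X.shape[1], m, rng)
    history = [loss(W, a, X, y)]
    for _ in range(steps):
        W = W - lr * grad(W, a, X, y)
        history.append(loss(W, a, X, y))
    return W, a, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((4, 5))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # assumption: unit-norm inputs
    y = rng.standard_normal(4)
    W, a, history = train(X, y, m=64, lr=1e-5, steps=200)
    print("first loss:", history[0], "last loss:", history[-1])
```

The step size is deliberately tiny because the exponential activation makes both the outputs and the gradients large at initialization; the abstract does not state a step size or any output scaling, so treat these values as placeholders.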


TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

arXiv.org Artificial Intelligence

Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model cannot be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases the model complexity and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive, both in terms of time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when it is presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied to a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.
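The idea of turning non-verbal features into text and appending them to the transcript can be illustrated with a small sketch. Everything specific here is hypothetical: the abstract does not say which features are used, how they are bucketed, or how the descriptions are worded, so the thresholds, feature names, and sentence templates below are assumptions for illustration only. The BERT fine-tuning step is not shown.

```python
def level(value, low, high):
    """Bucket a numeric feature into a word (hypothetical thresholds)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "moderate"

def textualize_acoustic(pitch_hz, loudness_db):
    """Describe assumed acoustic features in plain English."""
    pitch = level(pitch_hz, 120.0, 220.0)
    loud = level(loudness_db, 50.0, 70.0)
    return f"The speaker uses a {pitch} pitch and {loud} loudness."

def textualize_visual(cues):
    """Describe a list of assumed visual cue labels in plain English."""
    if not cues:
        return "The speaker shows a neutral face."
    return "The speaker shows " + ", ".join(cues) + "."

def build_augmented_input(spoken_text, pitch_hz, loudness_db, cues):
    """Concatenate the transcript with the textual descriptions."""
    parts = [
        spoken_text,
        textualize_acoustic(pitch_hz, loudness_db),
        textualize_visual(cues),
    ]
    return " ".join(parts)

if __name__ == "__main__":
    print(build_augmented_input(
        "Oh great, another meeting.", 250.0, 45.0, ["a smile", "raised eyebrows"]
    ))
```

The resulting string is ordinary text, so it could be passed to any text-only classifier; that is the property the abstract relies on when it feeds the augmented input to BERT.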


Prompt Engineering: How To Speak To AI in 2023 To Get What You Want

#artificialintelligence

Is prompt engineering a process that tries to get accurate, logical, and consistent answers from an AI language model? Or is it a way to find the faults in a language model and then fix them to achieve the perfect artificial intelligence model, which kills "prompt engineering"? In this article, we'll concentrate on ChatGPT because it is the most popular model at the moment. But just in case this AI tool is new to you, I suggest you read our "ChatGPT for Beginners" article first. We'll also look at prompts for image generators like DALL-E 2. I have written a few articles about this LLM (large language model) and learned that it is not so smart.


What We Still Don't Know About How A.I. Is Trained

The New Yorker

There is no doubt that GPT-4, the latest iteration of the artificial-intelligence engine created by the company OpenAI, is innovative and cool. It can create a poem in the style of Basho, spell out the chord progression and time signature for a simple tune, and provide a seven-step recipe for a peanut-butter-and-jelly sandwich. When I asked it to write a musical about a narcissistic politician who holds the fate of the world in his hands, it delivered a story in two acts, with a protagonist named Alex Sterling who "navigates a maze of power, manipulation, and the consequences of his decisions," as he sings "Narcissus in the Mirror," "The Price of Power," and about a dozen other invented songs. Those songs appear to have been created out of thin air; certainly, no human conceived them. Still, Alex's story, which "explores themes of self-discovery, redemption, and the responsibility of leadership," is quite familiar.


Ernie Bot, China's answer to ChatGPT, is delayed -- again

Washington Post - Technology News

Just when the conversation was starting to get good and we approached borderline subjects, we repeatedly found ourselves back at square one. Even simple requests for facts about China's government or top leader Xi led it to terminate the exchange with a canned reply about being an AI that was still learning, and a link to begin a new conversation -- making conversation with Ernie Bot less smooth than with ChatGPT.


We tested Google's Bard chatbot and here's how you can try it out - Plugavel

#artificialintelligence

Last week, Google finally opened a public test of its chatbot Bard, a direct rival to OpenAI's ChatGPT and the new Microsoft Bing chatbot. Unlike those two, which are based on the large language model (LLM) called GPT, Bard uses the LaMDA model, developed by Google. And unlike ChatGPT, but just like Bing, its data is up to date and it can use information found on the web in real time. To access Bard, you must meet certain criteria. First, you must have an account with Google.


In a first, Punjab and Haryana HC uses Chat GPT for deciding upon bail plea

#artificialintelligence

Chandigarh [India], March 28 (ANI): The Punjab and Haryana High Court on Tuesday became the first court in India to have used Chat GPT technology (artificial intelligence) to decide on the bail plea of an accused, and it rejected the petition. The bench led by Justice Anoop Chitkara sought the response of Chat GPT (artificial intelligence) while hearing the bail application of an accused arrested in June 2020 for alleged rioting, criminal intimidation, murder and criminal conspiracy. Justice Chitkara assessed the reply received from Chat GPT and rejected the bail plea of the accused on the basis of his experiences and decisions given earlier. The judge said, "To inflict death is cruel in itself, but if cruelty leads to death, then the situation changes. When a physical assault is committed in a brutal manner, the parameters of bail also change."


Microsoft's new Security Copilot will help network admins respond to threats in minutes, not days

Engadget

Humanity took another step towards its Ghost in the Shell future on Tuesday with Microsoft's unveiling of the new Security Copilot AI at its inaugural Microsoft Secure event. The automated enterprise-grade security system is powered by OpenAI's GPT-4, runs on the Azure infrastructure and promises admins the ability "to move at the speed and scale of AI." Security Copilot is similar to the large language model (LLM) that drives the Bing Copilot feature, but with training geared heavily towards network security rather than general conversational knowledge and web search optimization. "This security-specific model in turn incorporates a growing set of security-specific skills and is informed by Microsoft's unique global threat intelligence and more than 65 trillion daily signals," Vasu Jakkal, Corporate Vice President of Microsoft Security, Compliance, Identity, and Management, wrote Tuesday. "Just since the pandemic, we've seen an incredible proliferation [in corporate hacking incidents]," Jakkal told Bloomberg. For example, "it takes one hour and 12 minutes on average for an attacker to get full access to your inbox once a user has clicked on a phishing link. It used to be months or weeks for someone to get access."


Introducing Microsoft Security Copilot: Empowering defenders at the speed of AI - The Official Microsoft Blog

#artificialintelligence

The odds are against today's defenders

Today the odds remain stacked against cybersecurity professionals. Too often, they fight an asymmetric battle against prolific, relentless and sophisticated attackers. To protect their organizations, defenders must respond to threats that are often hidden among noise. Compounding this challenge is a global shortage of skilled security professionals, leading to an estimated 3.4 million openings in the field. The volume and velocity of attacks require us to continually create new technologies that can tip the scales in favor of defenders.


AI doesn't belong everywhere. Stop using a hammer to make lasagna.

Washington Post - Technology News

Vanderbilt University officials could have used the AI language generator ChatGPT for suggestions on a difficult email to students grieving over a deadly shooting at another college. Instead, they used the AI's soulless text verbatim.