Large Language Model
Elon Musk's Urgent Warning, Demands Pause on AI Research
Elon Musk, the owner of Tesla Motors, has joined forces with several other industry experts to pen an open letter calling for a pause on the further development of AI tools like OpenAI's newly launched GPT-4. The letter cites potential "risks to society and humanity" as the primary reason for this request. By uniting prominent figures like Elon Musk and leading experts in the AI field, the letter emphasizes the need for responsible development and collaboration among industry stakeholders. The open letter, signed by Musk and other industry heavyweights, stresses the importance of ensuring that power systems are developed only once we have confidence in their positive effects and manageable risks. Tesla, which uses AI for its autopilot system, showcases Musk's own involvement in the field and the gravity of his concerns.
Block-wise Bit-Compression of Transformer-based Models
With the popularity of the recent Transformer-based models represented by BERT, GPT-3 and ChatGPT, there has been state-of-the-art performance in a range of natural language processing tasks. However, the massive computations, huge memory footprint, and thus high latency of Transformer-based models is an inevitable challenge for the cloud with high real-time requirement. To tackle the issue, we propose BBCT, a method of block-wise bit-compression for transformer without retraining. Our method achieves more fine-grained compression of the whole transformer, including embedding, matrix multiplication, GELU, softmax, layer normalization, and all the intermediate results. As a case, we compress an efficient BERT with the method of BBCT. Our benchmark test results on General Language Understanding Evaluation (GLUE) show that BBCT can achieve less than 1% accuracy drop in most tasks.
Large language models can rate news outlet credibility
Yang, Kai-Cheng, Menczer, Filippo
Although large language models (LLMs) have shown exceptional performance in various natural language processing tasks, they are prone to hallucinations. State-of-the-art chatbots, such as the new Bing, attempt to mitigate this issue by gathering information directly from the internet to ground their answers. In this setting, the capacity to distinguish trustworthy sources is critical for providing appropriate accuracy contexts to users. Here we assess whether ChatGPT, a prominent LLM, can evaluate the credibility of news outlets. With appropriate instructions, ChatGPT can provide ratings for a diverse set of news outlets, including those in non-English languages and satirical sources, along with contextual explanations. Our results show that these ratings correlate with those from human experts (Spearmam's $\rho=0.54, p<0.001$). These findings suggest that LLMs could be an affordable reference for credibility ratings in fact-checking applications. Future LLMs should enhance their alignment with human expert judgments of source credibility to improve information accuracy.
From Zero to Hero: Convincing with Extremely Complicated Math
Weiherer, Maximilian, Egger, Bernhard
Becoming a (super) hero is almost every kid's dream. During their sheltered childhood, they do whatever it takes to grow up to be one. Work hard, play hard -- all day long. But as they're getting older, distractions are more and more likely to occur. They're getting off track. They start discovering what is feared as simple math. Finally, they end up as a researcher, writing boring, non-impressive papers all day long because they only rely on simple mathematics. No top-tier conferences, no respect, no groupies. Life's over. To finally put an end to this tragedy, we propose a fundamentally new algorithm, dubbed zero2hero, that turns every research paper into a scientific masterpiece. Given a LaTeX document containing ridiculously simple math, based on next-generation large language models, our system automatically over-complicates every single equation so that no one, including yourself, is able to understand what the hell is going on. Future reviewers will be blown away by the complexity of your equations, immediately leading to acceptance. zero2hero gets you back on track, because you deserve to be a hero$^{\text{TM}}$. Code leaked at \url{https://github.com/mweiherer/zero2hero}.
Zero-shot meta-learning for small-scale data from human subjects
Jiang, Julie, Lerman, Kristina, Ferrara, Emilio
Abstract--While developments in machine learning led to impressive performance gains on big data, many human subjects data are, in actuality, small and sparsely labeled. Existing methods applied to such data often do not easily generalize to out-of-sample subjects. Instead, models must make predictions on test data that may be drawn from a different distribution, a problem known as zero-shot learning. To address this challenge, we develop an end-to-end framework using a meta-learning approach, which enables the model to rapidly adapt to a new prediction task with limited training data for out-of-sample test data. We use three real-world small-scale human subjects datasets (two randomized control studies and one observational study), for which we predict treatment outcomes for held-out treatment groups. Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multitask predictions. However, these methods have had limited success in I. Though such studies remain the gold standard large amount of labeled data yet have limited capacity for of scientific discovery [1], [3], many are small and sparsely transferring knowledge [14], [15], hindering their ability to labeled due to regulatory challenges, ethical considerations generalize to complex yet small human subjects datasets and [4], data availability (e.g., investigating rare diseases [3]), tasks [16].
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT
Xia, Chunqiu Steven, Zhang, Lingming
Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then queries the LLM to generate patches. While the LLM-based APR tools are able to achieve state-of-the-art results, it still follows the classic Generate and Validate repair paradigm of first generating lots of patches and then validating each one afterwards. This not only leads to many repeated patches that are incorrect but also miss the crucial information in test failures as well as in plausible patches. To address these limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same mistakes. For earlier patches that passed all the tests, we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM -- ChatGPT. By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugs for \$0.42 each!
Network Visualization of ChatGPT Research: a study based on term and keyword co-occurrence network analysis
The main objective of this paper is to identify the major research areas of ChatGPT through term and keyword co-occurrence network mapping techniques. For conducting the present study, total of 577 publications were retrieved from the Lens database for the network visualization. The findings of the study showed that chatgpt occurrence in maximum number of times followed by its related terms such as artificial intelligence, large language model, gpt, study etc. This study will be helpful to library and information science as well as computer or information technology professionals.
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
Holmes, Jason, Liu, Zhengliang, Zhang, Lian, Ding, Yuzhen, Sio, Terence T., McGee, Lisa A., Ashman, Jonathan B., Li, Xiang, Liu, Tianming, Shen, Jiajian, Liu, Wei
We present the first study to investigate Large Language Models (LLMs) in answering radiation oncology physics questions. Because popular exams like AP Physics, LSAT, and GRE have large test-taker populations and ample test preparation resources in circulation, they may not allow for accurately assessing the true potential of LLMs. This paper proposes evaluating LLMs on a highly-specialized topic, radiation oncology physics, which may be more pertinent to scientific and medical communities in addition to being a valuable benchmark of LLMs. We developed an exam consisting of 100 radiation oncology physics questions based on our expertise at Mayo Clinic. Four LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), Bard (LaMDA), and BLOOMZ, were evaluated against medical physicists and non-experts. ChatGPT (GPT-4) outperformed all other LLMs as well as medical physicists, on average. The performance of ChatGPT (GPT-4) was further improved when prompted to explain first, then answer. ChatGPT (GPT-3.5 and GPT-4) showed a high level of consistency in its answer choices across a number of trials, whether correct or incorrect, a characteristic that was not observed in the human test groups. In evaluating ChatGPTs (GPT-4) deductive reasoning ability using a novel approach (substituting the correct answer with "None of the above choices is the correct answer."), ChatGPT (GPT-4) demonstrated surprising accuracy, suggesting the potential presence of an emergent ability. Finally, although ChatGPT (GPT-4) performed well overall, its intrinsic properties did not allow for further improvement when scoring based on a majority vote across trials. In contrast, a team of medical physicists were able to greatly outperform ChatGPT (GPT-4) using a majority vote. This study suggests a great potential for LLMs to work alongside radiation oncology experts as highly knowledgeable assistants.
A 3D Printed Robot Equipped With GPT-5 to Lead Meta - 3Dnatives
Recently renamed Meta, Facebook is one of the undisputed giants of emerging technologies, whether in the fields of virtual reality, augmented reality, artificial intelligence or even additive manufacturing. Indeed, the group is progressively advancing in this market, slowly but surely. Notably, the American giant already announced the acquisition of Luxexcel a few months ago. This time, Meta has decided to combine all this expertise to announce a new and somewhatโฆ surprising project. Mark Zuckerberg, founder and CEO of the former Facebook, declared in an official press release that the company has 3D printed a robot equipped with OpenAI's GPT-5 to sit on the board of directors and support it in its strategic decisions.
ChatGPT gets banned in Italy as the fight against AI begins - AIVAnet
ChatGPT has been temporarily banned in Italy due to privacy concerns and faces a Federal Trade Commission (FTC) complaint in the U.S. that calls for new releases of ChatGPT to be halted. According to the Associated Press, the Italian Data Protection Authority will maintain the ban "until ChatGPT respects privacy." The problem with user data being visible to others during ChatGPT's March 20 outage was mentioned as the reason for this action. No details were shared about how this ban would be enforced or whether it would affect OpenAI partners that use ChatGPT, such as Microsoft's Bing Chat. ChatGPT: how to use the AI chatbot everyone's talking about OpenAI and ChatGPT logos are marked do not enter with a red circle and line symbol.