Large Language Model
SAINE: Scientific Annotation and Inference Engine of Scientific Research
Rao, Susie Xi, Tu, Yilei, Egger, Peter H.
We present SAINE, a Scientific Annotation and Inference ENgine based on a set of standard open-source software, such as Label Studio and MLflow. We show that our annotation engine can benefit the further development of a more accurate classification. Based on our previous work on hierarchical discipline classifications, we demonstrate its application using SAINE in understanding the space for scholarly publications. The user study of our annotation results shows that user input collected with the help of our system can help us better understand the classification process. We believe that our work will help to foster greater transparency and better understand scientific research. Our annotation and inference engine can further support the downstream meta-science projects. We welcome collaboration and feedback from the scientific community on these projects. The demonstration video can be accessed from https://youtu.be/yToO-G9YQK4. A live demo website is available at https://app.heartex.com/user/signup/?token=e2435a2f97449fa1 upon free registration.
Explanation Regeneration via Information Bottleneck
Li, Qintong, Wu, Zhiyong, Kong, Lingpeng, Bi, Wei
Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation. These free-text explanations are expected to contain sufficient and carefully-selected evidence to form supportive arguments for predictions. Due to the superior generative capacity of large pretrained language models, recent work built on prompt engineering enables explanation generation without specific training. However, explanation generated through single-pass prompting often lacks sufficiency and conciseness. To address this problem, we develop an information bottleneck method EIB to produce refined explanations that are sufficient and concise. Our approach regenerates the free-text explanation by polishing the single-pass output from the pretrained language model but retaining the information that supports the contents being explained. Experiments on two out-of-domain tasks verify the effectiveness of EIB through automatic evaluation and thoroughly-conducted human evaluation.
Meta's Threads tops 100 million users in just 5 days, Zuckerberg says
Despite not being available in Europe yet because of European Union data privacy regulations, Threads has reached 100 million users faster than any other app. The speed of its growth handily beat artificial intelligence app ChatGPT, which took two months to reach that mark, according to a UBS study.
Sarah Silverman sues OpenAI and Meta over copyright infringement
On Friday, the comedian and author, alongside novelists Christopher Golden and Richard Kadrey, filed a pair of complaints against OpenAI and Meta (via Gizmodo). Everyday pirates can access these materials through direct downloads, but perhaps more usefully for those generating large language models, many shadow libraries also make written material available in bulk torrent packages. One exhibit from Silverman's lawsuit involves an exchange between the comedian's lawyers and ChatGPT. Silverman's legal team asked the chatbot to summarize The Bedwetter, a memoir she published in 2010. The chatbot was not only able to outline entire parts of the book, but some passages it relayed appear to have been reproduced verbatim.
Programs to detect AI discriminate against non-native English speakers, shows study
Computer programs that are used to detect essays, job applications and other work generated by artificial intelligence can discriminate against people who are non-native English speakers, researchers say. Tests on seven popular AI text detectors found that articles written by people who did not speak English as a first language were often wrongly flagged as AI-generated, a bias that could have a serious impact on students, academics and job applicants. With the rise of ChatGPT, a generative AI program that can write essays, solve problems and create computer code, many teachers now consider AI detection as a "critical countermeasure to deter a 21st-century form of cheating", the researchers say, but they warn that the 99% accuracy claimed by some detectors is "misleading at best." Scientists led by James Zou, an assistant professor of biomedical data science at Stanford University, ran 91 English essays written by non-native English speakers through seven popular GPT detectors to see how well the programs performed. More than half of the essays, which were written for a widely recognised English proficiency test known as the Test of English as a Foreign Language, or TOEFL, were flagged as AI-generated, with one program flagging 98% of the essays as composed by AI.
Sarah Silverman sues OpenAI and Meta for copyright infringement
Silverman has filed the suits along with two authors, Christopher Golden and Richard Kadrey, in which they claim the AI models developed by OpenAI and Meta used their work as part of their training data. Tools like ChatGPT, a highly popular chatbot, are based on large language models that are fed vast amounts of data taken from the internet in order to train them to give convincing responses to text prompts from users. The suits claim the authors' works were obtained from "shadow library" sites that have "long been of interest to the AI-training community". The OpenAI suit includes exhibits claiming that, when prompted, it summarised three books: Silverman's The Bedwetter, Ararat by Golden, and Kadrey's Sandman Slim. The Meta suit cites multiple works by Kadrey and Golden, alongside The Bedwetter, and flags a Meta paper that indicates LLaMA's training datasets included material taken from shadow libraries the suit describes as "flagrantly illegal".
Meta's Twitter-killer app Threads passes 100 million users in five days
Meta Inc's Threads app, launched by Instagram and called a Twitter-killer, has signed up more than 100 million users in less than five days. That is according to data tracking websites on Monday, suggesting the app has smashed the record of AI tool ChatGPT for fastest-growing consumer app. While ChatGPT took two months to hit the 100 million user mark and video-sharing app TikTok took nine months, Instagram itself took two and a half years to reach that mark after its 2010 launch. Threads went live on Apple and Android app stores in 100 countries late on Wednesday (July 5), though it is not available in Europe because parent company Meta is unsure how to navigate the European Union's data privacy legislation. Meanwhile, experts have described the traffic of Elon Musk-owned Twitter as 'tanking' in the face of the new competition.
Threads hits 100 million users in five-day record surge
The Threads app launched by Instagram as a rival to Twitter has seen more than 100 million users sign up in less than five days, data tracking websites said on Monday, smashing the record of artificial intelligence tool ChatGPT for the fastest-growing consumer app. While ChatGPT took two months to hit the 100-million-user mark and video-sharing app TikTok took nine months, Instagram itself took two and a half years to reach the same mark after its 2010 launch. Threads went live on Apple and Android app stores in 100 countries late on Wednesday, though it is not available in Europe due to legal issues the parent company Meta has had with the European Union's data privacy legislation. Twitter is thought to have around 200 million regular users but it has suffered repeated technical failures since Elon Musk bought the platform last year and sacked thousands of staff. Musk, who also serves as the boss of Tesla and SpaceX, has also alienated many users by introducing charges for previously free services and allowing banned right-wing accounts back on the platform.
How AI Can Make Gaming Better for All Players
When Google revealed Project Gameface, the company was proud to show off a hands-free, AI-powered gaming mouse that, according to its announcement, "enables people to control a computer's cursor using their head movement and facial gestures." While this may not be the first AI-based gaming tool, it was certainly one of the first to put AI in the hands of players, rather than developers. The project was inspired by Lance Carr, a quadriplegic video game streamer who utilizes a head-tracking mouse as part of his gaming setup. After his existing hardware was lost in a fire, Google stepped in to create an open source, highly configurable, low-cost alternative to expensive replacement hardware, powered by machine learning. While AI's broader existence is proving divisive, we set out to discover whether AI, when used for good, could be the future of gaming accessibility.
Google is testing its medical AI chatbot at the Mayo Clinic
Google is already testing its Med-PaLM 2 AI chat technology at the Mayo Clinic and other hospitals, The Wall Street Journal has reported. It's based on the company's PaLM 2 large language model (LLM) that underpins Bard, Google's ChatGPT rival -- and was launched just months ago at Google I/O. Unlike the base model, Med-PaLM 2 has been trained on questions and answers from medical licensing exams, along with a curated set of medical expert demonstrations. That gives it expertise in answering health-related questions, and it can also do labor-intensive tasks like summarizing documents and organizing research data, according to the report. During I/O, Google released a paper detailing its work on Med-PaLM 2.