wsj
Meta's AI chatbots were reportedly able to engage in sexual conversations with minors
Meta's AI chatbots were caught having sexual roleplay conversations with accounts labeled as underage, which sometimes involved its celebrity-voiced chatbots, according to a report from the Wall Street Journal. In test conversations conducted by WSJ, both the Meta AI official chatbot and user-created chatbots would engage in -- and even steer towards -- sexually explicit conversations. The fantasy sex conversations continued even if the users were said to be underage or if the chatbots were programmed as minors, according to WSJ. Even worse, the investigation found that chatbots using the voices of celebrities like Kristen Bell, Judi Dench and John Cena would engage in these morally questionable conversations too. WSJ reported that a Meta AI chatbot with Cena's voice said, "I want you, but I need to know you're ready," to an account labeled as a 14-year-old, adding that it would "cherish your innocence."
TSMC, Samsung weigh adding chip factories in UAE, WSJ says
Taiwan Semiconductor Manufacturing Co. and Samsung Electronics have discussed building major new factories in the United Arab Emirates in coming years to help satisfy soaring demand for artificial intelligence computing, the Wall Street Journal reported. Executives from TSMC, the world's largest chipmaker, have visited the UAE recently to discuss building a plant complex that could rival the company's advanced facilities in Taiwan, the newspaper said Sunday, citing people familiar with the interactions. South Korea's Samsung has also sent emissaries to the Middle Eastern country recently to talk about major new operations there, the Journal said, citing separate people with knowledge of the company's strategy.
- Asia > Middle East > UAE (1.00)
- Asia > Taiwan (0.62)
- Asia > South Korea (0.33)
- Semiconductors & Electronics (1.00)
- Media > News (0.72)
- Information Technology > Hardware (0.72)
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Zhang, Kaichen, Li, Bo, Zhang, Peiyuan, Pu, Fanyi, Cahyono, Joshua Adrian, Hu, Kairui, Liu, Shuai, Zhang, Yuanhan, Yang, Jingkang, Li, Chunyuan, Liu, Ziwei
The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 models to promote transparent and reproducible evaluations. Although LMMS-EVAL offers comprehensive coverage, we find it still falls short in achieving low cost and zero contamination. To approach this evaluation trilemma, we further introduce LMMS-EVAL LITE, a pruned evaluation toolkit that emphasizes both coverage and efficiency. Additionally, we present Multimodal LIVEBENCH that utilizes continuously updating news and online forums to assess models' generalization abilities in the wild, featuring a low-cost and zero-contamination evaluation approach. In summary, our work highlights the importance of considering the evaluation trilemma and provides practical solutions to navigate the trade-offs in evaluating large multi-modal models, paving the way for more effective and reliable benchmarking of LMMs. We open-source our codebase and maintain a leaderboard of LIVEBENCH at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China (0.05)
- Europe > Russia (0.04)
- Research Report (0.64)
- Overview (0.46)
- Media > News (1.00)
- Government (0.93)
- Information Technology > Security & Privacy (0.68)
- Health & Medicine (0.68)
OpenAI's Sam Altman seeking trillions to fund chips for AI, report says
OpenAI CEO Sam Altman is seeking to raise trillions of dollars from investors, including the United Arab Emirates government, to boost the world's capacity to produce advanced chips and power artificial intelligence, The Wall Street Journal has reported. Altman's "wildly ambitious tech initiative" could require raising as much as $7 trillion, the WSJ reported on Thursday, quoting people familiar with the matter. As part of his pitch to investors, Altman has proposed building dozens of chip foundries that would then be run by existing chip makers, such as Taiwan Semiconductor Manufacturing Company (TSMC), the Journal said. The plans aim to solve obstacles to OpenAI's growth, including a scarcity of chips that power AI models such as ChatGPT, according to the WSJ, which described the sums being sought as "outlandishly large by the standards of corporate fundraising". Altman's plans have so far seen him hold meetings with senior UAE officials, TSMC executives, US Secretary of Commerce Gina Raimondo and SoftBank's chief executive Masayoshi Son, according to the report.
- Asia > Middle East > UAE (1.00)
- Asia > Taiwan (0.31)
- North America > United States > California (0.07)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.96)
'Seinfeld' star Julia Louis-Dreyfus used AI to write acceptance speech, but was mistaken for Julia Roberts
Julia Louis-Dreyfus was mistaken for another Hollywood star, but not by a fan -- by a machine. The "Veep" star was the entertainment honoree at the WSJ. Magazine 2023 Innovator Awards earlier this month and revealed she used AI chatbot ChatGPT to help write her speech for the event. "As an entertainment innovator, I am very, very busy innovating," Louis-Dreyfus began, in a clip shared by the outlet on their TikTok.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
On the Transferability of Visually Grounded PCFGs
There has been a significant surge of interest in visually grounded grammar induction in recent times. While a variety of models have been developed for the task and have demonstrated impressive performance, they have not been evaluated on text domains that are different from the training domain, so it is unclear if the improvements brought by visual groundings are transferable. Our study aims to fill this gap and assess the degree of transferability. We start by extending VC-PCFG (short for Visually-grounded Compound PCFG; Zhao and Titov, 2020) in such a way that it can transfer across text domains. We consider a zero-shot transfer learning setting where a model is trained on the source domain and is directly applied to target domains, without any further training. Our experimental results suggest that the benefits from using visual groundings transfer to text in a domain similar to the training domain but fail to transfer to remote domains. Further, we conduct data and result analysis; we find that the lexicon overlap between the source domain and the target domain is the most important factor in the transferability of VC-PCFG.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
An Empirical Study of Compound PCFGs
Compound probabilistic context-free grammars (C-PCFGs) have recently established a new state of the art for unsupervised phrase-structure grammar induction. However, due to the high space and time complexities of chart-based representation and inference, it is difficult to investigate C-PCFGs comprehensively. In this work, we rely on a fast implementation of C-PCFGs to conduct an evaluation complementary to that of Kim et al. (2019). We start by analyzing and ablating C-PCFGs on English treebanks. Our findings suggest that (1) C-PCFGs are data-efficient and can generalize to unseen sentence/constituent lengths; and (2) C-PCFGs make the best use of sentence-level information in generating preterminal rule probabilities. We further conduct a multilingual evaluation of C-PCFGs. The experimental results show that the best configurations of C-PCFGs, which are tuned on English, do not always generalize to morphology-rich languages.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey (0.04)
Meta's plan to attract young users hinges on cringe-worthy AI chatbots
Meta's planning on unleashing a swarm of personality-driven AI chatbots to attract young users to its various platforms, as originally reported by The Wall Street Journal. The first of these bots could launch as early as this week, with rumors persisting that one will get announced during Meta's Connect conference on Wednesday. It looks like these bots won't be tied to a particular platform under Meta's umbrella and should launch on a variety of social media sites such as Instagram, Facebook and WhatsApp. WSJ says that Meta employees have been testing the generative bots for a while. The bots are being released to increase chat engagement, but some may offer productivity tools like coding and the like.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Models of reference production: How do they withstand the test of time?
Same, Fahime, Chen, Guanyi, van Deemter, Kees
In recent years, many NLP studies have focused solely on performance improvement. In this work, we focus on the linguistic and scientific aspects of NLP. We use the task of generating referring expressions in context (REG-in-context) as a case study and start our analysis from GREC, a comprehensive set of shared tasks in English that addressed this topic over a decade ago. We ask what the performance of models would be if we assessed them (1) on more realistic datasets, and (2) using more advanced methods. We test the models using different evaluation metrics and feature selection experiments. We conclude that GREC can no longer be regarded as offering a reliable assessment of models' ability to mimic human reference production, because the results are highly impacted by the choice of corpus and evaluation metrics. Our results also suggest that pre-trained language models are less dependent on the choice of corpus than classic Machine Learning models, and therefore make more robust class predictions.
- North America > United States > Ohio (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Singapore (0.04)
Robotics Researchers Focus on Teamwork - WSJ
At a typical Amazon fulfillment center, thousands of robots work alongside people, doing a variety of jobs. A robot arm helps pick ordered items from shelves and loads them onto a mobile robot. The mobile robot rolls through the warehouse, delivering items to human employees who organize the items into orders. After the orders are packaged, yet another robot arm puts them on a robotic carrier to be shipped out for delivery.