Large Language Model
The Morning After: The Amazon Prime Day deals worth your time and money
It's back and here to ruin our savings and increase the gadgets in our homes. Yes, Amazon Prime Day isn't entirely about headphones, tablets and wearables, but for Engadget staff… well, it feels like it is. Prime Day deals on tech are typically only matched by Black Friday and Cyber Monday deals, making it a good time to pick up any devices you want – at a discount. In the past, the best devices often weren't given the Prime Day discount treatment, but this year has several things I not only bought myself but have recommended to friends and family. That includes $50 off the second-generation AirPods Pro (literally using them as I write this newsletter), last year's Kindle e-reader, down from $100 to $65 and, my pick for the best smartphone under $500, the Google Pixel 7a, now a dollar shy of $450.
Meta's Twitter rival Threads surges to 100 million sign-ups faster than ChatGPT
Meta Platforms' Twitter rival Threads crossed 100 million sign-ups within five days of launch, CEO Mark Zuckerberg said Monday, dethroning ChatGPT as the fastest-growing online platform to hit the milestone. Threads has been setting records for user growth since its launch Wednesday, with celebrities, politicians and other newsmakers joining the platform that is seen by analysts as the first serious threat to the Elon Musk-owned microblogging app. "That's mostly organic demand, and we haven't even turned on many promotions yet," Zuckerberg said in a Threads post announcing the milestone.
BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams
Almeida, Thales Sales, Laitz, Thiago, Bonás, Giovana K., Nogueira, Rodrigo
One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, although Portuguese is the fifth most spoken language worldwide, few such evaluations have been conducted in it. This is mainly due to the lack of high-quality datasets available to the community for carrying out evaluations in Portuguese. To address this gap, we introduce the Brazilian Leading Universities Entrance eXams (BLUEX), a dataset of entrance exams from the two leading universities in Brazil: UNICAMP and USP. The dataset includes annotated metadata for evaluating the performance of NLP models on a variety of subjects. Furthermore, BLUEX includes a collection of recently administered exams that are unlikely to be included in the training data of many popular LMs as of 2023. The dataset is also annotated to indicate the position of images in each question, providing a valuable resource for advancing the state of the art in multimodal language understanding and reasoning. We describe the creation and characteristics of BLUEX and establish a benchmark through experiments with state-of-the-art LMs, demonstrating its potential for advancing the state of the art in natural language understanding and reasoning in Portuguese.
Can I say, now machines can think?
Aggarwal, Nitisha, Saxena, Geetika Jain, Singh, Sanjeev, Pundir, Amit
Generative AI techniques have opened the path for new generations of machines in diverse domains. These machines have various capabilities; for example, they can produce images, generate answers or stories, and write code based solely on "prompts" provided by users. These machines are considered 'thinking minds' because they can generate human-like responses. In this study, we analyze and explore the capabilities of artificial intelligence-enabled machines. We revisit Turing's concept of thinking machines and compare it with recent technological advancements. The objections to and consequences of thinking machines are also discussed, along with available techniques for evaluating machines' cognitive capabilities. We conclude that the Turing Test is a critical aspect of evaluating machines' abilities. However, there are other aspects of intelligence too, and AI machines exhibit most of these aspects.
Towards Robust and Efficient Continual Language Learning
Fisch, Adam, Rannen-Triki, Amal, Pascanu, Razvan, Bornschein, Jörg, Lazaridou, Angeliki, Gribovskaya, Elena, Ranzato, Marc'Aurelio
As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential of positive transfer, high potential for negative transfer, no expected effect, or a mixture of each. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any distracting tasks that may confuse it. We then propose a simple, yet effective, learner that satisfies many of our desiderata simply by leveraging a selective strategy for initializing new models from past task checkpoints. Still, limitations remain, and we hope this benchmark can help the community to further build and analyze such learners.
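The abstract does not say how the learner chooses which past checkpoint to start from, so the following is only a minimal sketch of one way selective initialization could look. It assumes each candidate checkpoint (including the original pretrained model) can be given a scalar "probe score" on held-out data from the new task, and it starts from the highest-scoring one. The names select_initialization and probe_score, and the toy scores, are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, Hashable


def select_initialization(
    checkpoints: Dict[Hashable, object],
    probe_score: Callable[[object], float],
) -> Hashable:
    """Return the name of the checkpoint with the highest probe score.

    Assumption: a higher probe score on the new task suggests a better
    starting point for fine-tuning. Ties go to the first candidate.
    """
    return max(checkpoints, key=lambda name: probe_score(checkpoints[name]))


# Toy usage: each "checkpoint" is stood in for by a made-up validation score.
candidates = {"pretrained": 0.50, "task_A": 0.72, "task_B": 0.31}
best = select_initialization(candidates, probe_score=lambda ckpt: ckpt)
print(best)
```

Keeping the pretrained model among the candidates gives the selector a fallback when every past-task checkpoint looks harmful, which is one simple way to guard against negative transfer.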
Objaverse-XL: A Universe of 10M+ 3D Objects
Deitke, Matt, Liu, Ruoshi, Wallingford, Matthew, Ngo, Huong, Michel, Oscar, Kusupati, Aditya, Fan, Alan, Laforte, Christian, Voleti, Vikram, Gadre, Samir Yitzhak, VanderBilt, Eli, Kembhavi, Aniruddha, Vondrick, Carl, Gkioxari, Georgia, Ehsani, Kiana, Schmidt, Ludwig, Farhadi, Ali
Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects from a diverse set of sources, including manually designed objects, photogrammetry scans of landmarks and everyday items, and professional scans of historic and antique artifacts. Representing the largest scale and diversity in the realm of 3D datasets, Objaverse-XL enables significant new possibilities for 3D vision. Our experiments demonstrate the improvements enabled with the scale provided by Objaverse-XL. We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities. We hope that releasing Objaverse-XL will enable further innovations in the field of 3D vision at scale.
Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models
Mullick, Dhruv, Ghanem, Bilal, Fyshe, Alona
Customer feedback is invaluable to companies as they refine their products. Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC), which allows us to analyse specific aspects of the products in reviews. Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR). In this work, we propose a framework to improve an LLM's performance on CR-containing reviews by fine-tuning on highly inferential tasks. We show that the performance improvement is likely attributable to the model's improved CR ability. We also release a new dataset that focuses on CR in ALSC.
Synthetic Dataset for Evaluating Complex Compositional Knowledge for Natural Language Inference
Akoju, Sushma Anand, Vacareanu, Robert, Riaz, Haris, Blanco, Eduardo, Surdeanu, Mihai
We introduce a synthetic dataset called Sentences Involving Complex Compositional Knowledge (SICCK) and a novel analysis that investigates the ability of Natural Language Inference (NLI) models to understand compositionality in logic. We produce 1,304 sentence pairs by modifying 15 examples from the SICK dataset (Marelli et al., 2014). To this end, we modify the original texts using a set of phrases: modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009). We use these phrases to modify the subject, verb, and object parts of the premise and hypothesis. Lastly, we annotate these modified texts with the corresponding entailment labels following NL rules. We conduct a preliminary verification of how well the change in the structural and semantic composition is captured by neural NLI models, in both zero-shot and fine-tuned scenarios. We find that the performance of NLI models under the zero-shot setting is poor, especially for modified sentences with negation and existential quantifiers. After fine-tuning on this dataset, we observe that models continue to perform poorly over negation, existential and universal modifiers.
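As a rough illustration of the construction described above, the sketch below applies a small set of modifiers to the subject of a premise and a hypothesis and enumerates the resulting pairs. It is not the authors' pipeline: the modifier list, the example sentence, and the function names are assumptions, only the subject is modified, and entailment labels are left unset because the abstract says they are annotated following Natural Logic rules.

```python
from itertools import product

# Hypothetical modifier phrases, one per category named in the abstract.
MODIFIERS = {"universal": "every", "existential": "some", "negation": "no"}


def modify_subject(subject: str, verb: str, obj: str, modifier: str) -> str:
    """Prefix the subject with a modifier phrase and rebuild the sentence."""
    return f"{modifier} {subject} {verb} {obj}"


def build_pairs(subject: str, verb: str, obj: str) -> list:
    """Enumerate premise/hypothesis pairs over all modifier combinations."""
    pairs = []
    for (p_kind, p_mod), (h_kind, h_mod) in product(MODIFIERS.items(), repeat=2):
        pairs.append({
            "premise": modify_subject(subject, verb, obj, p_mod),
            "hypothesis": modify_subject(subject, verb, obj, h_mod),
            "premise_modifier": p_kind,
            "hypothesis_modifier": h_kind,
            "label": None,  # to be annotated by hand, not inferred here
        })
    return pairs


pairs = build_pairs("dog", "chases a", "ball")
print(len(pairs))
print(pairs[0]["premise"])
```

Extending the same idea to verb and object modifiers multiplies the number of combinations, which is how a handful of seed sentences can yield a much larger set of pairs.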