hashimoto
Jona Health Review: Microbiome Decoder for Health Conditions
I'm really glad I took this mail-order, medical-grade shotgun microbiome test to look for warning signs of health conditions. Pros: the medical-grade shotgun test is the gold standard, and the report "shows the work" so you can see which studies it's referencing. Cons: results can be confusing or conflicting, and you need a doctor to understand some of them. We hear a lot about the microbiome, the zoo of different bacteria living in your digestive system. We know some are good and some are bad.
Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation
Shafique, Muhammad Ali, Mehreen, Kanwal, Arham, Muhammad, Amjad, Maaz, Butt, Sabur, Farooq, Hamza
Developing high-performing large language models (LLMs) for low-resource languages such as Urdu presents several challenges. These challenges include the scarcity of high-quality datasets, multilingual inconsistencies, and safety concerns. Existing multilingual LLMs often address these issues by translating large volumes of available data. However, such translations often lack quality and cultural nuance while also incurring significant costs for data curation and training. To address these issues, we propose Alif-1.0-8B-Instruct, a multilingual Urdu-English model that tackles these challenges with a unique approach. We train the model on a high-quality, multilingual synthetic dataset (Urdu-Instruct), developed using a modified self-instruct technique. By using unique prompts and seed values for each task along with a global task pool, this dataset incorporates Urdu-native chain-of-thought based reasoning, bilingual translation, cultural relevance, and ethical safety alignments. This technique significantly enhances the comprehension of the Alif-1.0-8B-Instruct model for Urdu-specific tasks. As a result, Alif-1.0-8B-Instruct, built upon the pretrained Llama-3.1-8B, demonstrates superior performance compared to Llama-3.1-8B-Instruct on Urdu-specific tasks. It also outperforms leading multilingual LLMs, including Mistral-7B-Instruct-v0.3, Qwen-2.5-7B-Instruct, and Cohere-Aya-Expanse-8B, all within a training budget of under $100. Our results demonstrate that high-performing, culturally aligned LLMs for low-resource languages can be developed efficiently using our modified self-instruct approach. All datasets, models, and code are publicly available at: https://github.com/traversaal-ai/alif-urdu-llm.
Woman says ChatGPT saved her life by helping detect cancer, which doctors missed
Fox News senior medical analyst Dr. Marc Siegel joined 'Fox & Friends' to discuss the impact of artificial intelligence on medicine and his take on President Trump's decision to withdraw from the World Health Organization. A mother of two credits ChatGPT for saving her life, claiming the artificial intelligence chatbot flagged the condition leading to her cancer when doctors missed it. Lauren Bannon, who divides her time between North Carolina and the U.S. Virgin Islands, first noticed in February 2024 that she was having trouble bending her fingers in the morning and evening, as reported by Kennedy News and Media. After four months, the 40-year-old was told by doctors that she had rheumatoid arthritis, despite testing negative for the condition.
Multi-group Uncertainty Quantification for Long-form Text Generation
Liu, Terrance, Wu, Zhiwei Steven
While large language models are rapidly moving towards consumer-facing applications, they are often still prone to factual errors and hallucinations. In order to reduce the potential harms that may come from these errors, it is important for users to know to what extent they can trust an LLM when it makes a factual claim. To this end, we study the problem of uncertainty quantification of factual correctness in long-form natural language generation. Given some output from a large language model, we study both uncertainty at the level of individual claims contained within the output (via calibration) and uncertainty across the entire output itself (via conformal prediction). Moreover, we invoke multicalibration and multivalid conformal prediction to ensure that such uncertainty guarantees are valid both marginally and across distinct groups of prompts. Using the task of biography generation, we demonstrate empirically that having access to and making use of additional group attributes for each prompt improves both overall and group-wise performance. As the problems of calibration, conformal prediction, and their multi-group counterparts have not been extensively explored previously in the context of long-form text generation, we consider these empirical results to form a benchmark for this setting.
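As a rough illustration of the conformal side of this abstract, the sketch below computes a standard split-conformal threshold from held-out calibration scores, then repeats the computation separately per prompt group. It is a minimal sketch under stated assumptions, not the paper's method: the function names and the toy scores are hypothetical, and per-group thresholds only cover the simple case of disjoint groups, whereas multivalid conformal prediction as described above is meant to handle guarantees across groups more generally.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns math.inf when the calibration set is too small to give
    1 - alpha coverage at any finite threshold.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def groupwise_thresholds(cal_scores, cal_groups, alpha):
    """One threshold per group (assumes each prompt belongs to exactly one group)."""
    by_group = {}
    for score, group in zip(cal_scores, cal_groups):
        by_group.setdefault(group, []).append(score)
    return {g: conformal_threshold(s, alpha) for g, s in by_group.items()}


# Hypothetical nonconformity scores for nine calibration prompts.
marginal = conformal_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9], alpha=0.5)

# Hypothetical scores split across two disjoint prompt groups.
per_group = groupwise_thresholds(
    [1, 2, 3, 10, 20, 30], ["a", "a", "a", "b", "b", "b"], alpha=0.5
)
```

A single marginal threshold can over-cover one group and under-cover another when their score distributions differ, which is the motivation for group-aware variants.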
On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization
Flores, Lorenzo Jaime Yu, Cohan, Arman
Text summarization and simplification are among the most widely used applications of AI. However, models developed for such tasks are often prone to hallucination, which can result from training on unaligned data. One efficient approach to address this issue is Loss Truncation (LT) (Kang and Hashimoto, 2020), which modifies the standard log loss to adaptively remove noisy examples during training. However, we find that LT alone yields a considerable number of hallucinated entities on various datasets. We study the behavior of the underlying losses between factual and non-factual examples, to understand and refine the performance of LT. We demonstrate that LT's performance is limited when the underlying assumption that noisy targets have higher NLL loss is not satisfied, and find that word-level NLL among entities provides better signal for distinguishing factuality. We then leverage this to propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets. Our work is available at https://github.com/yale-nlp/fine-grained-lt.
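The two ideas in this abstract can be sketched in a few lines. The first function is a simplified, per-batch reading of loss truncation: drop the highest-loss fraction of examples and average the rest. The second computes a mean NLL restricted to entity tokens, as one possible form of the finer-grained signal the abstract mentions. Both are illustrative assumptions rather than the authors' implementation: the function names, the `drop_frac` parameter, and the entity mask are hypothetical.

```python
def truncated_loss(example_nlls, drop_frac):
    """Mean NLL after discarding the highest-loss fraction of a batch.

    Assumes 0 <= drop_frac < 1; at least one example is always kept.
    """
    n = len(example_nlls)
    n_drop = min(int(n * drop_frac), n - 1)
    kept = sorted(example_nlls)[: n - n_drop]
    return sum(kept) / len(kept)


def entity_nll(token_nlls, is_entity):
    """Mean token-level NLL over tokens flagged as entities (0.0 if none)."""
    picked = [nll for nll, flag in zip(token_nlls, is_entity) if flag]
    return sum(picked) / len(picked) if picked else 0.0


# Hypothetical batch: one example has a much higher loss and is dropped.
batch_loss = truncated_loss([1.0, 2.0, 3.0, 10.0], drop_frac=0.25)

# Hypothetical target sequence where the second and fourth tokens are entities.
ent_loss = entity_nll([0.5, 4.0, 0.5, 2.0], [False, True, False, True])
```

The contrast matters when a target's overall NLL looks unremarkable but its entity tokens carry most of the loss, which is the case where sentence-level truncation would fail to flag it.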
Apple Is an AI Company Now
After more than a decade, autocorrect "fails" could be on their way out. Apple's much-maligned spelling software is getting upgraded by artificial intelligence: Using sophisticated language models, the new autocorrect won't just check words against a dictionary, but will be able to consider the context of the word in a sentence. In theory, it won't suggest consolation when you mean consolidation, because it'll know that those words aren't interchangeable. The next generation of autocorrect was one of several small updates to the iPhone experience that Apple announced earlier this month. The Photos app will be able to differentiate between your dog and other dogs, automatically recognizing your pup the same way it recognizes people who frequently appear in your pictures.
Self-Supervised Pre-Training for Deep Image Prior-Based Robust PET Image Denoising
Onishi, Yuya, Hashimoto, Fumio, Ote, Kibo, Matsubara, Keisuke, Ibaraki, Masanobu
Deep image prior (DIP) has been successfully applied to positron emission tomography (PET) image restoration, enabling an implicit prior to be represented using only a convolutional neural network architecture, without a training dataset, whereas the general supervised approach requires massive numbers of low- and high-quality PET image pairs. To answer the increased need for PET imaging with DIP, it is indispensable to improve the performance of the underlying DIP itself. Here, we propose a self-supervised pre-training model to improve DIP-based PET image denoising performance. Our proposed pre-training model acquires transferable and generalizable visual representations from only unlabeled PET images by restoring various degraded PET images in a self-supervised approach. We evaluated the proposed method using clinical brain PET data with various radioactive tracers ($^{18}$F-florbetapir, $^{11}$C-Pittsburgh compound-B, $^{18}$F-fluoro-2-deoxy-D-glucose, and $^{15}$O-CO$_{2}$) acquired from different PET scanners. The proposed method using the self-supervised pre-training model achieved robust and state-of-the-art denoising performance while retaining spatial details and quantification accuracy compared to other unsupervised methods and pre-training models. These results suggest that the proposed method may be particularly effective for rare diseases and probes and may help reduce the scan time or the radiotracer dose without affecting the patients.