 Large Language Model


Harnessing human-AI collaboration for an AI roadmap that moves beyond pilots

MIT Technology Review

In this exclusive webcast, Concentrix's Ryan Peterson, Everest Group's Shirley Hung, and Valmont's Heidi Hough discuss turning AI ambitions into operational advantages. The past year has marked a turning point in the corporate AI conversation. After a period of eager experimentation, organizations are now confronting a more complex reality: While investment in AI has never been higher, the path from pilot to production remains elusive. Three-quarters of enterprises remain stuck in experimentation mode, despite mounting pressure to convert early tests into operational gains. "Most organizations can suffer from what we like to call PTSD, or process, technology, skills, and data challenges," says Shirley Hung, partner at Everest Group. "They have rigid, fragmented workflows that don't adapt well to change, technology systems that don't speak to each other, talent that is really immersed in low-value tasks rather than creating high impact."


Microsoft's Copilot+ AI PC plan fizzled, but it still served a purpose

Engadget

At least Microsoft was able to reshape premium PCs. Microsoft's Copilot+ initiative launched last year with a clear goal: to produce capable laptops for people eagerly anticipating AI-powered features. Read that sentence again, and it's glaringly obvious that Microsoft's plan was flawed from the start. Most consumers aren't nearly as hyped for AI features as the companies eager to foist artificial intelligence upon us. Microsoft's Recall, which snaps screenshots of your PC to create a database of everything you've done, was dogged by privacy concerns from the start.


The era of AI persuasion in elections is about to begin

MIT Technology Review

AI is eminently capable of political persuasion and could automate it at a mass scale. In January 2024, the phone rang in homes all around New Hampshire. On the other end was Joe Biden's voice, urging Democrats to "save your vote" by skipping the primary. It sounded authentic, but it wasn't. The call was a fake, generated by artificial intelligence. Today, the technology behind that hoax looks quaint.


It's Time to Save Silicon Valley From Itself

WIRED

Big Tech has lost its way. At WIRED's Big Interview event, Techdirt editor Mike Masnick and Common Tools CEO Alex Komoroske announced a manifesto designed to help the industry get back on track. Alex Komoroske has always been at odds with Big Tech's darker side. Though he cut his product-management teeth at Google and Stripe, he was never comfortable with the industry's increasing prioritization of profits over people. Once during his time at Google, he extolled the societal benefits of a project only to be met with, "Oh Alex, you'd be a VP by now if you just stopped thinking through the implications of your actions."


Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers

arXiv.org Artificial Intelligence

Suicide remains a leading cause of death in Western countries. As social media becomes central to daily life, digital footprints offer valuable insight into suicidal behavior. Focusing on individuals who attempted suicide while uploading videos to their channels, we investigate: How do linguistic patterns on YouTube reflect suicidal behavior, and how do these patterns align with or differ from expert knowledge? We examined linguistic changes around suicide attempts and compared individuals who attempted suicide while actively uploading to their channel with three control groups: those with prior attempts, those experiencing major life events, and matched individuals from the broader cohort. Applying complementary bottom-up, hybrid, and expert-driven approaches, we analyzed a novel longitudinal dataset of 181 suicide-attempt channels and 134 controls. In the bottom-up analysis, LLM-based topic modeling identified 166 topics; five were linked to suicide attempts, and two of these also showed attempt-related temporal changes (Mental Health Struggles, $OR = 1.74$; YouTube Engagement, $OR = 1.67$; $p < .01$). In the hybrid approach, clinical experts reviewed LLM-derived topics and flagged 19 as suicide-related; however, none showed significant effects beyond those identified bottom-up. YouTube Engagement, a platform-specific indicator, was not flagged, underscoring the value of bottom-up discovery. A top-down psychological assessment of suicide narratives revealed differing motivations: individuals describing prior attempts aimed to help others ($\beta = -1.69$, $p < .01$), whereas those who attempted during the uploading period emphasized personal recovery ($\beta = 1.08$, $p < .01$). By integrating these approaches, we offer a nuanced understanding of suicidality, bridging digital behavior and clinical insights.
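For readers less used to odds ratios, the standard definition is below, where p_1 and p_0 are the probabilities of the outcome in the two groups being compared. The abstract does not say which model produced its ORs, so the logistic-regression link (OR = e^b) is an assumption included only to show the scale: an OR of 1.74 means the odds are about 1.74 times as high, i.e. roughly 74% higher. The coefficient b here is unrelated to the β values reported for the narrative assessment.

```latex
\mathrm{OR} = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}, \qquad
\mathrm{OR} = e^{b} \;\Longleftrightarrow\; b = \ln \mathrm{OR}, \qquad
\ln 1.74 \approx 0.554
```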


DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

arXiv.org Artificial Intelligence

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning paradigm that fully leverages both textual and visual content in CoT for better planning and verification. Our method first generates a low-resolution draft image as a preview, providing more concrete and structural visual planning and guidance. Then, we employ the model's inherent understanding capability to verify potential semantic misalignments between the draft and the input prompt, and perform refinement through selective corrections with super-resolution. In this way, our approach addresses two fundamental challenges: the coarse-grained nature of textual planning and the difficulty of generating rare attribute combinations. To support training, we curate DraCo-240K, aiming to enhance three atomic capabilities: general correction, instance manipulation, and layout reorganization. Supported by DraCo-CFG, a specialized classifier-free guidance (CFG) strategy for interleaved reasoning, DraCo achieves gains on GenEval (+8%), Imagine-Bench (+0.91), and GenEval++ (+3%), significantly outperforming direct generation and other CoT-empowered generation methods. The project is at https://github.com/CaraJ7/DraCo.
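DraCo-CFG is described only as a specialized CFG strategy for interleaved reasoning, and its details are not given here. For reference, the sketch below shows the standard classifier-free guidance rule that any CFG variant starts from, applied element-wise to plain Python lists standing in for model outputs. It is not the DraCo-CFG variant, and the toy values are illustrative only.

```python
from typing import Sequence


def classifier_free_guidance(
    uncond: Sequence[float],
    cond: Sequence[float],
    scale: float,
) -> list[float]:
    """Standard CFG: move the unconditional prediction toward the conditional one.

    guided = uncond + scale * (cond - uncond)
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 amplifies the effect of the prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy 3-element "predictions"; real models would return tensors.
print(classifier_free_guidance([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], scale=3.0))
# → [3.0, 1.0, -4.0]
```

The guidance scale trades prompt adherence against diversity; a specialized variant would change how the conditional and unconditional branches are formed, not this basic interpolation.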


Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning

arXiv.org Artificial Intelligence

Long-context reasoning in large language models (LLMs) has been shown to enhance their cognitive capabilities via chain-of-thought (CoT) inference. Such models are usually trained via reinforcement learning with verifiable rewards (RLVR) on reasoning problems such as math and programming. However, RLVR is limited by several bottlenecks, such as the lack of dense rewards and inadequate sample efficiency. As a result, it requires significant compute resources in the post-training phase. To overcome these limitations, we propose Semantic Soft Bootstrapping (SSB), a self-distillation technique in which the same base language model plays the role of both teacher and student but receives different semantic contexts about the correctness of its outcome at training time. The model is first prompted with a math problem, and several rollouts are generated. From these, a correct response and the most common incorrect response are selected and then provided to the model in context to produce a more robust, step-by-step explanation with a verified final answer. This pipeline automatically curates a paired teacher-student training set from raw problem-answer data, without any human intervention. The generation process also produces a sequence of logits, which the student model tries to match during training given only the bare question. In our experiments, we trained Qwen2.5-3B-Instruct on the GSM8K dataset via parameter-efficient fine-tuning and then tested its accuracy on the MATH500 and AIME2024 benchmarks. Our experiments show accuracy improvements of 10.6% and 10%, respectively, over group relative policy optimization (GRPO), a commonly used RLVR algorithm. Our code is available at https://github.com/purbeshmitra/semantic-soft-bootstrapping, and the model and curated dataset are available at https://huggingface.co/purbeshmitra/semantic-soft-bootstrapping.
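The curation step can be sketched as follows, assuming each rollout is stored as a (response text, extracted final answer) pair and correctness is judged by exact string match against the reference answer. The function names and the teacher-prompt wording are hypothetical; the abstract does not specify the prompt template or how ties among incorrect answers are broken, and the later logit-matching stage is not shown.

```python
from collections import Counter
from typing import Optional

# (full response text, extracted final answer); answer extraction is assumed.
Rollout = tuple[str, str]


def select_teacher_pair(
    rollouts: list[Rollout], reference: str
) -> Optional[tuple[Rollout, Rollout]]:
    """Return (a correct rollout, a rollout with the most common wrong answer).

    Returns None if the group has no correct or no incorrect rollout,
    because the teacher context needs one of each.
    """
    correct = [r for r in rollouts if r[1] == reference]
    incorrect = [r for r in rollouts if r[1] != reference]
    if not correct or not incorrect:
        return None
    # Most frequent wrong final answer; ties go to the one seen first.
    wrong_answer, _ = Counter(ans for _, ans in incorrect).most_common(1)[0]
    wrong = next(r for r in incorrect if r[1] == wrong_answer)
    return correct[0], wrong


def build_teacher_prompt(question: str, correct: Rollout, wrong: Rollout) -> str:
    """Hypothetical teacher context; the wording is illustrative only."""
    return (
        f"Problem: {question}\n\n"
        f"A correct solution:\n{correct[0]}\n\n"
        f"A common incorrect solution:\n{wrong[0]}\n\n"
        "Explain step by step how to solve the problem, "
        "then state the verified final answer."
    )


rollouts = [
    ("2x = 8, so x = 4", "4"),
    ("2x = 10, so x = 5", "5"),
    ("x = 10 / 2 = 5", "5"),
    ("x = 6", "6"),
]
pair = select_teacher_pair(rollouts, "4")
if pair is not None:
    print(build_teacher_prompt("Solve 2x = 8.", *pair))
```

Skipping problems where every rollout is right (or every rollout is wrong) is one plausible design choice, since the contrastive context needs both kinds of response.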


Structured Document Translation via Format Reinforcement Learning

arXiv.org Artificial Intelligence

Recent work on structured text translation remains limited to the sentence level, as it struggles to handle complex document-level XML or HTML structures effectively. To address this, we propose Format Reinforcement Learning (FormatRL), which applies Group Relative Policy Optimization on top of a supervised fine-tuned model to directly optimize novel structure-aware rewards: 1) TreeSim, which measures structural similarity between predicted and reference XML trees, and 2) Node-chrF, which measures translation quality at the level of XML nodes. Additionally, we apply StrucAUC, a fine-grained metric distinguishing between minor errors and major structural failures. Experiments on the SAP software-documentation benchmark demonstrate improvements across six metrics, and further analysis shows how different reward functions contribute to improvements in both structural and translation quality.
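The abstract states that FormatRL optimizes its rewards with Group Relative Policy Optimization but does not say how TreeSim and Node-chrF are combined. The sketch below shows only the group-relative advantage at the heart of GRPO: each sampled translation's reward is compared with the mean reward of its group and scaled by the group's standard deviation. The equal 0.5/0.5 weighting, the epsilon, and the reward values are illustrative assumptions.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward within its sampled group.

    Uses the population standard deviation; some implementations use the
    sample estimate instead. eps avoids division by zero when all rewards tie.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical per-sample scores for 4 sampled translations of one document.
tree_sim = [0.9, 0.6, 1.0, 0.5]
node_chrf = [0.7, 0.8, 0.9, 0.4]
# Assumed equal weighting of the two structure-aware rewards.
rewards = [0.5 * t + 0.5 * c for t, c in zip(tree_sim, node_chrf)]
print(group_relative_advantages(rewards))
```

Because advantages are relative within a group, translations that beat their siblings on structure and node-level quality are reinforced without a separately trained value model.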


David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?

arXiv.org Artificial Intelligence

Large language model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always better for hardware design? Our work tests this by evaluating small language models coupled with a curated agentic AI framework on NVIDIA's Comprehensive Verilog Design Problems (CVDP) benchmark. Results show that agentic workflows, through task decomposition, iterative feedback, and correction, not only unlock near-LLM performance at a fraction of the cost but also create learning opportunities for agents, paving the way for efficient, adaptive solutions in complex design tasks.
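The abstract does not describe the framework's internals. A minimal sketch of the iterative feedback-and-correction loop it mentions might look like this, with hypothetical `generate` and `check` callables standing in for the small language model and a Verilog lint or simulation step. Task decomposition is omitted, and the toy stubs are illustrative only.

```python
from typing import Callable, Optional


def agentic_refine(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_iters: int = 3,
) -> tuple[str, bool]:
    """Generate a candidate, check it, and feed errors back until it passes.

    generate(task, feedback) returns a candidate design (e.g. Verilog source);
    check(candidate) returns None on success or an error message on failure.
    Returns the last candidate and whether it passed.
    """
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_iters):
        candidate = generate(task, feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, True
    return candidate, False


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a small model that only fixes the bug after feedback.
    return "module top;\nendmodule" if feedback else "module top;"


def toy_check(candidate: str) -> Optional[str]:
    # Stand-in for a compiler or simulator check.
    return None if "endmodule" in candidate else "syntax error: missing endmodule"


design, ok = agentic_refine("write an empty top module", toy_generate, toy_check)
print(ok)  # → True
```

Capping the loop with `max_iters` bounds the extra inference cost, which matters when the point of using a small model is efficiency.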


Multi-LLM Collaboration for Medication Recommendation

arXiv.org Artificial Intelligence

As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language models (LLMs) are susceptible to hallucinations and inconsistency, whereas naive ensembles of models often fail to deliver stable and credible recommendations. Building on our previous work on LLM Chemistry, which quantifies the collaborative compatibility among LLMs, we apply this framework to improve the reliability of medication recommendations from brief clinical vignettes. Our approach leverages multi-LLM collaboration guided by Chemistry-inspired interaction modeling, enabling ensembles that are effective (exploiting complementary strengths), stable (producing consistent quality), and calibrated (minimizing interference and error amplification). We evaluate our Chemistry-based multi-LLM collaboration strategy on real-world clinical scenarios to investigate whether such interaction-aware ensembles can generate credible, patient-specific medication recommendations. Preliminary results are encouraging, suggesting that LLM Chemistry-guided collaboration may offer a promising path toward reliable and trustworthy AI assistants in clinical practice.
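The abstract contrasts Chemistry-guided collaboration with naive ensembles but gives no algorithm for the former. For orientation, the sketch below shows the kind of naive baseline it aims to improve on: a plurality vote over the option each model recommends, with the agreement fraction as a crude stability signal. This is not the LLM Chemistry method; the model names and the placeholder recommendations `drug_x` and `drug_y` are hypothetical.

```python
from collections import Counter


def plurality_vote(recommendations: dict[str, str]) -> tuple[str, float]:
    """Return the most frequently recommended option and the share of models agreeing."""
    if not recommendations:
        raise ValueError("need at least one recommendation")
    counts = Counter(recommendations.values())
    choice, votes = counts.most_common(1)[0]
    return choice, votes / len(recommendations)


# Hypothetical outputs from three models for one clinical vignette.
votes = {"model_a": "drug_x", "model_b": "drug_x", "model_c": "drug_y"}
print(plurality_vote(votes))  # → ('drug_x', 0.6666666666666666)
```

A vote like this treats every model as equally reliable and ignores how models interact, which is the gap an interaction-aware ensemble is meant to address.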