Nvidia to invest $5bn in Intel after Trump administration's 10% stake

The Guardian

Nvidia's CEO, Jensen Huang, talks during the keynote address of Nvidia GTC on 18 March 2025 in San Jose, California.

Nvidia, the world's leading chipmaker, has announced plans to invest $5bn in Intel and collaborate with the struggling semiconductor company on products. A month after the Trump administration confirmed it had taken a 10% stake in Intel - the latest extraordinary intervention by the White House in corporate America - Nvidia said it would team up with the firm to work on custom datacenters that form the backbone of artificial intelligence (AI) infrastructure, as well as personal computer products. Intel shares jumped nearly 23% after markets closed, making it the largest one-day percentage gain for the company since 1987.



Prosody as a Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications

Knierim, Matilda, Jain, Sahil, Aydoğan, Murat Han, Mitra, Kenneth, Desai, Kush, Saran, Akanksha, Baraka, Kim

arXiv.org Artificial Intelligence

Agent learning from human interaction often relies on explicit signals, but implicit social cues, such as prosody in speech, could provide valuable information for more effective learning. This paper advocates for the integration of prosody as a teaching signal to enhance agent learning from human teachers. Through two exploratory studies--one examining voice feedback in an interactive reinforcement learning setup and the other analyzing restricted audio from human demonstrations in three Atari games--we demonstrate that prosody carries significant information about task dynamics. Our findings suggest that prosodic features, when coupled with explicit feedback, can enhance reinforcement learning outcomes. Moreover, we propose guidelines for prosody-sensitive algorithm design and discuss insights into teaching behavior. Our work underscores the potential of leveraging prosody as an implicit signal for more efficient agent learning, thus advancing human-agent interaction paradigms.
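For intuition, here is a minimal, hypothetical sketch of how a prosody-derived intensity signal could be blended with explicit +1/-1 feedback to shape a reward in an interactive RL loop. The feature mapping and the `beta` weighting are invented for illustration and are not the paper's algorithm.

```python
# Hypothetical sketch: blending explicit human feedback with a prosody-derived
# scalar to shape an RL reward. Feature mappings are illustrative assumptions.
import numpy as np

def prosody_intensity(pitch_hz: np.ndarray, energy: np.ndarray) -> float:
    """Map raw prosodic features of one utterance to a [0, 1] intensity proxy.

    Assumption (invented): higher pitch variance and energy signal emphatic
    feedback, e.g. an enthusiastic "yes!" or a sharp "no!".
    """
    pitch_var = np.var(pitch_hz)
    mean_energy = np.mean(energy)
    raw = 0.5 * np.tanh(pitch_var / 1000.0) + 0.5 * np.tanh(mean_energy)
    return float(np.clip(raw, 0.0, 1.0))

def shaped_reward(explicit_feedback: int, pitch_hz, energy, beta: float = 0.3) -> float:
    """Blend an explicit +1/-1 signal with prosodic intensity.

    Prosody scales the magnitude of the explicit signal rather than replacing
    it, so a flat "good" counts for less than an enthusiastic one.
    """
    intensity = prosody_intensity(np.asarray(pitch_hz), np.asarray(energy))
    return explicit_feedback * (1.0 + beta * intensity)

# Example: an emphatic "no!" (high pitch variance, loud) with explicit -1.
print(shaped_reward(-1, pitch_hz=[220, 310, 180, 290], energy=[0.8, 0.9, 0.85]))
```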


Detecting AI-Generated Texts in Cross-Domains

Zhou, You, Wang, Jie

arXiv.org Artificial Intelligence

Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a modified version of RoBERTa, as a baseline model using a dataset we constructed that includes a wider variety of texts written by humans and generated by various LLMs. We then present a method to fine-tune RoBERTa-Ranker that requires only a small amount of labeled data in a new domain. Experiments show that this fine-tuned domain-aware model outperforms the popular DetectGPT and GPTZero on both in-domain and cross-domain texts, where AI-generated texts may either be in a different domain or generated by a different LLM not used to generate the training datasets. This approach makes it feasible and economical to build a single system to detect AI-generated texts across various domains.
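As a rough illustration of the low-cost domain adaptation step, the sketch below fine-tunes a stock `roberta-base` classifier on a tiny labeled in-domain set with Hugging Face `transformers`. It uses a plain binary classification head as a stand-in and does not reproduce RoBERTa-Ranker's ranking objective.

```python
# Minimal fine-tuning sketch for adapting a detector to a new domain with a
# small labeled set; hyperparameters and data are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny illustrative in-domain set: label 1 = AI-generated, 0 = human-written.
data = Dataset.from_dict({
    "text": ["sample human-written paragraph ...", "sample LLM-generated paragraph ..."],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train = data.map(tokenize, batched=True)

# Few epochs and a small learning rate: the point is cheap adaptation of an
# already-trained baseline, not training from scratch.
args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train).train()
```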


Mitigation of gender bias in automatic facial non-verbal behaviors generation

Delbosc, Alice, Ochs, Magalie, Sabouret, Nicolas, Ravenet, Brian, Ayache, Stephane

arXiv.org Artificial Intelligence

Research on non-verbal behavior generation for social interactive agents focuses mainly on the believability and synchronization of non-verbal cues with speech. However, existing models, predominantly based on deep learning architectures, often perpetuate biases inherent in the training data. This raises ethical concerns, depending on the intended application of these agents. This paper addresses these issues by first examining the influence of gender on facial non-verbal behaviors. We concentrate on gaze, head movements, and facial expressions. We introduce a classifier capable of discerning the gender of a speaker from their non-verbal cues. This classifier achieves high accuracy on both real behavior data, extracted using state-of-the-art tools, and synthetic data, generated from a model developed in previous work. Building upon this work, we present a new model, FairGenderGen, which integrates a gender discriminator and a gradient reversal layer into our previous behavior generation model. This new model generates facial non-verbal behaviors from speech features, mitigating gender sensitivity in the generated behaviors. Our experiments demonstrate that the classifier, developed in the initial phase, is no longer effective in distinguishing the gender of the speaker from the generated non-verbal behaviors.
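The gradient reversal layer at the heart of this kind of adversarial debiasing is standard (Ganin and Lempitsky, 2015) and compact enough to sketch. The PyTorch toy below, with invented layer sizes, shows the mechanism; it is not the FairGenderGen architecture.

```python
# Gradient reversal layer (GRL) sketch: the generator trains normally on its
# main loss, while gradients from an auxiliary gender discriminator are flipped,
# pushing shared features to become uninformative about gender.
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, negated (scaled) gradient on the way back.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd: float = 1.0):
    return GradReverse.apply(x, lambd)

# Toy setup: shared features feed both the behavior head and the discriminator.
features = nn.Linear(64, 32)
behavior_head = nn.Linear(32, 10)   # e.g., facial behavior parameters
gender_head = nn.Linear(32, 2)      # adversarial gender classifier

x = torch.randn(4, 64)
h = torch.relu(features(x))
behavior = behavior_head(h)                   # trained to match behavior targets
gender_logits = gender_head(grad_reverse(h))  # discriminator gradient reversed into h
```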


Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement

Funk, Marius, Okada, Shogo, André, Elisabeth

arXiv.org Artificial Intelligence

Non-verbal behavior is a central challenge in understanding the dynamics of a conversation and the affective states between interlocutors arising from the interaction. Although psychological research has demonstrated that non-verbal behaviors vary across cultures, limited computational analysis has been conducted to clarify these differences and assess their impact on engagement recognition. To gain a greater understanding of engagement and non-verbal behaviors among a wide range of cultures and language spheres, in this study we conduct a multilingual computational analysis of non-verbal features and investigate their role in engagement and engagement prediction. To achieve this goal, we first expanded the NoXi dataset, which contains interaction data from participants living in France, Germany, and the United Kingdom, by collecting session data of dyadic conversations in Japanese and Chinese, resulting in the enhanced dataset NoXi+J. Next, we extracted multimodal non-verbal features, including speech acoustics, facial expressions, backchanneling and gestures, via various pattern recognition techniques and algorithms. Then, we conducted a statistical analysis of listening behaviors and backchannel patterns to identify culturally dependent and independent features in each language and common features among multiple languages. These features were also correlated with the engagement shown by the interlocutors. Finally, we analyzed the influence of cultural differences in the input features of LSTM models trained to predict engagement for five language datasets. A SHAP analysis combined with transfer learning confirmed a considerable correlation between the importance of input features for a language set and the significant cultural characteristics analyzed.
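As a shape-level sketch of the prediction setup, the following PyTorch snippet defines an LSTM that maps a window of per-frame non-verbal feature vectors to a scalar engagement estimate. The feature dimension, pooling choice, and hyperparameters are assumptions, and the paper's SHAP and transfer-learning analyses are not reproduced here.

```python
# Sketch of an engagement regressor over per-frame non-verbal feature vectors
# (e.g., speech acoustics, facial action units, gesture flags), assuming one
# scalar engagement label per window.
import torch
from torch import nn

class EngagementLSTM(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, frames, n_features)
        _, (h_n, _) = self.lstm(x)              # last hidden state summarizes the window
        return self.head(h_n[-1]).squeeze(-1)   # one engagement score per window

model = EngagementLSTM()
window = torch.randn(8, 100, 40)  # 8 windows of 100 frames, 40 features each
print(model(window).shape)        # torch.Size([8])
```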


"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration

Bohus, Dan, Andrist, Sean, Bao, Yuwei, Horvitz, Eric, Paradiso, Ann

arXiv.org Artificial Intelligence

We report initial work towards constructing ecologically valid benchmarks to assess the capabilities of large multimodal models for engaging in situated collaboration. In contrast to existing benchmarks, in which question-answer pairs are generated post hoc over preexisting or synthetic datasets via templates, human annotators, or large language models (LLMs), we propose and investigate an interactive system-driven approach, where the questions are generated by users in context, during their interactions with an end-to-end situated AI system. We illustrate how the questions that arise are different in form and content from questions typically found in existing embodied question answering (EQA) benchmarks and discuss new real-world challenge problems brought to the fore.

To track the performance of emerging models and understand their capabilities, the research community has developed a variety of benchmarks for video- and embodied-question answering [7, 9, 11, 12, 14, 20, 22]. These benchmarks are typically constructed by identifying a preexisting multimodal dataset (or creating a synthetic one via a virtual environment), and then generating question-answer pairs from templates, human annotators, or via LLMs. The questions are designed to probe model capabilities along various dimensions, such as spatial understanding, episodic memory, and the recognition of objects and their attributes. While these benchmarks provide useful probes for model competency, we argue that they do not accurately capture the types of questions users ask when engaged in a real-time task.


Better Alignment with Instruction Back-and-Forth Translation

Nguyen, Thao, Li, Jeffrey, Oh, Sewoong, Schmidt, Ludwig, Weston, Jason, Zettlemoyer, Luke, Li, Xian

arXiv.org Artificial Intelligence

We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall, we find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses, which is necessary for effective alignment.
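A schematic of the two-step pipeline as described in the abstract, with a hypothetical `llm` callable and invented prompts standing in for whatever model and prompt templates the authors actually used:

```python
# Schematic sketch: (1) backtranslate a web document into a plausible
# instruction, (2) rewrite a response to that instruction, grounded in the
# original document. All names and prompts here are placeholders.
from typing import Callable

def back_and_forth(document: str, llm: Callable[[str], str]) -> tuple[str, str]:
    # Step 1: instruction backtranslation (Li et al., 2023a): infer an
    # instruction for which this document would be a good response.
    instruction = llm(
        "Write the user instruction that the following text best answers:\n"
        + document
    )
    # Step 2: response rewriting: produce a cleaner answer to that
    # instruction, keeping it grounded in the source document.
    response = llm(
        f"Instruction: {instruction}\n"
        f"Source document: {document}\n"
        "Rewrite the document into a high-quality direct response."
    )
    return instruction, response  # one (instruction, response) training pair
```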


From Stem to Stern: Contestability Along AI Value Chains

Balayn, Agathe, Pi, Yulu, Widder, David Gray, Alfrink, Kars, Yurrita, Mireia, Upadhyay, Sohini, Karusala, Naveena, Lyons, Henrietta, Turkay, Cagatay, Tessono, Christelle, Attard-Frost, Blair, Gadiraju, Ujwal

arXiv.org Artificial Intelligence

This workshop will grow and consolidate a community of interdisciplinary CSCW researchers focusing on the topic of contestable AI. As an outcome of the workshop, we will synthesize the most pressing opportunities and challenges for contestability along AI value chains in the form of a research roadmap. This roadmap will help shape and inspire imminent work in this field. Considering the length and depth of AI value chains, it will especially spur discussions around the contestability of AI systems along various sites of such chains. The workshop will serve as a platform for dialogue and demonstrations of concrete, successful, and unsuccessful examples of AI systems that (could or should) have been contested, to identify requirements, obstacles, and opportunities for designing and deploying contestable AI in various contexts. This will be held primarily as an in-person workshop, with some hybrid accommodation. The day will consist of individual presentations and group activities to stimulate ideation and inspire broad reflections on the field of contestable AI. Our aim is to facilitate interdisciplinary dialogue by bringing together researchers, practitioners, and stakeholders to foster the design and deployment of contestable AI.