 acquaintance


LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

Yuan, Jiahao, Sun, Xingzhe, Yu, Xing, Wang, Jingwen, Du, Dehui, Cui, Zhiqing, Di, Zixiang

arXiv.org Artificial Intelligence

The LLMSR@XLLM25 shared task formulates a low-resource structured reasoning challenge that requires LLMs to generate interpretable, step-by-step rationales with minimal labeled data. We present Less is More, the third-place approach in LLMSR@XLLM25, which performs structured reasoning from only 24 labeled examples. Our approach leverages a multi-agent framework with reverse-prompt induction, retrieval-augmented reasoning synthesis via GPT-4o, and dual-stage reward-guided filtering to distill high-quality supervision across three subtasks: question parsing, CoT parsing, and step-level verification. All modules are fine-tuned from Meta-Llama-3-8B-Instruct under a unified LoRA+ setup. By combining structure validation with reward filtering across few-shot and zero-shot prompts, our pipeline consistently improves structured reasoning quality. These results underscore the value of controllable data distillation for enhancing structured inference under low-resource constraints. Our code is available at https://github.com/JhCircle/Less-is-More.
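The reward-guided filtering step described above can be illustrated with a minimal sketch. The scoring function, threshold, and top-k cut below are hypothetical stand-ins for the paper's reward models, not its actual implementation:

```python
def reward_filter(candidates, score_fn, threshold, top_k):
    """Two-stage filter: drop candidates below an absolute reward
    threshold, then keep only the top_k highest-scoring survivors."""
    scored = [(score_fn(c), c) for c in candidates]
    survivors = [(s, c) for s, c in scored if s >= threshold]
    survivors.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in survivors[:top_k]]

# Toy usage: score candidate rationales by length as a placeholder reward.
rationales = ["a", "abcd", "abcdef", "ab"]
kept = reward_filter(rationales, score_fn=len, threshold=2, top_k=2)
# kept == ["abcdef", "abcd"]
```

In the paper's setting, `score_fn` would be a learned reward model over generated rationales rather than a simple heuristic.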


Using Deep Q-Learning to Dynamically Toggle between Push/Pull Actions in Computational Trust Mechanisms

Lygizou, Zoi, Kalles, Dimitris

arXiv.org Artificial Intelligence

Recent work on decentralized computational trust models for open Multi-Agent Systems has resulted in the development of CA, a biologically inspired model that focuses on the trustee's perspective. This new model addresses a serious unresolved problem in existing trust and reputation models, namely the inability to handle constantly changing behaviors and agents' continuous entry into and exit from the system. In previous work, we compared CA to FIRE, a well-known trust and reputation model, and found that CA is superior when the trustor population changes, whereas FIRE is more resilient to changes in the trustee population. In this paper, we therefore investigate how trustors can detect the presence of several dynamic factors in their environment and then decide which trust model to employ in order to maximize utility. We frame this as a machine learning problem in a partially observable environment, where the presence of these dynamic factors is not known to the trustor. We describe how an adaptable trustor can rely on a few measurable features to assess the current state of the environment and then use Deep Q-Learning (DQN), in a single-agent reinforcement learning setting, to learn how to adapt to a changing environment. We ran a series of simulation experiments comparing the adaptable trustor with trustors using only one model (FIRE or CA), and we show that an adaptable agent is indeed capable of learning when to use each model and thus performs consistently in dynamic environments.
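The core idea of learning when to use each trust model can be sketched with a much-simplified example. The paper uses a DQN over measurable environment features; the sketch below collapses this to single-state tabular Q-learning in a toy stationary environment whose reward values are invented for illustration only:

```python
import random

def learn_toggle(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Toy single-state Q-learning: the agent picks which trust model
    to rely on ("FIRE" or "CA") and updates its value estimate from the
    utility received. Mean utilities are hypothetical; in this toy
    environment CA happens to pay more."""
    rng = random.Random(seed)
    q = {"FIRE": 0.0, "CA": 0.0}
    mean_utility = {"FIRE": 0.4, "CA": 0.8}   # invented for the sketch
    for _ in range(episodes):
        # Epsilon-greedy action selection over the two trust models.
        if rng.random() < epsilon:
            action = rng.choice(list(q))
        else:
            action = max(q, key=q.get)
        reward = mean_utility[action] + rng.gauss(0, 0.05)  # noisy payoff
        q[action] += alpha * (reward - q[action])           # bandit-style update
    return max(q, key=q.get)

# With CA paying more on average, the learned policy settles on "CA".
```

The paper's actual agent conditions on observed environment features (a partially observable state), which is what motivates a neural Q-function rather than this single-state table.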


How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO

Ng, Man Tik, Tse, Hui Tung, Huang, Jen-tse, Li, Jingjing, Wang, Wenxuan, Lyu, Michael R.

arXiv.org Artificial Intelligence

The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. This oversight limits advances in digital human clones and non-player characters in video games. To bridge this gap, we introduce ECHO, an evaluative framework inspired by the Turing test. The framework engages acquaintances of the target individuals to distinguish between human and machine-generated responses. Notably, our framework focuses on emulating average individuals rather than historical or fictional figures, which makes the Turing test directly applicable. We evaluated three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models, alongside OpenAI's online application GPTs. Our results show that GPT-4 deceives human evaluators more effectively, and GPTs achieves a leading success rate of 48.3%. Furthermore, we investigated whether LLMs could discern between human-generated and machine-generated texts. While GPT-4 can identify differences, it could not determine which texts were human-produced. Our code and the results of reproducing the role-playing LLMs are publicly available at https://github.com/CUHK-ARISE/ECHO.
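The success metric described above can be made concrete with a small sketch: a chatbot's score is the fraction of its machine-generated responses that acquaintance evaluators labeled as human. The judgment data below is hypothetical:

```python
def deception_rate(judgments):
    """Fraction of machine-generated responses that evaluators
    labeled "human" -- an ECHO-style success rate."""
    return sum(j == "human" for j in judgments) / len(judgments)

# Hypothetical evaluator judgments for one chatbot's responses:
labels = ["human", "machine", "human", "machine", "machine", "human"]
print(deception_rate(labels))  # 0.5
```

Under this metric, the reported 48.3% means evaluators judged nearly half of the model's responses to be human-written, close to the 50% chance level of an ideal deceiver.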


I felt numb – not sure what to do. How did deepfake images of me end up on a porn site?

The Guardian

There was an insistent knock at the door. This in itself was startling – it was the winter of 2020 and we hadn't yet returned to socialising indoors after lockdown. When I answered, I was surprised to see a male acquaintance of mine. He said he needed to speak to me. I knew it was something unprecedented because he asked to come in. He told me to sit down. That's when the adrenaline started coursing through me – people only suggest that when they're about to deliver bad news.


Identifying the style by a qualified reader on a short fragment of generated poetry

Orekhov, Boris

arXiv.org Artificial Intelligence

Style is an important concept in today's challenges in natural language generation. After successes in image style transfer, the task of text style transfer has become relevant and attractive, and researchers are also interested in reproducing style when generating poetic text. Evaluating style reproduction in poetry generation, however, remains an open problem. I used three character-based LSTM models to study the assessment of style reproduction. All three models were trained on a corpus of texts by famous Russian-language poets. Assessors were shown samples and offered four answer options indicating which poet's style each sample reproduces. In addition, the assessors were asked how well they knew the work of the poet they had named. The assessors were students studying the history of literature, and 94 answers were received. It turned out that the accuracy of style identification increases if the assessor can quote the poet by heart. Each model achieved at least 0.7 macro-averaged accuracy. The experiment showed that it is better to involve a professional rather than a naive reader in evaluating style in poetry generation tasks, and that LSTM models are good at reproducing the style of Russian poets even on a limited training corpus.
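Macro-averaged accuracy, the metric reported above, is commonly computed as per-class accuracy (recall) averaged with equal weight across classes, so that poets with few samples count as much as poets with many. A minimal sketch with invented toy labels:

```python
def macro_accuracy(true_labels, predicted):
    """Per-class accuracy (recall), averaged with equal weight over
    classes -- one common definition of macro-averaged accuracy."""
    classes = set(true_labels)
    per_class = []
    for c in classes:
        idx = [i for i, t in enumerate(true_labels) if t == c]
        correct = sum(predicted[i] == c for i in idx)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)

# Toy example with two poets (labels are hypothetical):
t = ["Pushkin", "Pushkin", "Blok", "Blok"]
p = ["Pushkin", "Blok",    "Blok", "Blok"]
print(macro_accuracy(t, p))  # 0.75
```

Here Pushkin is recognized in 1 of 2 cases (0.5) and Blok in 2 of 2 (1.0), giving a macro average of 0.75.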


The Rise Of Voice Cloning And DeepFakes In The Disinformation Wars

#artificialintelligence

In 2020, it was estimated that disinformation in the form of fake news costs around $78 billion annually. But deepfakes, mainly on social media, have matured and, fueled by increasingly sophisticated artificial intelligence, are moving into the business sector. In 2019, Deeptrace, a cybersecurity company, reported that the number of online deepfake videos doubled, reaching close to 15,000 in under a year. Startups like Truepic, which has raised $26 million from M12, Microsoft's venture arm, have taken a different approach to deepfakes: rather than identifying what is fake, they track the authenticity of content from the point at which it is captured.


Proliferation Of Machine Learning Video Chat In Relationships

#artificialintelligence

Machine learning is becoming more important in our daily lives. But most of us probably never envisioned a day when it would be important in online dating or the beginning of new relationships. A growing number of video chat services are utilizing machine learning features in interesting ways. MarTech Series published an article last year on the growing relevance of machine learning in video conferencing. The same principles can be just as applicable to video chats with online dating services.


There is no such thing as 'he's just not my type', scientists say

Daily Mail - Science & tech

Scientists say online daters and singletons 'might as well let a stranger pick their dates' because they don't really know what they want in a romantic partner. US researchers say they've found little evidence that people actually desire romantic partners who uniquely fit their ideal description or type. Singletons often become so romantically interested in prospective matches that they convince themselves that their date does possess the traits they deem most desirable. A person's ideal partner does not reflect 'any unique personal insight' of tastes, researchers say – and when we say what we like in a partner we're actually just describing qualities that everyone likes. The research could help shift online dating away from a model that focuses on stringently matching profiles and attributes.


Selfless parrots get by with some help from their friends

The Japan Times

WASHINGTON – Acting selflessly to help others in need was long thought to be a trait confined to mammals, in particular humans and great apes. But a new study has found that African gray parrots volunteer assistance to both their good friends and mere acquaintances – even when there is no expectation of personal gain. The paper, published Thursday in the journal Current Biology, advances our knowledge of the evolution of cooperation and social intelligence, co-author Auguste von Bayern of the Max Planck Institute for Ornithology in Starnberg, Germany, said. Both parrots and birds such as crows and ravens are renowned for their extraordinary problem-solving skills, and are sometimes called "feathered apes." Alex, the famous Harvard-based African gray parrot that died in 2007, developed a vocabulary of over 100 words, could identify colors and quantify objects up to the number six, among many other accomplishments.


Forget AI ethics--treat technology like a new relationship instead

#artificialintelligence

Not a week passes without an ethical misstep by Big Tech. From Facebook's personal data overreaches to thousands of e-commerce sites that trick people into superfluous purchases to cities implementing facial-recognition systems without consent, the tech industry continues to stress-test trust. In response, ethical guidelines have flourished. Whether a short checklist, visual principles, or lengthy treatise, most agree on core principles of privacy, safety and security, transparency, fairness, and autonomy. But despite the efforts of think tanks, tech companies, and government agencies, the principles haven't been so easy to put into practice.