 colloquialism



E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models

Zhang, Zhenyu, Hao, Bingguang, Li, Jinpeng, Zhang, Zekai, Zhao, Dongyan

arXiv.org Artificial Intelligence

Most large language models (LLMs) are sensitive to prompts: a synonymous rephrasing or a typo may lead to unexpected results. Composing an optimal prompt for a specific demand lacks theoretical support and relies entirely on human experimentation, which poses a considerable obstacle to popularizing generative artificial intelligence. However, there is no systematic analysis of how stable LLMs are under prompt perturbations in real-world scenarios. In this work, we propose to evaluate the ease-of-use of LLMs and construct E-Bench, which simulates actual human use through synonymous perturbation (including paraphrasing, simplification, and colloquialism) and typographical perturbation (such as typing errors). On this basis, we also discuss the combination of these two types of perturbation and analyze the main reasons for performance degradation. Experimental results indicate that although ease-of-use improves significantly as model size increases, there is still a long way to go to build a sufficiently user-friendly model.
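As a rough illustration of what a typographical perturbation and a stability check might look like, here is a minimal Python sketch. It is not E-Bench's actual procedure: the function names `add_typo` and `stability`, the adjacent-letter swap, and the exact-match comparison are all assumptions made for the example, and `model` stands for any hypothetical callable that maps a prompt string to an answer string.

```python
import random

def add_typo(prompt: str, rng: random.Random) -> str:
    """Swap one pair of adjacent, distinct letters to mimic a typing slip.

    Illustrative perturbation only; not the benchmark's own method.
    """
    # Candidate positions: both characters are letters and they differ,
    # so the swap always changes the string.
    positions = [
        i for i in range(len(prompt) - 1)
        if prompt[i].isalpha() and prompt[i + 1].isalpha()
        and prompt[i] != prompt[i + 1]
    ]
    if not positions:
        return prompt  # nothing to perturb
    i = rng.choice(positions)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def stability(model, prompt: str, n_variants: int = 5, seed: int = 0) -> float:
    """Fraction of perturbed prompts whose answer equals the clean answer.

    `model` is a hypothetical callable: prompt string in, answer string out.
    """
    rng = random.Random(seed)
    reference = model(prompt)
    variants = [add_typo(prompt, rng) for _ in range(n_variants)]
    same = sum(model(v) == reference for v in variants)
    return same / n_variants
```

A score near 1.0 would suggest the model's answer is unaffected by these small typos, while a score near 0.0 would suggest it changes almost every time. Exact string matching is a crude comparison; a real evaluation would likely use a task-specific metric.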


Can Technology Replace Human Interpreters?

#artificialintelligence

Over the past few years, the demand for real-time interpretation services has increased considerably. The globalisation of business is a major contributing factor, as it has increased the opportunities for international trade and opened new markets for businesses all around the world. To stay competitive and keep up with this demand, developers have been working on technological solutions that aim to deliver high-quality simultaneous interpretation. But can tech really replace humans when it comes to interpreting? Real-time translation systems include applications that can be installed on smartphones, computers, or other gadgets linked to the Internet. The words of the speaker are transcribed by a computer server, which analyses the content and selects the closest translation from a vast collection of phrase pairs in its database.
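The lookup step described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: the three-entry `PHRASE_PAIRS` table, the function name `closest_translation`, and the use of character-level similarity from the standard library's `difflib` are all invented for illustration.

```python
import difflib

# Hypothetical phrase-pair table (English -> Spanish); a real system
# would hold a vastly larger collection.
PHRASE_PAIRS = {
    "good morning": "buenos días",
    "thank you very much": "muchas gracias",
    "where is the station": "dónde está la estación",
}

def closest_translation(transcript: str, cutoff: float = 0.6):
    """Return the translation of the stored phrase most similar to the
    transcript, or None if nothing is similar enough."""
    key = transcript.lower().strip()
    matches = difflib.get_close_matches(
        key, list(PHRASE_PAIRS), n=1, cutoff=cutoff
    )
    return PHRASE_PAIRS[matches[0]] if matches else None
```

For example, a slightly mis-transcribed input such as `"Good mornin"` still resolves to the stored pair for "good morning", while an input with no similar entry returns `None`. The `cutoff` threshold trades off tolerance for transcription errors against the risk of returning an unrelated phrase.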