japanese language
Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis
In the first half of 2023, text-generating artificial intelligence (AI), including OpenAI's ChatGPT equipped with GPT-3.5 and GPT-4, attracted considerable attention worldwide. In this study, we first compared the Japanese stylometric features of texts generated by GPT (-3.5 and -4) with those of texts written by humans. We performed multi-dimensional scaling (MDS) to examine the distributions of 216 texts in three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers), focusing on the following stylometric features: (1) bigrams of parts-of-speech, (2) bigrams of postpositional particle words, (3) positioning of commas, and (4) rate of function words. MDS revealed distinct distributions for GPT (-3.5 and -4) and human texts on each stylometric feature. Although GPT-4 is more powerful than GPT-3.5 because it has more parameters, the GPT-3.5 and GPT-4 distributions are likely to overlap. These results indicate that even if the number of parameters increases in the future, GPT-generated texts may not come close to those written by humans in terms of stylometric features. Second, we verified the classification performance of random forest (RF) for two classes (GPT and human), focusing on Japanese stylometric features. RF performed well on each stylometric feature: the RF classifier focusing on the rate of function words achieved 98.1% accuracy. Furthermore, the RF classifier focusing on all stylometric features reached 100% on all performance indexes (accuracy, recall, precision, and F1 score). This study concluded that, at this stage, we can discriminate ChatGPT-generated text from human-written text, limited to the Japanese language.
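To make the "positioning of commas" feature concrete, here is a minimal sketch in Python. The abstract does not define the feature precisely, so this assumes one plausible reading: the relative frequency of each character that directly precedes a Japanese comma (、). The function name `comma_position_features` and the sample sentence are hypothetical and are not taken from the paper.

```python
from collections import Counter

COMMA = "、"  # Japanese comma (touten)


def comma_position_features(text):
    """Relative frequency of each character that directly precedes a comma.

    Assumption: "positioning of commas" is approximated here by the single
    character before each comma; the paper may define it differently.
    """
    # Collect the character immediately before every comma in the text.
    preceding = [text[i - 1] for i, ch in enumerate(text) if ch == COMMA and i > 0]
    total = len(preceding)
    if total == 0:
        return {}  # no commas, so no feature values
    counts = Counter(preceding)
    # Key each feature as "<character>、" and normalise by the comma count.
    return {ch + COMMA: n / total for ch, n in counts.items()}


# Hypothetical sample sentence with three commas.
sample = "私は、本を読んだが、内容は、難しかった。"
features = comma_position_features(sample)
print(features)
```

In the sample, two of the three commas follow は and one follows が, so the two resulting feature values sum to 1. Vectors of this kind, computed per text, are the sort of input that could be passed to MDS or a random forest classifier.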
ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation
Maurya, Kaushal Kumar, Desarkar, Maunendra Sankar, Kano, Yoshinobu, Deepshikha, Kumari
Despite recent advances in NLP research, cross-lingual transfer for natural language generation is relatively understudied. In this work, we transfer supervision from a high-resource language (HRL) to multiple low-resource languages (LRLs) for natural language generation (NLG). We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages, i.e., English, Hindi, and Japanese. We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data. In this framework, we further pre-train the mBART sequence-to-sequence denoising auto-encoder model with an auxiliary task using monolingual data from the three languages. The objective function of the auxiliary task is close to those of the target tasks, which enriches the multi-lingual latent representation of mBART and provides a good initialization for the target tasks. Then, this model is fine-tuned with task-specific supervised English data and directly evaluated on the low-resource languages in the zero-shot setting. To overcome catastrophic forgetting and spurious correlation issues, we applied model-component freezing and data augmentation, respectively. This simple modeling approach gave us promising results. We also experimented with few-shot training (with 1000 supervised data points), which boosted the model performance further. We performed several ablations and cross-lingual transferability analyses to demonstrate the robustness of ZmBART.
South Sudan's Olympians in love with Japanese language -- as well as real track in Gunma
They are trying to get a head start, and unlike most of the 11,000 athletes who will be in Tokyo for the games, and thousands more for the Paralympics, they will be able to speak Japanese. "Just the language itself, I love it," said Abraham Majok, a runner who arrived in Japan in November with three other South Sudanese athletes and a coach. "And it's nice and since we started learning it. But, you know, we are moving well with it and we just love it." They are training northwest of Tokyo in Maebashi, Gunma Prefecture, supported mainly by donations from the public.
Japan's health care sector still a magnet for Filipinos
MANILA - Job opportunities in Japan's health industry continue to attract Filipinos a decade since it started accepting candidate nurses and caregivers under a bilateral economic agreement. Earlier this month, a new group of Filipino health workers who aspire to work as nurses and caregivers here began preparatory training in the Japanese language and culture in two centers in Manila. The 341 applicants comprise the 12th batch of candidate nurses and caregivers under the Japan-Philippines Economic Partnership Agreement forged in 2008. Japan accepted the first batch of Filipino health workers in 2009. And I think I will broaden my experience and learn more there.
India opens first training center for Japanese-language teachers
NEW DELHI - India's first training center for teachers of Japanese was officially opened Monday in the capital, New Delhi. The inauguration ceremony for the center, which is a joint project involving the Indian Ministry of External Affairs and the Japanese Embassy with the support of the Japan Foundation, was attended by Japanese Ambassador to India Kenji Hiramatsu. The ambassador noted in a speech that demand for learning Japanese is growing significantly as the Japan-India relationship flourishes, leading to an increasing number of employment opportunities in Japanese companies in the country. "The number of Japanese companies is increasing every year and is now about four times the number seen 10 years ago. These companies require Indians who can speak the Japanese language, in order to act as bridges between their Indian subsidiaries and headquarters in Japan," he said.
Japanese language launched on world-leading Speechmatics ASR platform
Speechmatics today released its Japanese language pack to our cloud-based and on-premise Speech Recognition platform. Proving again the power and flexibility of the Auto-Auto language creation framework, releasing Japanese underlines our commitment to deliver broad language coverage to our rapidly expanding new and existing customer base. Misquoting Orwell: "All languages are equal, but some languages are more equal than others." Benoît Brard, from our Languages team here in Cambridge, said: "For Auto-Auto, some languages are indeed more challenging than others. And Japanese, at first sight, looked scary: our first language without spaces, with a logographic writing system (where the characters represent concept rather than sounds), and no native speakers to hand. It was… all Greek to us. Nonetheless, in a matter of days and a few nudges in places, Auto-Auto has proved to be generic and powerful enough to deliver our first Japanese model without further human intervention."
IBM Watson is now fluent in nine languages (and counting)
Memorably spoken by Alexander Graham Bell, these were the first words ever heard through a telephone. Since then, speech has become the natural format for long-distance communication across the globe. The impact of voice-to-voice communication has meant that even written messages, sent via email and social media, have become increasingly conversational in tone. That Watson was not IBM Watson, of course, or Watson's namesake Thomas J Watson. But IBM Watson, by bringing a cognitive, learning approach to the absorption of data, has made it possible for computer systems to understand spoken language, and the more natural, colloquial way we now express ourselves in text.