NLP technology
Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing
This paper presents a novel approach to fine-tuning the Qwen2-1.5B model for Arabic language processing using Quantized Low-Rank Adaptation (QLoRA) on a system with only 4 GB of VRAM. We detail the process of adapting this large language model to the Arabic domain using diverse datasets, including the Bactrian, OpenAssistant, and Arabic Wikipedia corpora. Our methodology involves custom data preprocessing, model configuration, and training optimizations such as gradient accumulation and mixed-precision training. We address specific challenges in Arabic NLP, including morphological complexity, dialectal variation, and the handling of diacritical marks. Experimental results over 10,000 training steps show significant performance improvements, with the final loss converging to 0.1083. We provide a comprehensive analysis of GPU memory usage, training dynamics, and model evaluation across various Arabic language tasks, including text classification, question answering, and dialect identification. The fine-tuned model demonstrates robustness to input perturbations and improved handling of Arabic-specific linguistic phenomena. This research contributes to multilingual AI by demonstrating a resource-efficient approach to creating specialized language models, potentially democratizing access to advanced NLP technologies for diverse linguistic communities. Our work paves the way for future research in low-resource language adaptation and efficient fine-tuning of large language models.
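The abstract's key constraint is fitting QLoRA fine-tuning into 4 GB of VRAM. QLoRA keeps the frozen base weights in 4-bit form and trains only small low-rank adapters. The back-of-envelope Python sketch below shows why 4-bit storage matters at this budget and how few parameters an adapter adds. It rests on several assumptions, none taken from the paper: a nominal 1.5 billion parameters, decimal gigabytes, a weights-only count, and a hypothetical 1536-wide projection with LoRA rank 8.

```python
# Back-of-envelope memory arithmetic for fine-tuning under a 4 GB VRAM budget.
# Assumptions (not from the paper): a nominal 1.5e9 parameters, decimal GB,
# and weights only. Activations, optimizer state, CUDA overhead, and the small
# per-block quantization constants of 4-bit formats are all ignored.

GB = 1e9  # decimal gigabyte


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed just to store the weights, in decimal GB."""
    return n_params * bits_per_param / 8 / GB


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added by a LoRA update B @ A on a d_out x d_in
    weight, where B is d_out x rank and A is rank x d_in."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n_params = 1.5e9   # nominal size implied by the "1.5B" in the model name
    budget_gb = 4.0    # VRAM budget stated in the abstract

    for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
        mem = weight_memory_gb(n_params, bits)
        print(f"{label:>9}: {mem:.2f} GB of weights, "
              f"{budget_gb - mem:+.2f} GB left in a {budget_gb:.0f} GB budget")

    # Hypothetical square projection of width 1536 with LoRA rank 8.
    d, r = 1536, 8
    added = lora_param_count(d, d, r)
    print(f"LoRA adds {added:,} trainable params vs {d * d:,} frozen "
          f"({100 * added / (d * d):.2f}%)")
```

Under these assumptions, fp32 weights alone (6 GB) would not fit at all, and fp16 weights alone would use 3 GB of the 4 GB budget. Stored in 4 bits, the same weights need about 0.75 GB. The rank-8 adapter adds 24,576 trainable parameters to a 2,359,296-parameter matrix, roughly 1%.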
Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
Karamolegkou, Antonia, Hansen, Sandrine Schiller, Christopoulou, Ariadni, Stamatiou, Filippos, Lauscher, Anne, Søgaard, Anders
What ethical concerns, if any, do LLM researchers have? We introduce EthiCon, a corpus of 1,580 ethical concern statements extracted from scientific papers published in the ACL Anthology. We extract ethical concern keywords from the statements and show promising results in automating the concern identification process. Through a survey, we compare the ethical concerns in the corpus to the concerns listed by the general public and by professionals in the field. Finally, we compare our retrieved ethical concerns with existing taxonomies, pointing to gaps and future research directions.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Mexico (0.04)
- Asia > Singapore (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (0.93)
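The EthiCon abstract does not say how its concern keywords are extracted. As a swapped-in illustration only, plain TF-IDF ranking is one simple way to pull distinctive terms out of short statements. The stopword list and the three statements below are hypothetical and are not drawn from the corpus.

```python
import math
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not EthiCon's).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "may", "could", "be", "is", "are", "this", "that", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(statements: List[str], k: int = 3) -> List[List[str]]:
    """Return the k highest-scoring TF-IDF terms for each statement."""
    docs = [tokenize(s) for s in statements]
    n = len(docs)
    doc_freq = Counter()
    for toks in docs:
        doc_freq.update(set(toks))  # count each term once per statement

    keywords = []
    for toks in docs:
        tf = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n / doc_freq[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:k])
    return keywords


if __name__ == "__main__":
    # Hypothetical ethics statements, invented for illustration only.
    statements = [
        "The dataset may contain personal data, raising privacy concerns.",
        "Model outputs may reflect gender bias present in training data.",
        "Generated text could be misused to spread misinformation.",
    ]
    for s, kws in zip(statements, top_keywords(statements, k=3)):
        print(kws, "<-", s)
```

A term shared across statements (here, "data") gets a lower inverse document frequency than a term unique to one statement. That is why TF-IDF favors distinctive vocabulary. A term that appears in every statement scores zero.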
Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective
Vladika, Juraj, Meisenbacher, Stephen, Preis, Martina, Klymenko, Alexandra, Matthes, Florian
In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes. Amidst the steady flow of research solutions stemming from the NLP domain, the study of use cases has fallen behind, leading to a number of innovative technical methods without a place in practice. In this work, we aim to build a structured overview of Legal Tech use cases, grounded in NLP literature, but also supplemented by voices from legal practice in Germany. Based upon a Systematic Literature Review, we identify seven categories of NLP technologies for the legal domain, which are then studied in juxtaposition to 22 legal use cases. In the investigation of these use cases, we identify 15 ethical, legal, and social aspects (ELSA), shedding light on the potential concerns of digitally transforming the legal domain.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Research Report (1.00)
- Questionnaire & Opinion Survey (0.94)
- Personal > Interview (0.94)
- Overview (0.89)
- Law (1.00)
- Information Technology > Security & Privacy (0.48)
Integrating Generative AI in Hackathons: Opportunities, Challenges, and Educational Implications
Sajja, Ramteja, Ramirez, Carlos Erazo, Li, Zhouyayan, Demiray, Bekir Z., Sermet, Yusuf, Demir, Ibrahim
Hackathons and software competitions, increasingly pivotal in the software industry, serve as vital catalysts for innovation and skill development for both organizations and students. These platforms enable companies to prototype ideas swiftly, while students gain enriched learning experiences that enhance their practical skills. Over the years, hackathons have transitioned from mere competitive events to significant educational tools, fusing theoretical knowledge with real-world problem-solving. The integration of hackathons into computer science and software engineering curricula aims to align educational proficiencies within a collaborative context, promoting peer connectivity and enriched learning via industry-academia collaborations. However, the infusion of advanced technologies, notably artificial intelligence (AI) and machine learning, into hackathons is revolutionizing their structure and outcomes. This evolution brings both opportunities, such as enhanced learning experiences, and challenges, such as ethical concerns. This study examines the impact of generative AI, in particular its influence on students' technological choices, through a case study of the University of Iowa's 2023 hackathon. The exploration provides insights into AI's role in hackathons and its educational implications, and offers a roadmap for integrating such technologies into future events, ensuring that innovation is balanced with ethical and educational considerations.
- North America > United States > Iowa (0.25)
- Europe > Finland > South Karelia > Lappeenranta (0.04)
- Europe > Estonia > Tartu County > Tartu (0.04)
- Asia > Middle East > Jordan (0.04)
- Education > Educational Setting > Higher Education (0.68)
- Education > Curriculum > Subject-Specific Education (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages
Khanuja, Simran, Ruder, Sebastian, Talukdar, Partha
In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages, be equitable, i.e., not unduly biased towards any particular language, and be inclusive of all users, particularly in low-resource settings where compute constraints are common. In this paper, we propose an evaluation paradigm that assesses NLP technologies across all three dimensions. While diversity and inclusion have received attention in recent literature, equity is currently unexplored. We propose to address this gap using the Gini coefficient, a well-established metric used for estimating societal wealth inequality. Using our paradigm, we highlight the distressed state of current technologies for Indian (IN) languages (a linguistically large and diverse set, with a varied speaker population), across all three dimensions. To improve upon these metrics, we demonstrate the importance of region-specific choices in model building and dataset creation, and more importantly, propose a novel, generalisable approach to optimal resource allocation during fine-tuning. Finally, we discuss steps to mitigate these biases and encourage the community to employ multi-faceted evaluation when building linguistically diverse and equitable technologies.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Indonesia > Bali (0.04)
- Government > Regional Government (0.68)
- Health & Medicine (0.67)
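The equity dimension in the Khanuja et al. abstract rests on the Gini coefficient. A minimal Python sketch of its standard mean-absolute-difference form follows. Applying it to per-language scores is one plausible reading. The abstract does not say which quantity the authors feed in (accuracy, speaker-weighted utility, or something else), so the language codes and scores below are hypothetical.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal).

    Mean-absolute-difference form:
        G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all zeros: treat as perfectly equal
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


if __name__ == "__main__":
    # Hypothetical per-language scores, invented for illustration only.
    scores = {"hi": 0.80, "bn": 0.62, "ta": 0.55, "or": 0.20}
    print(f"Gini across {len(scores)} languages: {gini(list(scores.values())):.3f}")
```

A value of 0 means every language scores the same. The value grows as performance concentrates in fewer languages. For n values, the maximum is (n - 1)/n, reached when a single language has all of the score.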
Smart Systems, Inc.
AI adoption is rapidly moving from an experiment to an essential part of business practices and planning. Across every industry, more use cases are being developed for AI to drive business efficiencies, optimize data, improve the customer experience and support business goals and initiatives. Companies looking to mature their AI programs should keep an eye out for these important conversations around the adoption of AI tools and technology. Machine learning is a valuable tool that has benefited businesses for decades but only became widely popular recently. In this new era of digital acceleration, companies are looking for ways to drive efficiency in their operations, and for many, the answer lies in automation.
Council Post: Four AI Trends To Watch
Martin Birch, CEO and president of ibml, has 20 years of experience as a global leader in the intelligent information management industry.
AI's Potential to Tackle Crime in Europe
In the years to come, artificial intelligence will be a key feature of cross-border criminal investigations, a joint report by Eurojust and eu-LISA, the EU's official IT agency, found. AI technologies can increase cooperation between EU Member States in tackling crime; however, authorities must be careful, since machine learning algorithms are prone to biases. AI was listed as a priority in the EU's e-Justice Action Plan for 2019-2023. In a world where crime is borderless and criminals employ sophisticated communication tools and technologies, including encryption and AI, tackling crime requires cross-border cooperation by EU Member States and the application of technologies on par with those used by the criminal groups, urged Friday's report. "The field of justice is undergoing digital transformation, and artificial intelligence, as a set of different technologies, has great potential to contribute to and further enhance this process, allowing for a significant improvement in both the efficiency and effectiveness of operation of the judicial authorities," the report said.
- Law (0.61)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.55)
Understanding NLP: Will It Reign Supreme In The Next Decade? - Fingent Technology
"The human language, as precise as it is with its thousands of words, can still be so wonderfully vague." Human language is unique and complex. Encoding human language is considered a difficult task. For starters, you can arrange words in infinite ways to form a sentence. Also, each word can have several meanings.
A Wave Of Billion-Dollar Language AI Startups Is Coming
In 1998, Larry Page and Sergey Brin founded the greatest language AI startup of all time. But a new generation of challengers is coming. Language is at the heart of human intelligence. It therefore is and must be at the heart of our efforts to build artificial intelligence. No sophisticated AI can exist without mastery of language. The field of language AI (also referred to as natural language processing, or NLP) has undergone breathtaking, unprecedented advances over the past few years. Two related technology breakthroughs have driven this remarkable recent progress: self-supervised learning and a powerful new deep learning architecture known as the transformer. We now stand at an exhilarating inflection point. Next-generation language AI is poised to make the leap from academic research to widespread real-world adoption, generating many billions of dollars of value and transforming entire industries in the years ahead. A nascent ecosystem of startups is at the vanguard of this technology revolution. These companies have begun to apply cutting-edge NLP across sectors with a wide range of product visions and business models. Given language's foundational importance throughout society and the economy, few areas of technology will have a more far-reaching impact in the years ahead. The first category of language AI startups worth discussing consists of players that develop core general-purpose NLP technology and make it available for other organizations to apply across industries and use cases. Building a state-of-the-art NLP model today is incredibly resource-intensive and technically challenging.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Tennessee (0.04)
- Europe > United Kingdom (0.04)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Information Technology > Services (0.68)