AITopics

Li, Miles Q., Fung, Benjamin C. M., Huang, Shih-Chia

Experience of Training a 1.7B-Parameter LLaMa Model From Scratch

arXiv.org Artificial Intelligence, Dec-20-2024

Pretraining large language models is a complex endeavor influenced by multiple factors, including model architecture, data quality, training continuity, and hardware constraints. In this paper, we share insights gained from training DMaS-LLaMa-Lite, a fully open-source, 1.7-billion-parameter, LLaMa-based model, on approximately 20 billion tokens of carefully curated data. We chronicle the full training trajectory, documenting how evolving validation loss and downstream benchmark scores reflect the transition from incoherent text to fluent, contextually grounded output. Beyond pretraining, we extend our analysis to a post-training phase focused on instruction tuning, in which the model was refined to produce more contextually appropriate, user-aligned responses. We highlight practical considerations such as the importance of restoring optimizer states when resuming from checkpoints and the impact of hardware changes on training stability and throughput. While qualitative evaluation provides an intuitive sense of model improvements, our analysis also covers various performance benchmarks, demonstrating how high-quality data and thoughtful scaling enable competitive results with significantly fewer training tokens. By detailing these experiences and releasing training logs, checkpoints, and sample outputs, we aim to guide future researchers and practitioners in refining their pretraining strategies. The training script is available on GitHub at https://github.com/McGill-DMaS/DMaS-LLaMa-Lite-Training-Code. The model checkpoints are available on Hugging Face at https://huggingface.co/collections/McGill-DMaS/dmas-llama-lite-6761d97ba903f82341954ceb.

arxiv preprint arxiv, large language model, machine learning

2412.13335

Country:

North America > Canada > Quebec > Montreal (0.28)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.06)
Europe > United Kingdom > England (0.05)

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

arXiv.org Artificial Intelligence, Oct-22-2024

Exploring Forgetting in Large Language Model Pre-Training

Liao, Chonghua, Xie, Ruobing, Sun, Xingwu, Sun, Haowen, Kang, Zhanhui

Catastrophic forgetting remains a formidable obstacle to building an omniscient large language model (LLM). Despite pioneering research on task-level forgetting in LLM fine-tuning, forgetting during pre-training has received scant attention. We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics that better detect entity memory retention. Based on our revised assessment of forgetting metrics, we explored low-cost, straightforward methods to mitigate forgetting during the pre-training phase. Further, we carefully analyzed the learning curves, offering insights into the dynamics of forgetting. Our extensive evaluations and analyses of forgetting in pre-training could facilitate future research on LLMs.

large language model, machine learning, natural language

2410.17018

Country:

Europe > France (0.04)
North America > United States > Florida > Broward County > Plantation (0.04)
North America > Jamaica (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Feb-29-2024

Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials

Nolano, Gennaro, Blum, Moritz, Ell, Basil, Cimiano, Philipp

In recent years, large language models have achieved state-of-the-art performance across various NLP tasks. However, investigations have shown that these models tend to rely on shortcut features, which leads to inaccurate predictions and makes them unreliable when generalizing to out-of-distribution (OOD) samples. In relation extraction (RE), for instance, we would expect a model to identify the same relation regardless of the entities involved. Consider the sentence "Leonardo da Vinci painted the Mona Lisa", which expresses the relation created(Leonardo da Vinci, Mona Lisa). If we substitute "Leonardo da Vinci" with "Barack Obama", the sentence still expresses the created relation, and a robust model should detect it in both cases. In this work, we describe several semantically motivated strategies for generating adversarial examples by replacing entity mentions, and we investigate how state-of-the-art RE models perform under this pressure. Our analyses show that the performance of these models deteriorates significantly on the modified datasets (an average of -48.5% in F1). This indicates that they rely to a great extent on shortcuts, such as the surface forms (or patterns therein) of entities, without making full use of the information present in the sentences.

proceedings, relation, substitution

2402.19076

Country:

Europe > Germany > Berlin (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

#artificialintelligence, Jan-4-2023, 20:30:29 GMT

Want to Look Into the Future? These Awesome AI Innovations Are Going to Replace You! – eTatos.com

Unlike its direct competitors DALL-E and Midjourney, Stable Diffusion makes its source code available and can be run on local hardware. Furthermore, Stable Diffusion claims no rights to the generated output images: the user owns the unique creations and is supposedly free to use them commercially.

artificial intelligence, chatbot, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.37)

#artificialintelligence, Aug-29-2022, 14:05:08 GMT

How To Create Perfect Images For SEO With DALL-E 2

Adding unique, high-quality images can be a great help for SEO. When you're writing an article, it's often hard to find the right image to illustrate it, especially if you're looking for a royalty-free image. This is where quality images can make all the difference: a captivating image can grab the attention of internet users and improve your article's search rankings. Optimizing your images is good SEO practice. Notably, it helps strengthen your semantic relevance through keywords and ensures your presence in Google Images.

create perfect image, dall-e 2, professional food photography

Industry: Media > Photography (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.96)

#artificialintelligence, Aug-29-2021, 02:10:59 GMT

Artist Uses Artificial Intelligence To Reconstruct Realistic Portraits of Historical Figures

Have you ever wondered what famous historical figures like Nefertiti and Cleopatra looked like in real life? Bas Uterwijk might be able to show you a pretty good guess. The Dutch photographer and digital artist creates striking AI portraits of famous historical figures using neural network reconstructions. To create these portraits, Uterwijk uploads numerous reference images of the person's likeness to AI applications. He then makes small adjustments to the program until he is satisfied with the result.

artist use artificial intelligence, portrait, reconstruct realistic portrait

Country:

Europe (0.06)
Africa > Middle East > Egypt (0.06)

Technology:

Information Technology > Communications > Social Media (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

#artificialintelligence, Jul-13-2021, 01:06:53 GMT

Reasoning with Language Models and Knowledge Graphs for Question Answering

From search engines to personal assistants, we use question-answering systems every day. When we ask a question ("Where was the painter of the Mona Lisa born?"), the system needs to gather background knowledge ("The Mona Lisa was painted by Leonardo da Vinci", "Leonardo da Vinci was born in Italy") and reason over it to produce the answer ("Italy").

Knowledge sources

In recent AI research, such background knowledge is commonly available in the form of knowledge graphs (KGs) and language models (LMs) pre-trained on a large set of documents. In KGs, entities are represented as nodes and the relations between them as edges. Examples of KGs include Freebase (general-purpose facts), ConceptNet (commonsense), and UMLS (biomedical facts).

knowledge, qa context, representation

Country:

Europe > Italy (0.46)
North America > United States > California > Santa Clara County > Palo Alto (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.61)

Frank, Steven J., Frank, Andrea M.

A Neural Network Looks at Leonardo's(?) Salvator Mundi

arXiv.org Artificial Intelligence, May-21-2020

We use convolutional neural networks (CNNs) to analyze authorship questions surrounding the works of Leonardo da Vinci -- in particular, Salvator Mundi, the world's most expensive painting and among the most controversial. Trained on the works of an artist under study and visually comparable works of other artists, our system can identify likely forgeries and shed light on attribution controversies. Leonardo's few extant paintings test the limits of our system and require corroborative techniques of testing and analysis.

artificial intelligence, leonardo, machine learning

2005.106

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Daily Mail - Science & tech, Feb-5-2020, 04:22:04 GMT

AI helps discover 'hidden' drawings by Leonardo da Vinci by mapping faint zinc traces on old canvas

Researchers have enlisted AI to help them uncover hidden drawings on the canvas of one of Leonardo da Vinci's most famous paintings. The project was a collaboration between the National Gallery's Dr. Catherine Higgitt and a team from Imperial College London led by Pier Luigi Dragotti. Higgitt and her team at the Gallery had discovered faint sketch marks on the canvas of da Vinci's 'Virgin of the Rocks,' which he had originally been commissioned to create in 1483 for a chapel in Milan. The sketches appeared to hint at an early version of the image that differed from the finished version, which depicts the Madonna with an infant Jesus and an infant John the Baptist in a cavern. The sketches showed wings, which suggested that da Vinci might originally have planned for an angel to be in the painting, as well as a different position for the Madonna.

da vinci, leonardo da vinci, sketch


Technology: Information Technology > Artificial Intelligence (1.00)