Too much training often results in sore muscles and pain. Apparently, computers also react badly when they train too much. At least that is the finding of a research group in the USA: researchers from Carnegie Mellon University, Stanford University and Harvard University trained the #OpenSource language model #OLMo1B once on 2.3 trillion tokens and once on 3 trillion tokens and were surprised to find that the less-trained model responded and performed better.

The group attributes this drop in performance to so-called ‘progressive sensitivity’: the longer a model is trained, the more sensitively its internal parameters react to changes such as fine-tuning or the addition of further data.

The results can be found here: https://lnkd.in/d9jvbRw5

HPCwire reports on this here: https://lnkd.in/duuJeuTS

#ArtificialIntelligence, #LLM, #LargeLanguageModels, #LanguageModel

Munich Datageeks e.V., Munich Data Science Institute (MDSI), Gauss Centre for Supercomputing