Leibniz Supercomputing Centre's Post

Too much training often results in sore muscles and pain. Apparently, computers also react badly when they have to train too much. At least, that is the finding of a research group in the USA: researchers from Carnegie Mellon University, Stanford University and Harvard University trained the #OpenSource language model #OLMo1B once on 2.3 trillion and once on 3 trillion tokens, and were surprised to find that the less-trained model responded and performed better after fine-tuning. The group attributes this deterioration in performance to so-called ‘progressive sensitivity’: the longer a model is pre-trained, the more sensitively its internal parameters react to later changes, such as fine-tuning or adding more data.

The results can be found here: https://lnkd.in/d9jvbRw5
HPCwire reports on this here: https://lnkd.in/duuJeuTS

#ArtificialIntelligence #LLM #LargeLanguageModels #LanguageModel
Munich Datageeks e.V., Munich Data Science Institute (MDSI), Gauss Centre for Supercomputing
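For anyone curious what ‘progressive sensitivity’ could look like in practice, here is a minimal, hypothetical sketch in PyTorch: it probes how fragile a checkpoint's parameters are by adding small Gaussian noise and measuring how much an evaluation loss degrades. The function name, noise scale, and comparison setup are illustrative assumptions, not the study's actual protocol; under progressive sensitivity, the longer-trained checkpoint would show the larger degradation.

```python
# Hypothetical probe for "progressive sensitivity": perturb a checkpoint's
# parameters with small Gaussian noise and measure how much the evaluation
# loss increases. All names and settings here are illustrative assumptions,
# not the procedure from the paper.
import copy
import torch

def perturbation_sensitivity(model, eval_loss_fn, noise_scale=1e-3, trials=5):
    """Mean increase in evaluation loss after Gaussian parameter noise."""
    base_loss = eval_loss_fn(model)
    increases = []
    for _ in range(trials):
        noisy = copy.deepcopy(model)  # perturb a copy, keep the original intact
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(noise_scale * torch.randn_like(p))
        increases.append(eval_loss_fn(noisy) - base_loss)
    return sum(increases) / len(increases)

# Assumed usage: compare checkpoints from different points in pre-training.
# A larger value for the longer-trained checkpoint would indicate parameters
# that react more strongly to modifications such as fine-tuning.
# sens_early = perturbation_sensitivity(ckpt_2_3T_tokens, eval_loss_fn)
# sens_late  = perturbation_sensitivity(ckpt_3T_tokens, eval_loss_fn)
```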

[Image: AI processing unit. Photo: Adobe]
