PSA: Yandex is a multi-billion Moscow based company, finances the Russian war of aggression in Ukraine, and is one of the main Kremlin's tool in spreading propaganda and suppressing dissent.
YaLM 100B is a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
The model leverages 100 billion parameters. It took 65 days to train the model on a cluster of 800 A100 graphics cards and 1.7 TB of online texts, books, and countless other sources in both English and Russian.
Training details and best practices on acceleration and stabilizations can be found on Medium (English) and Habr (Russian) articles.
Make sure to have 200GB of free disk space before downloading weights. The model (code is based on microsoft/DeepSpeedExamples/Megatron-LM-v1.1.5-ZeRO3) is supposed to run on multiple GPUs with tensor parallelism. It was tested on 4 (A100 80g) and 8 (V100 32g) GPUs, but is able to work with different configurations with ≈200GB of GPU memory in total which divide weight dimensions correctly (e.g. 16, 64, 128).