PSA: Yandex is a multi-billion-dollar Moscow-based company that finances the Russian war of aggression in Ukraine and is one of the Kremlin's main tools for spreading propaganda and suppressing dissent.
YaLM 100B is a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
The model has 100 billion parameters. Training took 65 days on a cluster of 800 A100 graphics cards, using 1.7 TB of online texts, books, and countless other sources in both English and Russian.
Training details and best practices on acceleration and stabilization can be found in the Medium (English) and Habr (Russian) articles.
Make sure you have 200 GB of free disk space before downloading the weights. The model (code is based on microsoft/DeepSpeedExamples/Megatron-LM-v1.1.5-ZeRO3) is meant to run on multiple GPUs with tensor parallelism. It was tested on 4 (A100 80 GB) and 8 (V100 32 GB) GPUs, but it can work with other configurations that total ≈200 GB of GPU memory and whose GPU count divides the weight dimensions evenly (e.g. 16, 64, 128).
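The two constraints above (≈200 GB of aggregate GPU memory, and a GPU count that divides the weight dimensions evenly) can be sketched as a quick feasibility check. This is an illustrative sketch only: the `HIDDEN_DIM` and `TOTAL_WEIGHT_GB` values are assumed round numbers, not the model's published specs.

```python
# Sketch: check whether a GPU configuration can host the model with
# tensor parallelism. Both numbers below are illustrative assumptions.
TOTAL_WEIGHT_GB = 200   # approximate aggregate memory the weights need
HIDDEN_DIM = 10240      # assumed weight dimension; tensor parallelism
                        # shards it, so the GPU count must divide it

def config_ok(n_gpus: int, gb_per_gpu: int) -> bool:
    """A configuration works if the GPUs together hold ~200 GB and
    the GPU count divides the weight dimension evenly."""
    enough_memory = n_gpus * gb_per_gpu >= TOTAL_WEIGHT_GB
    divides_evenly = HIDDEN_DIM % n_gpus == 0
    return enough_memory and divides_evenly

for n, gb in [(4, 80), (8, 32), (16, 16), (3, 80)]:
    print(f"{n} x {gb}GB -> {'ok' if config_ok(n, gb) else 'no'}")
```

The tested setups (4×A100 80 GB, 8×V100 32 GB) pass both checks; an odd GPU count like 3 fails the divisibility check even though it has enough total memory.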
Programmers aren't people.
Never were
We are the next step in human evolution
Syntax error: Did you mean devolution?