Adam, a 9-yr-old optimizer, is the go-to for training LLMs (e.g., GPT-3, OPT, LLaMA).
— Tengyu Ma (@tengyuma) May 24, 2023
Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold). https://t.co/GrMY600lLO 🧵⬇️
contributed by Andy on May 24, 2023