Charlie Snell

I'm a third-year CS PhD student in Berkeley EECS, advised by Dan Klein and Sergey Levine. Previously, I was a UC Berkeley undergrad, where I had the great opportunity to work with and learn from a number of fantastic AI researchers, including Sergey Levine, Ruiqi Zhong, Dan Klein, Jacob Steinhardt, and Jason Eisner. I was also a Student Researcher at Google DeepMind.

Email  /  Google Scholar  /  Twitter  /  Github

Research

See Google Scholar for more.
Predicting Emergent Capabilities by Finetuning
Charlie Snell, Eric Wallace, Dan Klein, Sergey Levine
COLM 2024
[paper]

Can we predict emergent capabilities in GPT-N+1 🌌 using GPT-N, which has only random performance on the task? We find that how pre-emergence model checkpoints behave after task-specific finetuning provides enough signal to predict the point of emergence in the few-shot setting.

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
arXiv 2024
[paper]

On difficult problems, humans tend to think longer to improve their decisions. Can we instill a similar capability in LLMs, and how far does it go? We find that by optimally scaling test-time compute, a smaller model can outperform a much larger one in a FLOPs-matched evaluation.

The False Promise of Imitating Proprietary LLMs
Arnav Gudibande*, Eric Wallace*, Charlie Snell*, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
ICLR 2024
[paper]

Recent systems, such as Koala, Vicuna, and Alpaca, finetune a weaker language model to imitate the outputs of a stronger model like ChatGPT or GPT-4. In this work, we critically analyze the shortcomings of this approach.

Learning by Distilling Context
Charlie Snell, Dan Klein, Ruiqi Zhong
arXiv 2022
[paper] [talk]

Language models significantly benefit from context tokens, such as prompts or scratchpads. We propose to apply context distillation so that a language model can improve itself by internalizing these gains.
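For intuition, here is a minimal sketch of one context-distillation step, assuming a Hugging Face causal LM (gpt2 as a stand-in) and an illustrative prompt; it is a sketch of the general idea, not the paper's exact recipe.

```python
# Minimal context-distillation sketch (illustrative, not the paper's exact recipe).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # frozen copy, sees the context
student = AutoModelForCausalLM.from_pretrained("gpt2")         # trained to match it without context

context = "Answer step by step, showing your work.\n"          # hypothetical context tokens
query = "Q: What is 17 + 25?\nA:"

ctx_ids = tok(context + query, return_tensors="pt").input_ids
qry_ids = tok(query, return_tensors="pt").input_ids
n = qry_ids.shape[1]                                           # positions the two views share

with torch.no_grad():
    t_logits = teacher(ctx_ids).logits[:, -n:, :]              # teacher's predictions on the query
s_logits = student(qry_ids).logits                             # student's predictions, no context

# Train the student to match the teacher's next-token distributions,
# internalizing the behavior that the context induces.
loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                F.log_softmax(t_logits, dim=-1),
                log_target=True, reduction="batchmean")
loss.backward()
```

Repeated over many queries, this lets the model reproduce, without the context, the behavior the context used to induce.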

Offline RL for Natural Language Generation with Implicit Language Q Learning
Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine
ICLR 2023
[paper] [project page] [code] [talk]

We propose an effective and easy-to-use offline RL method for steering language models towards successfully completing language tasks, such as goal-directed dialogue, controlled generation, and word games.
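As a flavor of the offline-RL machinery involved, here is a tiny sketch of the expectile-regression objective used in implicit Q-learning style training; the τ value and tensor shapes are illustrative stand-ins.

```python
import torch

def expectile_loss(q, v, tau=0.7):
    """Asymmetric squared error |tau - 1(u < 0)| * u^2 with u = Q - V.
    Pushing V toward an upper expectile of Q lets values be learned
    purely from offline data, without sampling new actions."""
    u = q - v
    weight = torch.abs(tau - (u < 0).float())
    return (weight * u ** 2).mean()

# Illustrative tensors standing in for per-token Q(s, a) and V(s) estimates.
q = torch.randn(8)
v = torch.randn(8, requires_grad=True)
loss = expectile_loss(q, v)
loss.backward()

# At generation time, ILQL-style methods steer the language model by
# perturbing its logits with a scaled advantage, roughly logits + beta * (Q - V).
```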

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong*, Charlie Snell*, Dan Klein, Jason Eisner
EMNLP 2023
[paper]

We introduce APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations, such as SQL programs.

Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt
ICML 2022
[paper] [code]

How do two distributions of text differ? We propose a method for automatically summarizing the differences by "learning a natural language hypothesis".

Context-Aware Language Modeling for Goal-Oriented Dialogue Systems
Charlie Snell, Mengjiao Yang, Justin Fu, Yi Su, Sergey Levine
NAACL 2022, Findings
[paper] [project page] [code]

We extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method to finetune language models in a goal-aware way.

Approximating How Single Head Attention Learns
Charlie Snell*, Ruiqi Zhong*, Dan Klein, Jacob Steinhardt
arXiv 2021
[paper] [slides] [code] [blog]

Why do models often attend to salient words, and how does this evolve throughout training?

The Omniglot Jr. challenge: Can a model achieve child-level character generation and classification?
Eliza Kosoy, Masha Belyi, Charlie Snell, Josh Tenenbaum, Brenden Lake, Alison Gopnik
NeurIPS Workshop on BabyMind 2020
[paper]

We augment the original Omniglot dataset with a new dataset of children's handwritten characters. We then study the properties of a Bayesian Program Learning model trained on this new data.

Assorted Writing

I've had the pleasure of writing several articles for Machine Learning at Berkeley's technical blog.
Alien Dreams: An Emerging Art Scene
June 2021
[blog] [coverage] [discussion]

A tour through the wonderful AI art scene that emerged when CLIP was released in January 2021.

How is it so good? (DALL-E Explained Pt. 2)
April 2021
[blog]

A technical and philosophical discussion of how DALL-E works, why it is so effective at generating images from a text prompt, and its theoretical limitations.

Understanding VQ-VAE (DALL-E Explained Pt. 1)
February 2021
[blog]

How do vector quantized variational autoencoders (VQ-VAEs) work? And what role do they play in modern generative models, such as DALL-E and Jukebox?
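A minimal sketch of the core vector-quantization step the post explains: nearest-codebook lookup with a straight-through gradient. The codebook size and latent dimension here are illustrative.

```python
import torch

codebook = torch.randn(512, 64, requires_grad=True)   # 512 code vectors of dim 64 (illustrative)
z_e = torch.randn(16, 64, requires_grad=True)         # encoder outputs for 16 latent positions

# Quantize: replace each encoder vector with its nearest codebook entry.
dists = torch.cdist(z_e, codebook)                    # (16, 512) pairwise distances
codes = dists.argmin(dim=-1)                          # nearest code index per position
z_q = codebook[codes]                                 # (16, 64) quantized latents

# Straight-through estimator: use z_q in the forward pass, but let gradients
# from the decoder flow back to the encoder as if quantization were the identity.
z_q_st = z_e + (z_q - z_e).detach()

# The remaining VQ-VAE terms pull the codebook toward the encoder outputs
# and commit the encoder to its chosen codes.
codebook_loss = ((z_q - z_e.detach()) ** 2).mean()
commitment_loss = ((z_e - z_q.detach()) ** 2).mean()
```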

Side Projects / Open Source Implementations

Selected projects. See my github for much more.

(Press "y" to add a random circle, "n" to remove one, and "wasd" to pan.)
JaxSeq
October 2022
[code]

Built on top of HuggingFace's Transformers library, JaxSeq enables training very large language models in Jax with model and data parallelism across both multi-device and multi-node clusters.

Re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
November 2021
[code]

Re-create the dramatic train/test curves from the original paper; experiment with the grokking phenomenon yourself.
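The kind of dataset those curves come from is easy to generate yourself; here is a minimal sketch for modular addition, with an illustrative prime and train fraction (see the repo for the full training setup).

```python
import random

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """All pairs (a, b) labeled with (a + b) mod p, split into train/test.
    Grokking appears when small transformers are trained on splits like this."""
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(examples)
    cut = int(train_frac * len(examples))
    return examples[:cut], examples[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test))   # 4704 4705 for p=97, train_frac=0.5
```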

Music Preference Visualization with Deep Embeddings
June-July 2020
[tweet]

Harness the power of deep music representations to generate playlists and visualize your music preferences in an interactive web app.

Train Deep Neural Networks on a 2013 MacBook Air GPU
2017/2018
[code]

A deep learning framework implemented from scratch in C++/OpenCL. It provides GPU kernels that run on a 2013 MacBook Air GPU (and other Apple computers), plus LSTM training and inference for music lyric generation.

Yeah JeCUB App
2017/2018
[app store]

A humorous soundboard app.

2D Procedural Endless World
2015
[code]

Scroll through an infinite 2D block-world of rugged terrain, endless caves, fluffy clouds, and extreme biomes, all synthesized with PRNGs and Perlin noise.
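To give a flavor of the approach, here is a tiny terrain-heightmap sketch using seeded value noise (a simpler cousin of Perlin noise); the hash constants and scales are illustrative and not the project's actual code.

```python
import math

def lattice_noise(x, seed=1337):
    """Deterministic pseudo-random value in [0, 1) for an integer coordinate."""
    n = (x * 374761393 + seed * 668265263) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    return (n & 0xFFFF) / 0x10000

def value_noise(x, scale=16.0):
    """Smoothly interpolate between random values at integer lattice points."""
    x /= scale
    x0 = math.floor(x)
    t = x - x0
    t = t * t * (3 - 2 * t)                      # smoothstep easing
    return lattice_noise(x0) * (1 - t) + lattice_noise(x0 + 1) * t

# Terrain height per world column; caves, clouds, and biomes layer further
# noise channels on top in the same spirit.
heights = [int(20 + 12 * value_noise(col)) for col in range(80)]
print(heights[:10])
```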


Website design from Jon Barron