Charlie Snell

I'm a second-year CS PhD student in Berkeley EECS advised by Dan Klein and Sergey Levine. I am also a Student Researcher at Google DeepMind. Previously, I was a UC Berkeley undergrad, where I had the great opportunity to work with and learn from a number of fantastic AI researchers, such as Sergey Levine, Ruiqi Zhong, Dan Klein, Jacob Steinhardt, and Jason Eisner.

Email  /  Google Scholar  /  Twitter  /  Github


See Google Scholar for more.
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande*, Eric Wallace*, Charlie Snell*, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
ICLR 2024

Recent systems – like Koala, Vicuna, and Alpaca – finetune a weaker language model to imitate the outputs of a stronger model, like ChatGPT or GPT-4. In this work, we critically analyze the shortcomings of this approach.

Learning by Distilling Context
Charlie Snell, Dan Klein, Ruiqi Zhong
arXiv 2022
[paper] [talk]

Language models significantly benefit from context tokens, such as prompts or scratchpads. We propose to apply context distillation so that a language model can improve itself by internalizing these gains.

Offline RL for Natural Language Generation with Implicit Language Q Learning
Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine
ICLR 2023
[paper] [project page] [code] [talk]

We propose an effective and easy-to-use method, motivated by offline RL, for steering language models toward successfully completing language tasks such as goal-directed dialogue, controlled generation, and word games.

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong*, Charlie Snell*, Dan Klein, Jason Eisner
EMNLP 2023

We introduce APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations, such as SQL programs.

Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt
ICML 2022
[paper] [code]

How do two distributions of text differ? We propose a method for automatically summarizing the differences by "learning a natural language hypothesis".

Context-Aware Language Modeling for Goal-Oriented Dialogue Systems
Charlie Snell, Mengjiao Yang, Justin Fu, Yi Su, Sergey Levine
NAACL 2022, Findings
[paper] [project page] [code]

We extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method to finetune language models in a goal-aware way.

Approximating How Single Head Attention Learns
Charlie Snell*, Ruiqi Zhong*, Dan Klein, Jacob Steinhardt
arXiv 2021
[paper] [slides] [code] [blog]

Why do models often attend to salient words, and how does this evolve throughout training?

The Omniglot Jr. challenge; Can a model achieve child-level character generation and classification?
Eliza Kosoy, Masha Belyi, Charlie Snell, Josh Tenenbaum, Brenden Lake, Alison Gopnik
NeurIPS Workshop on BabyMind 2020

We augment the original Omniglot dataset with a new dataset of children's handwritten characters. We then study the properties of a Bayesian Program Learning model trained on this new data.

Assorted Writing

I've had the pleasure of writing several articles for Machine Learning at Berkeley's technical blog.
Alien Dreams: An Emerging Art Scene
June 2021
[blog] [coverage] [discussion]

A tour through the wonderful AI art scene that emerged when CLIP was released in January 2021.

How is it so good ? (DALL-E Explained Pt. 2)
April 2021

A technical and philosophical discussion of how DALL-E works, why it is so effective at generating images from a text prompt, and its theoretical limitations.

Understanding VQ-VAE (DALL-E Explained Pt. 1)
February 2021

How do vector quantized variational autoencoders (VQ-VAEs) work? And what role do they play in modern generative models, such as DALL-E and Jukebox?

Side Projects / Open Source Implementations

Selected projects. See my GitHub for much more.

JaxSeq
October 2022

Built on top of HuggingFace's Transformers library, JaxSeq enables training very large language models in Jax with model and data parallelism across both multi-device and multi-node clusters.

Re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
November 2021

Re-create the dramatic train/test curves from the original paper; experiment with the grokking phenomenon yourself.

Music Preference Visualization with Deep Embeddings
June-July 2020

Harness the power of deep music representations to generate playlists and visualize your music preferences in an interactive web app.

Train Deep Neural Networks on a 2013 Macbook Air GPU

A deep learning framework implemented from scratch in C++/OpenCL, with GPU kernels that can run on a 2013 Macbook Air GPU (and other Apple computers). Includes LSTM training and inference for music lyric generation.

Yeah JeCUB App
[app store]

A humorous sound-box app.

2D Procedural Endless World

Scroll through an infinite 2D block-world of rugged terrain, endless caves, fluffy clouds, and extreme biomes, all synthesized by PRNGs and Perlin noise.

Website design from Jon Barron