Research
I'm interested in the *representation vectors* inside language models, colloquially known as embeddings. Where do they come from? What do they mean? Most of my research nowadays aims to understand, disentangle, and improve these vectors.
We improve embedding models by adding context from surrounding documents to the embeddings of individual documents and queries. Our work shows that both reordering the training data (contextual batching) and using a new contextual architecture improve embedding quality.
We show that system prompts can be extracted without access to token probabilities, simply by asking the language model a fixed set of questions and learning to map its answers back to the initial prompts. Our system is efficient and outperforms prior work (Language Model Inversion) with around 15 question-answer pairs.
Do language models plan ahead for future tokens?
COLM 2024
[arXiv]
Wilson Wu, John X. Morris, Lionel Levine
We investigate whether language models "plan ahead" by using their computation to pre-store information that is useful for predicting future tokens. We propose a training scheme called *myopic training* that does not propagate gradients from the current token's loss to hidden states from previous time steps.
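To make the gradient cut concrete, here is a minimal sketch on a toy two-step linear recurrence with hand-written gradients. The model, the parameter names `w` and `u`, and the squared-error loss are all assumptions for illustration, not the setup from the paper. The only difference between the two modes is whether the step-2 loss is allowed to flow back into the step-1 hidden state.

```python
def forward(w, u, x1, x2):
    """Toy recurrence (an assumption): h2 reads the previous hidden state h1."""
    h1 = w * x1
    h2 = w * x2 + u * h1
    return h1, h2


def loss(w, u, x1, x2, y1, y2):
    """Sum of per-step squared errors."""
    h1, h2 = forward(w, u, x1, x2)
    return (h1 - y1) ** 2 + (h2 - y2) ** 2


def grad_w(w, u, x1, x2, y1, y2, myopic):
    """Gradient of the loss with respect to w, written out by hand."""
    h1, h2 = forward(w, u, x1, x2)
    g = 2 * (h1 - y1) * x1       # step-1 loss through h1
    g += 2 * (h2 - y2) * x2      # step-2 loss through its direct use of w
    if not myopic:
        # Step-2 loss flowing back into the earlier hidden state h1.
        # Myopic training drops exactly this term.
        g += 2 * (h2 - y2) * u * x1
    return g
```

If the myopic and full gradients differ, the step-2 loss was shaping the step-1 hidden state, which is the kind of "planning ahead" signal the summary above describes.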
Language Model Inversion
ICLR 2024
[arXiv]
John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush
We show that language models can be inverted, meaning that we can learn to reconstruct the input given only the model's output probability distribution for a single next token. This lets us recover unknown prompts from the LM outputs alone. We also propose an algorithm that recovers the full LM probability distribution from an API exposing only a few numbers, by tweaking the logit bias parameter.
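The logit bias idea can be sketched as a binary search. This is a toy version under strong assumptions: `make_toy_api` is a hypothetical stand-in for an API that returns only the most likely token and accepts a bias on one token at a time. For each token we search for the smallest bias that makes it the top token, which gives its logit gap to the original top token. A softmax over the negated gaps then gives the full distribution.

```python
import math


def make_toy_api(logits):
    """Hypothetical API: add `bias` to one token's logit, return only the argmax."""
    def argmax_with_bias(token, bias):
        biased = list(logits)
        biased[token] += bias
        return max(range(len(biased)), key=biased.__getitem__)
    return argmax_with_bias


def recover_distribution(api, vocab_size, max_bias=50.0, steps=40):
    """Estimate next-token probabilities using only argmax queries."""
    top = api(0, 0.0)  # zero bias leaves the logits unchanged
    gaps = [0.0] * vocab_size  # gaps[i] approximates logit[top] - logit[i]
    for tok in range(vocab_size):
        if tok == top:
            continue
        lo, hi = 0.0, max_bias  # assumes every gap is below max_bias
        for _ in range(steps):
            mid = (lo + hi) / 2
            if api(tok, mid) == tok:
                hi = mid  # bias is large enough to flip the argmax
            else:
                lo = mid
        gaps[tok] = hi
    weights = [math.exp(-g) for g in gaps]
    total = sum(weights)
    return [w / total for w in weights]
```

This sketch spends `steps` queries per token, so it is only meant to show why a biased argmax leaks the whole distribution, not to be query-efficient.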
Text Embeddings Reveal (Almost) As Much as Text
John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush
We show that we can recover text *exactly* from text embeddings. We do this by training a corrective model that iteratively edits text and re-embeds it, forming guesses that move closer to the true embedding in embedding space.
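The guess, re-embed, and compare loop can be sketched as follows. Everything here is a toy assumption: `embed` is a hashed character-bigram counter rather than a real embedding model, the text length is assumed known, and the learned corrective model is replaced by brute-force single-character edits. The sketch shows only the shape of the loop, where an edit is kept when the re-embedded guess lands closer to the target.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def embed(text, dim=32):
    """Toy embedding (an assumption): hashed counts of character bigrams."""
    vec = [0.0] * dim
    padded = "^" + text + "$"
    for a, b in zip(padded, padded[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def invert(target, length, max_rounds=20):
    """Greedy search: keep any single-character edit that moves closer to target."""
    guess = "a" * length
    best = distance(embed(guess), target)
    for _ in range(max_rounds):
        improved = False
        for i in range(length):
            for ch in ALPHABET:
                cand = guess[:i] + ch + guess[i + 1:]
                d = distance(embed(cand), target)
                if d < best:
                    guess, best, improved = cand, d, True
        if not improved:
            break  # no single edit helps any more
    return guess, best
```

A greedy search like this can stall in local minima and gives no guarantee of exact recovery, which is one reason to replace the brute-force step with a trained corrective model.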
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
EMNLP 2023
[arXiv]
John X. Morris*, Chandan Singh*, Alexander M. Rush, Jianfeng Gao, Yuntian Deng
We propose a method for learning a decision tree on top of language model outputs for multiple prompts. This gives a way to do "fine-tuning" and classify outputs without any backward passes.
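Here is a minimal sketch of the idea, assuming each prompt yields a yes/no answer from the language model. The entries in `PROMPTS` are hypothetical keyword checks standing in for real prompt and LM calls, and the greedy Gini-based tree builder is a generic choice rather than the exact procedure from the paper. The tree is fit using only forward answers, with no gradients.

```python
from collections import Counter

# Hypothetical stand-ins for "prompt + LM call -> yes/no answer".
PROMPTS = {
    "mentions 'great'?": lambda text: "great" in text,
    "mentions 'not'?": lambda text: "not" in text,
    "mentions 'boring'?": lambda text: "boring" in text,
}

TRAIN = [
    ("a great film", "pos"),
    ("great acting and score", "pos"),
    ("not great at all", "neg"),
    ("boring and slow", "neg"),
    ("not worth it", "neg"),
]


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(examples, prompts, depth=0, max_depth=3):
    """Greedily pick the prompt whose yes/no answers best separate the labels."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for name, ask in prompts.items():
        yes = [(x, y) for x, y in examples if ask(x)]
        no = [(x, y) for x, y in examples if not ask(x)]
        if not yes or not no:
            continue  # this prompt does not split the data
        score = (len(yes) * gini([y for _, y in yes])
                 + len(no) * gini([y for _, y in no])) / len(examples)
        if best is None or score < best[0]:
            best = (score, name, yes, no)
    if best is None:
        return majority
    _, name, yes, no = best
    return {
        "prompt": name,
        "yes": build_tree(yes, prompts, depth + 1, max_depth),
        "no": build_tree(no, prompts, depth + 1, max_depth),
    }


def predict(tree, text, prompts):
    """Walk the tree, asking one prompt per node, until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if prompts[tree["prompt"]](text) else tree["no"]
    return tree
```

At inference time only the prompts along one root-to-leaf path are asked, so the number of LM calls per example is bounded by the tree depth.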
iPrompt: Explaining Patterns in Data with Language Models via Interpretable Autoprompting
EMNLP 2023 BlackboxNLP
[arXiv]
Chandan Singh*, John X. Morris*, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao
We developed a method that searches for the optimal prompt for a given dataset. It turns out that the optimal prompt is often semantically meaningful and can reveal useful things about the data.
Unsupervised Text Deidentification
EMNLP Findings 2022
[arXiv]
John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush
We propose a method for removing personal information from text based on the information we know about each person. We test our method by redacting biographies from Wikipedia.