Words that Work: Using Large Language Models to Generate and Refine Hypotheses from Text

Rafael M. Batista & James Ross · Revise & Resubmit, Journal of Consumer Research

Abstract

In this paper, we introduce a data-driven framework for generating and refining hypotheses from text. Our three-step approach—Hypothesize, Intervene, and Predict—integrates large language models (LLMs), machine learning (ML), and experimentation to discover testable insights about how language drives consumer behavior. Using a dataset with over 60,000 headlines and 32,000 A/B tests, we first Hypothesize linguistic features by prompting an LLM to identify differences between headline pairs. We then use an LLM to Intervene—systematically rewriting headlines to incorporate these features—and use an ML model, trained on historical outcomes, to Predict the causal impact of these changes on engagement. The framework generates a prioritized list of hypotheses, which we validate on a hold-out set of 1,693 A/B tests. The approach facilitates discovery: for instance, we find that describing physical reactions significantly increases engagement, while focusing on positive aspects of human behavior decreases it. This approach extends beyond headlines, offering a method for converting unstructured text data into insights that are interpretable, novel, testable, and generalizable. It does so while maintaining a transparent role for both human researchers and algorithmic processes, providing a practical tool for researchers, organizations, and policymakers seeking to aggregate insights from many messaging experiments.
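The three-step loop described above can be sketched in code. This is a minimal illustration only: the functions below are hypothetical stand-ins (the paper's actual LLM prompts, feature definitions, and trained ML model are not reproduced here), but the control flow mirrors the Hypothesize → Intervene → Predict structure of the framework.

```python
# Illustrative sketch of the Hypothesize-Intervene-Predict pipeline.
# All function bodies are hypothetical stand-ins for LLM calls and a
# trained ML model; only the pipeline structure follows the abstract.

def hypothesize(headline_a, headline_b):
    """Stand-in for an LLM prompt that names a linguistic feature
    distinguishing two headlines from the same A/B test."""
    # A real LLM might return a feature such as "describes a physical reaction".
    return ("describes a physical reaction" if "gasp" in headline_b
            else "neutral phrasing")

def intervene(headline, feature):
    """Stand-in for an LLM rewrite that injects the feature into a headline."""
    return f"{headline} (rewritten to emphasize: {feature})"

def predict(headline):
    """Stand-in for an ML model trained on historical A/B-test outcomes,
    returning a predicted engagement rate."""
    return 0.05 + (0.01 if "physical reaction" in headline else 0.0)

def rank_hypotheses(headline_pairs):
    """Score each hypothesized feature by the predicted lift it produces
    when injected into the original headline, then sort by lift."""
    scored = []
    for a, b in headline_pairs:
        feature = hypothesize(a, b)            # Step 1: Hypothesize
        rewritten = intervene(a, feature)      # Step 2: Intervene
        lift = predict(rewritten) - predict(a) # Step 3: Predict
        scored.append((feature, lift))
    return sorted(scored, key=lambda pair: -pair[1])

pairs = [("Team wins title", "Fans gasp as team wins title"),
         ("Mayor opens park", "Mayor opens new park downtown")]
print(rank_hypotheses(pairs))
```

In the paper's setting, each step is validated against held-out A/B tests rather than the toy scoring rule used here; the sketch only shows how the three components compose into a prioritized list of hypotheses.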
