Teaching AI to Hesitate: Cultural Understanding and the Persian Art of Taarof
- Karine Megerdoomian

- Nov 4
Exploring how AI learns the social logic behind Persian politeness
In September 2025, we released a paper that’s very close to my heart: “We Politely Insist: Your LLM Must Learn the Persian Art of Taarof.” The response has been overwhelming and deeply encouraging. On LinkedIn, the post sparked an outpouring of excitement from AI researchers, linguists, and everyday readers who appreciated this exploration of culture and AI. Even the CEO of The Atlantic featured the study in his “The most interesting thing in tech” series.
The response says a lot about where the field is headed. People are hungry for culturally aware AI: systems capable of modeling not just linguistic form, but the social, pragmatic, and contextual dimensions of meaning that govern human communication.
The paper introduces TaarofBench, the first benchmark designed to test whether large language models (LLMs) can recognize and reproduce the uniquely Persian ritual of politeness known as taarof.
We often talk about “AI alignment” in terms of truth or safety, but what about cultural alignment? This project explores exactly that: how an AI system can (or can’t) interpret social logic that depends not on words, but on shared understanding, hierarchy, and human hesitation.
What is Taarof?
If you’ve ever been to Iran, you’ve experienced taarof, even if you didn’t know its name. It’s a graceful but sometimes dizzying ritual of generosity: I insist you take the last piece of cake; you politely refuse; I insist again; you finally accept.
On the surface, it looks like overpoliteness. But it’s actually a social language of respect — one that manages status, humility, and sincerity. The right move depends entirely on context. The wrong one can signal arrogance or distance. For humans, taarof feels instinctive. For AI, it’s chaos.
Why we built TaarofBench
The idea for this project began in conversation with my co-author, Ali Emami, at an NLP conference (EMNLP 2024 in Miami, to be exact). As two Iranian computational linguists, we immediately slipped into light taarof — this mutual dance of “please, after you” — and realized that this moment of mutual hesitation is something large language models never experience.
Our team consisted of Nikta Gohari Sadr (lead author), Sahar Heidariasl, Laleh Seyyed-Kalantari, Ali, and me. Together, we set out to test whether AI can play the game of taarof.
We created a dataset of 450 scenarios drawn from daily Persian life: dining, invitations, shopping, gift-giving, paying bills, and compliment exchanges. Each one encodes not just text, but social roles (host, guest, customer, superior) and interaction stages (initiation, recognition, reciprocation).
To keep the data faithful, every example was reviewed by native speakers and validated across regions and generations. It’s a small dataset — but a dense one, full of cultural logic.
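To make this concrete, here is a hypothetical sketch of how a single scenario might be structured in Python. The field names and values are illustrative; the released benchmark’s actual schema may differ.

```python
# Hypothetical shape of one TaarofBench scenario. Field names and values
# are illustrative stand-ins; the released dataset's schema may differ.
scenario = {
    "environment": "dinner at a friend's home",
    "roles": {"llm": "guest", "interlocutor": "host"},   # social roles
    "stage": "reciprocation",  # initiation | recognition | reciprocation
    "context": "The host insists, a second time, that you take the last piece of cake.",
    "taarof_expected": True,
    "expected_behavior": "Politely refuse once more before finally accepting.",
}
```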

What we found
When we tested major LLMs (GPT-4o, Claude 3.5, Gemini 1.5, and several open models), the results were humbling. Even the best systems performed at 40% accuracy on taarof-expected scenarios, far below native speakers (who themselves averaged 82%). In contrast, the models excelled in situations where taarof wasn’t expected, scoring over 90%. In other words: AI is polite, but not appropriately polite.
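As a rough illustration of that split metric, here is a minimal sketch of the evaluation logic; the field names (taarof_expected, correct) are hypothetical, not the paper’s actual evaluation code.

```python
# Minimal sketch of the split accuracy described above; 'results' entries
# use hypothetical field names, not the paper's actual schema.
from collections import Counter

def split_accuracy(results):
    """results: iterable of dicts with boolean 'taarof_expected' and 'correct'."""
    totals, hits = Counter(), Counter()
    for r in results:
        key = "taarof_expected" if r["taarof_expected"] else "non_taarof"
        totals[key] += 1
        hits[key] += int(r["correct"])
    return {k: hits[k] / totals[k] for k in totals}

# A toy model that is "polite, but not appropriately polite":
demo = [
    {"taarof_expected": True, "correct": False},
    {"taarof_expected": True, "correct": True},
    {"taarof_expected": False, "correct": True},
]
print(split_accuracy(demo))  # {'taarof_expected': 0.5, 'non_taarof': 1.0}
```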
One of the most striking findings was that switching the prompt language from English to Persian improved performance dramatically, sometimes by 30 percentage points. The language cue mattered more than a geographic one (simply stating “in Iran”), suggesting that models partially learn culture through language form itself.
But we also saw where bias enters: when gendered roles appeared, models started echoing stereotypes (men insisting, women deferring). That’s not cultural intelligence; it’s learned bias masquerading as etiquette.


Teaching AI to learn politeness
To see if taarof could be taught, Nikta and the team fine-tuned open models using supervised learning and Direct Preference Optimization (DPO). The improvements were remarkable: models reached nearly 80% accuracy — approaching human-like intuition about when to insist, when to refuse, and when to stop.
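For readers curious about the mechanics, here is a minimal DPO fine-tuning sketch using Hugging Face’s TRL library. The model name, hyperparameters, and preference pair below are illustrative stand-ins, not the paper’s actual setup.

```python
# Minimal DPO sketch with Hugging Face TRL. The model and the single
# preference pair are illustrative stand-ins, not the paper's setup.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in open model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO learns from preference pairs: a prompt, a culturally appropriate
# response ("chosen"), and a literal-but-inappropriate one ("rejected").
train_dataset = Dataset.from_list([
    {
        "prompt": "Your host offers you the last piece of cake. You say:",
        "chosen": "Oh no, I couldn't possibly. Please, you have it.",
        "rejected": "Sure, thanks!",  # polite in English, a taarof failure
    },
])

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="taarof-dpo", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # "tokenizer=" in older TRL releases
)
trainer.train()
```

The preference framing fits taarof well: for a given scenario, the culturally expected move is preferred over a response that is merely polite on the surface.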
These results suggest that while in-context learning helps activate partial cultural knowledge, fine-tuning, especially DPO, remains essential for capturing the nuanced, context-dependent practices of taarof.
Why it matters
Language is never just about meaning; it’s also about relationships. If AI gets the grammar right but the intent wrong, we risk what linguists call pragmatic failure: producing language that is formally correct but socially and contextually misaligned.
In domains from business negotiation to diplomacy, healthcare, and customer service, misunderstanding politeness systems can lead to friction and mistrust. By grounding AI in cultural pragmatics, we can move from mere politeness to situated respect — a system that understands that sometimes saying “no” really means “yes.”
Next steps
Our hope is that TaarofBench becomes a blueprint for studying cultural reasoning in AI. Researchers from Japan, Korea, and Turkey have already reached out about parallel systems like keigo, nunchi, and israr.
This is just the beginning of a broader effort at our research center, Zoorna Institute, to build datasets and evaluation tools that center not only low-resource languages, but also underrepresented cultures.
Final thoughts
Taarof has always fascinated me because it sits between sincerity and ceremony. It is a microcosm of what makes human communication so beautifully complicated and, sometimes, so frustrating. If AI is ever to interact gracefully in our multilingual world, it will have to learn not just to translate, but to hesitate.
