This material is based upon work supported by the National Science Foundation under Grant Number 2024286. HDNS-I: Infrastructure for Knowledge Linkages from Ethnography of World Societies. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Michael Fischer and Ben Kluga
AAA Conference, New Orleans, November 21st, 2025
Abstract
This poster session explores the use of Large Language Models (LLMs), or Generative Artificial Intelligence, as tools for dynamically evaluating ethnographic writing. Because LLMs are frequently trained on extensive datasets that include a significant number of ethnographic texts, they offer a unique lens through which to examine and understand cultural representations. Rather than possessing inherent understanding, LLMs function as mediators, connecting the myriad authors, voices, and perspectives embedded within their training data to the reader. The "plural" in authors is essential: LLMs do not generate responses from a singular perspective but extrapolate from multiple, ranked sources. The generated text is an amalgamation of the diverse cultures and voices present in the training data.
LLMs operate primarily through pattern recognition, identifying relationships within the vast amounts of text they process. Their output is based on statistical correlations and contextual associations, producing text that echoes human language patterns. The meaning attributed to the LLM's generated text is ultimately derived from the reader's interpretation within their own cultural and cognitive framework. This research proposes utilizing LLMs to evaluate new ethnographic texts, examining the models' ability to capture and reproduce culturally specific language patterns. By doing so, we can better understand the limitations and capabilities of LLMs in representing cultures, and potentially gain fresh perspectives on the practice of ethnographic writing itself. This approach opens up new avenues for using LLMs as tools for anthropological research and cultural analysis.
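The statistical principle behind this kind of pattern recognition can be illustrated with a deliberately tiny sketch. The toy corpus and the bigram model below are our own invented example, not how a transformer actually works: modern LLMs learn far richer contextual associations, but the underlying idea of predicting likely continuations from observed frequencies is the same.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for an LLM's training data (illustrative only).
corpus = (
    "the ritual begins at dawn . the ritual ends at dusk . "
    "the elders lead the ritual . the ritual begins with song"
).split()

# Count bigram frequencies: how often each word follows another.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent continuation of a word."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("ritual"))  # → "begins" (it follows "ritual" most often)
```

Scaled up by many orders of magnitude, and with context windows far longer than a single preceding word, this is the sense in which an LLM's output reflects the usage patterns of the texts, and the voices behind them, in its training data.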
The capacity of Large Language Models (LLMs) to generate text is fundamentally tied to the inherent structure and usage patterns of human language, as captured within their diverse training datasets. These datasets, acting as comprehensive models of language use, enable LLMs to produce responses that readers perceive as meaningful. This observation carries profound implications. For instance, it suggests that the origins of language may be rooted more deeply in social interactions, cultural transmission, and the accumulation of knowledge within communities than in biological adaptations.
Furthermore, this perspective has significant ramifications for our understanding of language more broadly, and for ethnographic writing specifically. If an LLM can construct a viable working model of language use simply by processing examples of language in practice, and then effectively mediate that model in communication with a person, it indicates that language is not merely a passive encoding of internal cognitive functions. Instead, it suggests that language is a dynamic, adaptive system that conserves and transmits knowledge. Language, in this view, serves as a medium for shared external cognition, playing a crucial role in the very process of cultural transmission.
For a deeper understanding of how language is represented by computers, read our tutorial on word embeddings.
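As a brief taste of what that tutorial covers: word embeddings represent each word as a vector of numbers, so that words used in similar contexts point in similar directions. The three-dimensional vectors below are hand-assigned for illustration only; real embeddings have hundreds of dimensions and are learned from co-occurrence patterns in large corpora.

```python
import math

# Hypothetical, hand-assigned vectors chosen for illustration; real
# embeddings are learned automatically from large text corpora.
embeddings = {
    "kinship":  [0.9, 0.8, 0.1],
    "marriage": [0.8, 0.9, 0.2],
    "pottery":  [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means identical direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically related terms should score higher than unrelated ones.
print(cosine(embeddings["kinship"], embeddings["marriage"]))  # close to 1.0
print(cosine(embeddings["kinship"], embeddings["pottery"]))   # much lower
```

It is geometric comparisons like this, applied at scale, that make semantic inquiry over ethnographic text computationally tractable.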
To conduct semantic inquiries of your own, see our example page for HRAF API 1.
Read the presentation and get access to the data we used for analysis.
Download the poster (16 MB)