David Bamman

Carnegie Mellon University
School of Computer Science
Language Technologies Institute

5719 Gates-Hillman Complex
Pittsburgh, PA 15213

twitter: @dbamman
email: dbamman at cs.cmu.edu

I'm a PhD student at the Language Technologies Institute in the School of Computer Science at CMU, part of the ARK research group and advised by Noah Smith. My research interests lie in the areas of natural language processing and machine learning, especially as applied to empirical questions in the humanities and social sciences.

I'm especially interested in questions of linguistic variation (both over long periods of time [JCDL2011] and over short ones [Lexicalist]), literary influence [LaTeCH2008], and censorship [First Monday 2012]. More generally, I'm interested in the tension between linguistic creativity and the forces outside of our own free will that shape our use of language. Prior to CMU, I was a senior researcher in computational linguistics at the Perseus Project at Tufts University, where I led the development of the Ancient Greek and Latin Dependency Treebanks and the Dynamic Lexicon.

Teaching

Fall 2013: Digital Literary and Cultural Studies (76-429/829)

Publications

Bamman, David, Brendan O'Connor and Noah Smith, "Learning Latent Personas of Film Characters," in: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, August 2013. [pdf] [data]
Bamman, David, Adam Anderson, and Noah Smith, "Inferring Social Rank in an Old Assyrian Trade Network," Digital Humanities (2013). [ArXiv]
Schneider, Nathan, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Jason Baldridge, Noah A. Smith, and Chris Dyer, "A Framework for (Under)specifying Dependency Syntax without Overloading Annotators," in: Proceedings of the ACL Linguistic Annotation Workshop (LAW 2013), Sofia, Bulgaria, August 2013. [Extended version]
Bamman, David, Jacob Eisenstein and Tyler Schnoebelen, "Gender in Twitter: Styles, Stances, and Social Networks," in review (2012). [ArXiv] [data]
- Press: [Boston Globe]
Bamman, David, Brendan O'Connor and Noah A. Smith, "Censorship and Deletion Practices in Chinese Social Media," First Monday 17.3 (March 2012). [html] [bib]
- Press: [BBC] [New Scientist]
O'Connor, Brendan, David Bamman and Noah A. Smith, "Computational Text Analysis for Social Science: Model Assumptions and Complexity," NIPS Workshop on Computational Social Science and the Wisdom of Crowds (2011). [pdf] [bib]
Bamman, David, and Gregory Crane, "Measuring Historical Word Sense Variation," in: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). Runner up, Best Paper Award. [pdf] [bib]
Bamman, David, and Gregory Crane, "The Ancient Greek and Latin Dependency Treebanks," in: Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage (Springer, 2011). [pdf] [bib]
Bamman, David, "Mapping the Demographics of American English with Twitter," Language Log, May 18, 2010. [html]
Bamman, David, Alison Babeu, and Gregory Crane, "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection," in: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2010). Winner, Best Paper Award. [pdf] [bib]
Bamman, David, Francesco Mambrini and Gregory Crane, "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," in: Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan, Italy: 2009). [pdf] [bib]
Bamman, David, and Gregory Crane, "Computational Linguistics and Classical Lexicography," Digital Humanities Quarterly 3.1 (2009). [html] [bib]
Bamman, David, Marco Passarotti and Gregory Crane, "A Case Study in Treebank Collaboration and Comparison: Accusativus cum Infinitivo and Subordination in Latin," Prague Bulletin of Mathematical Linguistics 90 (2008). [pdf] [bib]
Bamman, David, and Gregory Crane, "The Logic and Discovery of Textual Allusion," in: Proceedings of the 2008 LREC Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). [pdf] [bib]
Bamman, David, and Gregory Crane, "Building a Dynamic Lexicon from a Digital Library," in: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008). [pdf] [bib]

Datasets

CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).

Notes to Myself That Might Be Useful to Others

Notes on Multinomial sparsity and the Dirichlet concentration parameter α