David Bamman
Carnegie Mellon University
School of Computer Science
Language Technologies Institute
5719 Gates-Hillman Complex
Pittsburgh, PA 15213
email: dbamman at cs.cmu.edu
I'm a PhD student at the Language Technologies Institute in the School of Computer Science at CMU, part of the ARK research group and advised by Noah Smith. My research interests lie in the areas of natural language processing and machine learning, especially as applied to empirical questions in the humanities and social sciences.
I'm especially interested in questions of linguistic variation (both over long periods of time [JCDL2011] and short [Lexicalist]), literary influence [LaTeCH2008], and censorship [First Monday 2012] -- in general, the tension between linguistic creativity and the forces outside of our own free will that shape our use of language. Prior to CMU, I was a senior researcher in computational linguistics at the Perseus Project at Tufts University, where I led the development of the Ancient Greek and Latin Dependency Treebanks and the Dynamic Lexicon.
- Fall 2013: Digital Literary and Cultural Studies (76-429/829)
- Bamman, David, Brendan O'Connor and Noah Smith, "Learning Latent Personas of Film Characters," in: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, August 2013. [pdf] [data]
- Bamman, David, Adam Anderson, and Noah Smith, "Inferring Social Rank in an Old Assyrian Trade Network," Digital Humanities (2013) [ArXiv]
- Schneider, Nathan, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Jason Baldridge, Noah A. Smith, and Chris Dyer, "A Framework for (Under)specifying Dependency Syntax without Overloading Annotators," in: Proceedings of the ACL Linguistic Annotation Workshop (LAW 2013), Sofia, Bulgaria, August 2013. [Extended version]
- Press: [Boston Globe]
- O'Connor, Brendan, David Bamman and Noah A. Smith, "Computational Text Analysis for Social Science: Model Assumptions and Complexity," NIPS Workshop on Computational Social Science and the Wisdom of Crowds (2011). [pdf] [bib]
- Bamman, David, and Gregory Crane, "Measuring Historical Word Sense Variation," in: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). Runner up, Best Paper Award. [pdf] [bib]
- Bamman, David, and Gregory Crane, "The Ancient Greek and Latin Dependency Treebanks," in: Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage (Springer, 2011). [pdf] [bib]
- Bamman, David, "Mapping the Demographics of American English with Twitter," Language Log, May 18, 2010. [html]
- Bamman, David, Alison Babeu, and Gregory Crane, "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection," in: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2010). Winner, Best Paper Award. [pdf] [bib]
- Bamman, David, Francesco Mambrini and Gregory Crane, "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," in: Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan, Italy: 2009). [pdf] [bib]
- Bamman, David, Marco Passarotti and Gregory Crane, "A Case Study in Treebank Collaboration and Comparison: Accusativus cum Infinitivo and Subordination in Latin," Prague Bulletin of Mathematical Linguistics 90 (2008). [pdf] [bib]
- CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
- CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
- Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).