Date(s) - 25/07/2013
4:00 pm - 5:00 pm
An increasing amount of informal communication is conducted in written form through computer-mediated channels. With the rise of publicly readable social media platforms like Twitter, it is now possible to apply computational methods to investigate language variation on a very large scale. I will describe a series of studies that document lexical variation on Twitter across a number of different social variables. In some cases, social media writing tracks spoken language variation: in particular, TD-deletion (the dropping of word-final /t/ or /d/) occurs in social media, and displays the same sensitivity to phonological context as in spoken language. In addition, relatively novel “netspeak” terms like emoticons and abbreviations can also be strongly associated with demographics and geography. Our recent work concerns language change over time, using a new dataset of hundreds of thousands of authors over nearly three years. Aggregating across thousands of words, we build a unified model of the geographic and demographic factors that drive the spread of words between cities.
This research was performed in collaboration with David Bamman, Brendan O’Connor, Tyler Schnoebelen, Noah A. Smith, and Eric P. Xing.