Date(s) - 12/07/2013
11:00 am - 12:00 pm
Languages vary by speaker and situation, and change over time. While variation and change are inhibited in written corpora such as news text, they are endemic to social media, enabling large-scale investigation of language’s social and temporal dimensions. The first part of this talk will describe a method for characterizing group-level language differences, using the Sparse Additive Generative Model (SAGE). SAGE is based on a re-parametrization of the multinomial distribution that is amenable to sparsity-inducing regularization and facilitates joint modeling across many author characteristics. The second part of the talk concerns change and influence. Using a novel dataset of geotagged word counts, we induce a network of linguistic influence between cities, aggregating across thousands of words. We then explore the demographic and geographic factors that drive the spread of new words between cities.
This work is in collaboration with Amr Ahmed, Brendan O’Connor, Noah A. Smith, and Eric P. Xing.
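
To make the re-parametrization mentioned in the abstract concrete, here is a minimal sketch rather than the speaker's implementation. It assumes the usual SAGE form, in which a group's word probabilities are proportional to exp(m + eta): m is the log of the background word frequencies and eta is a group-specific deviation that should be mostly zero. The L1 penalty and the proximal gradient solver are stand-ins chosen for brevity, and the function names and the l1 and iters parameters are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_sage_deviation(group_counts, background_counts, l1=1.0, iters=500):
    """Estimate a sparse log-frequency deviation eta for one group.

    Model (assumed form): p(w | group) is proportional to exp(m[w] + eta[w]),
    where m is the log background frequency. All background counts must be
    positive. The L1 penalty stands in for the sparsity-inducing
    regularization mentioned in the abstract.
    """
    c = np.asarray(group_counts, dtype=float)
    b = np.asarray(background_counts, dtype=float)
    m = np.log(b / b.sum())      # background log-frequencies
    n = c.sum()                  # total tokens observed for the group
    eta = np.zeros_like(m)
    step = 1.0 / n               # safe step: the curvature of the loss is at most n
    for _ in range(iters):
        logits = m + eta
        p = np.exp(logits - logits.max())
        p /= p.sum()             # current model probabilities
        grad = n * p - c         # gradient of the negative log-likelihood
        eta = soft_threshold(eta - step * grad, step * l1)
    return eta


if __name__ == "__main__":
    # Toy data: the group overuses word 0 relative to a uniform background.
    eta = fit_sage_deviation([40, 20, 20, 20], [250, 250, 250, 250], l1=5.0)
    print(eta)
```

Words whose group frequency stays close to the background keep a deviation of exactly zero, so the nonzero entries of eta pick out the group's distinctive vocabulary. Because the deviations are added in log space, deviations for several author characteristics could in principle be summed in the same way, which is one reading of the abstract's "joint modeling" remark.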