
|C|=1000 and Other Brown Clustering Fallacies

Activity: Talk or presentation types - Lecture and oral contribution


Sean Chester - Invited speaker

Brown clustering has recently re-emerged as a competitive, unsupervised method for learning distributional word representations from an input corpus. It applies a greedy heuristic based on mutual information to group words into clusters, thereby reducing the sparsity of bigram information. Using the clusters as features has repeatedly been shown to yield excellent performance on downstream NLP tasks. In this talk, however, I expose the naivety in how features are currently generated from Brown clusters. Looking at hyperparameter selection, the reality of Brown clustering output, and the algorithm itself, I will show that the space for improving the resultant word representations is largely unexplored.
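
As a rough illustration of the mutual-information criterion mentioned in the abstract, the following Python sketch scores a hard word-to-cluster assignment by the average mutual information between adjacent cluster labels, then greedily merges whichever pair of clusters leaves that score highest. It is not the implementation discussed in the talk: the function names `average_mutual_information` and `greedy_merge` and the toy corpus are assumptions made for illustration, and the exhaustive pairwise search is a naive stand-in for the windowed, hierarchical procedure of the original algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(tokens, cluster_of):
    """Mutual information (in bits) between adjacent cluster labels."""
    labels = [cluster_of[w] for w in tokens]
    bigrams = Counter(zip(labels, labels[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (a, b), n in bigrams.items():
        left[a] += n   # how often cluster a appears as the first element
        right[b] += n  # how often cluster b appears as the second element
    ami = 0.0
    for (a, b), n in bigrams.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        ami += (n / total) * math.log2(n * total / (left[a] * right[b]))
    return ami


def greedy_merge(tokens, num_clusters):
    """Naive greedy merging: start with one cluster per word type and
    repeatedly apply the merge that keeps the score highest.
    Hypothetical helper; far slower than practical implementations."""
    cluster_of = {w: w for w in set(tokens)}
    while len(set(cluster_of.values())) > num_clusters:
        best_score, best_assignment = None, None
        for a, b in combinations(sorted(set(cluster_of.values())), 2):
            # Tentatively relabel every word in cluster b as cluster a.
            trial = {w: (a if c == b else c) for w, c in cluster_of.items()}
            score = average_mutual_information(tokens, trial)
            if best_score is None or score > best_score:
                best_score, best_assignment = score, trial
        cluster_of = best_assignment
    return cluster_of


if __name__ == "__main__":
    # Toy corpus, invented for this example.
    corpus = "the cat sat the dog sat the cat ran the dog ran".split()
    clusters = greedy_merge(corpus, num_clusters=3)
    print(clusters)
    print(average_mutual_information(corpus, clusters))
```

The target number of clusters (`num_clusters` here) corresponds to the |C| hyperparameter in the talk title; the sketch takes it as a plain argument and makes no claim about how it should be chosen.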
29 Sep 2015

Event (Seminar)

Title: |C|=1000 and Other Brown Clustering Fallacies
Location: Sheffield University
Country/Territory: United Kingdom


  • clustering, feature generation, natural language processing, Brown clustering



ID: 93188613