Description
Brown clustering has recently re-emerged as a competitive, unsupervised method for learning distributional word representations from an input corpus. It applies a greedy, mutual-information-based heuristic to group words into clusters, thereby reducing the sparsity of bigram information. Using the clusters as features has repeatedly been shown to yield excellent performance on downstream NLP tasks. In this talk, however, I expose the naivety of how features are currently generated from Brown clusters. By examining hyperparameter selection, the reality of Brown clustering output, and the algorithm itself, I will show that the space for improving the resulting word representations is largely unexplored.
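As a concrete illustration of the standard feature-generation recipe the talk calls into question, the sketch below shows the common prefix-based approach: each word's Brown cluster is identified by a bit-string path in the merge hierarchy, and features are taken as fixed-length prefixes of that path. The file format, helper names, and prefix lengths (4, 6, 10, 20 are a conventional choice) are illustrative assumptions, not something prescribed by the talk.

```python
# Illustrative sketch (not from the talk): turning Brown clustering output into
# the prefix features commonly used for downstream NLP tasks.
#
# Assumes the typical clustering output format of one line per word:
#   <bit-string path> <word> <frequency>
# e.g. "0010110  apple  317". File name and feature templates are assumptions.

def load_brown_paths(path_file):
    """Map each word to its full bit-string cluster path."""
    word2path = {}
    with open(path_file, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                bits, word = parts[0], parts[1]
                word2path[word] = bits
    return word2path

def prefix_features(word, word2path, prefix_lengths=(4, 6, 10, 20)):
    """Common recipe: one feature per fixed-length prefix of the word's cluster path."""
    bits = word2path.get(word)
    if bits is None:
        return ["brown=OOV"]
    # Paths shorter than a requested prefix length just contribute the full path.
    return ["brown_{}={}".format(n, bits[:n]) for n in prefix_lengths]

# Example usage with a hypothetical output file:
# word2path = load_brown_paths("paths")        # e.g. output of a Brown clustering run
# print(prefix_features("apple", word2path))   # ['brown_4=0010', 'brown_6=001011', ...]
```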
Period: 29 Sep 2015
Event title: |C|=1000 and Other Brown Clustering Fallacies
Event type: Seminar
Location: Sheffield, United Kingdom
Related content
Projects
- Improving decision making from massive data collections using wall-sized, highly interactive visualizations
  Projects: Project › Research

Publication
- Tune Your Brown Clustering, Please
  Publication: Contribution to book/anthology/report/proceedings › Conference contribution in proceedings › Research › peer-reviewed