|C|=1000 and Other Brown Clustering Fallacies

Activity: Talks and oral contributions

Description

Brown clustering has recently re-emerged as a competitive, unsupervised method for learning distributional word representations from an input corpus. It applies a greedy heuristic based on mutual information to group words into clusters, thereby reducing the sparsity of bigram information. Using the clusters as features has been shown time and again to yield excellent performance on downstream NLP tasks. In this talk, however, I expose the naivety of how features are currently generated from Brown clusters. By examining hyperparameter selection, the reality of Brown clustering output, and the algorithm itself, I will show that the space for improving the resulting word representations remains largely unexplored.
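The mutual-information heuristic mentioned above can be made concrete with a short Python sketch. It assumes a tokenized corpus given as a list of strings and scores a hard word-to-cluster assignment by the average mutual information (AMI) between the clusters of adjacent tokens, i.e. the sum over cluster pairs (c, c') of p(c, c') log[p(c, c') / (p_L(c) p_R(c'))]. Starting from one cluster per word, it repeatedly applies the merge that keeps AMI highest. The names average_mutual_information, merge and greedy_brown are hypothetical, and the exhaustive pair search is deliberately naive: practical implementations typically restrict candidate merges to a window of the most frequent words and update AMI incrementally instead of recomputing it from scratch.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical helper names; a naive illustration, not an optimized Brown implementation.


def average_mutual_information(tokens, cluster_of):
    """AMI between the clusters of adjacent tokens (left cluster, right cluster)."""
    bigrams = Counter((cluster_of[a], cluster_of[b]) for a, b in zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n   # marginal count of c1 in the left position
        right[c2] += n  # marginal count of c2 in the right position
    ami = 0.0
    for (c1, c2), n in bigrams.items():
        # p(c1, c2) * log[p(c1, c2) / (p_L(c1) * p_R(c2))], written with raw counts
        ami += (n / total) * math.log(n * total / (left[c1] * right[c2]))
    return ami


def merge(cluster_of, a, b):
    """Return a copy of the assignment with cluster b folded into cluster a."""
    return {w: (a if c == b else c) for w, c in cluster_of.items()}


def greedy_brown(tokens, num_clusters):
    """Greedy agglomerative clustering: each step keeps the merge with the highest AMI."""
    cluster_of = {w: i for i, w in enumerate(sorted(set(tokens)))}  # one cluster per word
    merges = []
    while len(set(cluster_of.values())) > num_clusters:
        ids = sorted(set(cluster_of.values()))
        # Exhaustively score every candidate pair (quadratic per step, fine for a toy corpus).
        best = max(
            combinations(ids, 2),
            key=lambda pair: average_mutual_information(tokens, merge(cluster_of, *pair)),
        )
        cluster_of = merge(cluster_of, *best)
        merges.append(best)
    return cluster_of, merges


# Usage on a toy corpus
tokens = "the cat sat the dog sat the cat ran the dog ran".split()
clusters, history = greedy_brown(tokens, num_clusters=3)
print(clusters, history)
```

Continuing the merges down to a single cluster would produce the binary merge hierarchy whose root-to-leaf paths give the bit strings commonly used as Brown cluster features; the stopping point num_clusters is the cluster count, which is presumably what |C| in the title refers to.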
Period: 29 Sep 2015
Event title: |C|=1000 and Other Brown Clustering Fallacies
Event type: Seminar
Location: Sheffield, United Kingdom