
Generalised Brown Clustering and Roll-up Feature Generation

Research output: Contribution to book/anthology/report/proceeding; Article in proceedings; Research; peer-review

  • Leon Derczynski, University of Sheffield, Sheffield, United Kingdom
  • Sean Chester

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really been applied optimally.
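The grouping described above can be made concrete with a small sketch of the classic greedy Brown procedure. It assumes the textbook objective (average mutual information, AMI, between the classes of adjacent tokens) and, for simplicity, no active-set window: every pair of clusters is a merge candidate, and merging continues until one cluster remains, so each word ends up with a bit-string path from the root of the merge tree. The names `brown_clusters` and `average_mutual_information` are hypothetical. This is not the paper's Generalised Brown algorithm, and it is far too slow for real corpora because it re-scores every candidate merge from scratch.

```python
from collections import Counter
from itertools import combinations
import math


def average_mutual_information(bigrams: Counter, assign: dict[str, int]) -> float:
    """AMI between the classes of adjacent tokens under a word -> cluster map."""
    class_bigrams = Counter()
    for (w1, w2), n in bigrams.items():
        class_bigrams[assign[w1], assign[w2]] += n
    total = sum(class_bigrams.values())
    left, right = Counter(), Counter()  # marginal counts of left/right classes
    for (c1, c2), n in class_bigrams.items():
        left[c1] += n
        right[c2] += n
    ami = 0.0
    for (c1, c2), n in class_bigrams.items():
        # p(c1,c2) * log(p(c1,c2) / (p_left(c1) * p_right(c2)))
        ami += (n / total) * math.log(n * total / (left[c1] * right[c2]))
    return ami


def brown_clusters(tokens: list[str]) -> dict[str, str]:
    """Greedy agglomerative clustering; returns a bit string per word type."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    assign = {w: i for i, w in enumerate(vocab)}  # every word starts alone
    tree = {i: w for i, w in enumerate(vocab)}    # cluster id -> subtree
    while len(tree) > 1:
        best = None
        for a, b in combinations(sorted(tree), 2):
            # Score the merge of cluster b into cluster a.
            trial = {w: (a if c == b else c) for w, c in assign.items()}
            score = average_mutual_information(bigrams, trial)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        assign = {w: (a if c == b else c) for w, c in assign.items()}
        tree[a] = (tree[a], tree.pop(b))  # record the merge as a binary node

    paths: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        # Left branch appends "0", right branch appends "1".
        if isinstance(node, str):
            paths[node] = prefix
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(tree.popitem()[1], "")
    return paths


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    for word, bits in sorted(brown_clusters(corpus).items()):
        print(word, bits)
```

Words merged early sit deep in the tree and therefore share long bit-string prefixes. Practical implementations avoid the quadratic candidate search by restricting merges to a bounded set of active clusters, which is the "computational active set size" discussed next.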

In this paper, we present a subtle but profound generalisation of Brown clustering to improve the overall quality by decoupling the number of output classes from the computational active set size. Moreover, the generalisation permits a novel approach to feature selection from Brown clusters: We show that the standard approach of shearing the Brown clustering output tree at arbitrary bitlengths is lossy and that features should be chosen instead by rolling up Generalised Brown hierarchies. The generalisation and corresponding feature generation are more principled, challenging the way Brown clustering is currently understood and applied.
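For reference, the conventional "shearing" approach that the abstract argues is lossy can be sketched as follows: each word's cluster bit string is cut at a fixed set of prefix lengths, and each prefix becomes a feature. The function name `prefix_features`, the feature-string format, and the default lengths (4, 6, 10, 20, a commonly used choice) are assumptions for illustration. The paper's roll-up alternative is not shown here.

```python
def prefix_features(bitstring: str, lengths: tuple[int, ...] = (4, 6, 10, 20)) -> list[str]:
    """Conventional Brown features: cut the cluster bit string at fixed lengths.

    The cut points ignore the shape of the merge tree; a bit string shorter
    than a requested length is simply returned whole for that length.
    """
    features = []
    for n in lengths:
        features.append(f"brown{n}={bitstring[:n]}")
    return features


if __name__ == "__main__":
    # Hypothetical bit string for a single word type.
    print(prefix_features("0010110"))
```

Because the same cut lengths are applied to every word regardless of where the tree actually branches, two words can land in one prefix bucket or be split apart purely because of the chosen lengths, which illustrates why fixed-length shearing may discard hierarchy information.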
Original language: English
Title of host publication: The Thirtieth AAAI Conference on Artificial Intelligence (AAAI)
Number of pages: 7
Publisher: AAAI Press
Publication year: 21 Feb 2016
ISBN (print): 978-1-57735-700-1
Publication status: Published, 21 Feb 2016
Event: The Thirtieth AAAI Conference on Artificial Intelligence, Phoenix Convention Center, Phoenix, United States
Duration: 12 Feb 2016 to 17 Feb 2016


Conference: The Thirtieth AAAI Conference on Artificial Intelligence
Location: Phoenix Convention Center
Country: United States

    Research areas

  • clustering, natural language processing, feature generation


ID: 94175716