Generalised Brown Clustering and Roll-up Feature Generation

Leon Derczynski, Sean Chester

Publication: Contribution to book/anthology/report/proceeding › Conference contribution in proceedings › Research › Peer-reviewed

11 Citations (Scopus)

Abstract

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really been applied optimally. In this paper, we present a subtle but profound generalisation of Brown clustering that improves overall quality by decoupling the number of output classes from the computational active set size. Moreover, the generalisation permits a novel approach to feature selection from Brown clusters: we show that the standard approach of shearing the Brown clustering output tree at arbitrary bitlengths is lossy and that features should be chosen instead by rolling up Generalised Brown hierarchies. The generalisation and the corresponding feature generation are more principled, challenging the way Brown clustering is currently understood and applied.
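
For readers unfamiliar with the "shearing at bitlengths" that the abstract calls lossy, the following is a minimal Python sketch of that standard approach only. It is not the authors' roll-up method, which this record does not describe. The function name `prefix_features`, the feature-name scheme, the cut lengths, and the cluster paths are all hypothetical and chosen for illustration.

```python
def prefix_features(bit_path, lengths=(4, 6, 10, 20)):
    """Return one feature per cut length by truncating a Brown cluster bit path."""
    # A path shorter than a requested length is kept whole, so deep cuts
    # can yield the same value as shallower ones for leaves near the root.
    return {f"brown_{n}": bit_path[:n] for n in lengths}


# Hypothetical root-to-leaf paths in a Brown clustering tree (invented data).
paths = {"monday": "0010110", "tuesday": "0010111", "london": "110100"}

# Shear every path at the same, arbitrarily chosen bit lengths.
features = {word: prefix_features(p, lengths=(2, 4, 6)) for word, p in paths.items()}
```

Because the cut lengths are fixed in advance rather than derived from the hierarchy, words are merged or separated wherever the chosen depth happens to fall; this is the kind of arbitrariness the abstract argues against.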

Original language: English
Title: 30th AAAI Conference on Artificial Intelligence, AAAI 2016 : AAAI
Number of pages: 7
Publisher: AAAI Press
Publication date: 21 Feb 2016
Pages: 1533-1539
ISBN (Print): 978-1-57735-700-1
ISBN (Electronic): 9781577357605
Status: Published - 21 Feb 2016
Event: The Thirtieth AAAI Conference on Artificial Intelligence - Phoenix Convention Center, Phoenix, USA
Duration: 12 Feb 2016 to 17 Feb 2016
http://www.aaai.org/Conferences/AAAI/aaai16.php

Conference

Conference: The Thirtieth AAAI Conference on Artificial Intelligence
Location: Phoenix Convention Center
Country/Territory: USA
City: Phoenix
Period: 12/02/2016 to 17/02/2016