Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review

**Low rank approximation of binary matrices : Column subset selection and generalizations.** / Dan, Chen; Hansen, Kristoffer Arnsfelt; Jiang, He; Wang, Liwei; Zhou, Yuchen.

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review

Dan, C, Hansen, KA, Jiang, H, Wang, L & Zhou, Y 2018, Low rank approximation of binary matrices: Column subset selection and generalizations. in I Potapov, J Worrell & P Spirakis (eds), *43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018.* vol. 117, 41, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, Leibniz International Proceedings in Informatics, no. 117, pp. 41:1-41:16, 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018, Liverpool, United Kingdom, 27/08/2018. https://doi.org/10.4230/LIPIcs.MFCS.2018.41

Dan, C., Hansen, K. A., Jiang, H., Wang, L., & Zhou, Y. (2018). Low rank approximation of binary matrices: Column subset selection and generalizations. In I. Potapov, J. Worrell, & P. Spirakis (Eds.), *43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 *(Vol. 117, pp. 41:1-41:16). [41] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. Leibniz International Proceedings in Informatics, No. 117 https://doi.org/10.4230/LIPIcs.MFCS.2018.41

Dan C, Hansen KA, Jiang H, Wang L, Zhou Y. 2018. Low rank approximation of binary matrices: Column subset selection and generalizations. Potapov I, Worrell J, Spirakis P, editors. In 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. pp. 41:1-41:16. (Leibniz International Proceedings in Informatics; No. 117). https://doi.org/10.4230/LIPIcs.MFCS.2018.41

Dan, Chen et al. "Low rank approximation of binary matrices: Column subset selection and generalizations"., Potapov, Igor Worrell, James Spirakis, Paul (editors). *43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018.* Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. (Leibniz International Proceedings in Informatics; Journal number 117). 2018, 41:1-41:16. https://doi.org/10.4230/LIPIcs.MFCS.2018.41

Dan C, Hansen KA, Jiang H, Wang L, Zhou Y. Low rank approximation of binary matrices: Column subset selection and generalizations. In Potapov I, Worrell J, Spirakis P, editors, 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018. Vol. 117. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2018. p. 41:1-41:16. 41. (Leibniz International Proceedings in Informatics; No. 117). https://doi.org/10.4230/LIPIcs.MFCS.2018.41

Dan, Chen ; Hansen, Kristoffer Arnsfelt ; Jiang, He ; Wang, Liwei ; Zhou, Yuchen. / **Low rank approximation of binary matrices : Column subset selection and generalizations**. 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018. editor / Igor Potapov ; James Worrell ; Paul Spirakis. Vol. 117 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2018. pp. 41:1-41:16 (Leibniz International Proceedings in Informatics; No. 117).

@inproceedings{33346961cd094b01af35c6cd152d042e,

title = "Low rank approximation of binary matrices: Column subset selection and generalizations",

abstract = "Low rank approximation of matrices is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors, patterns, and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrices. Here we are given a d × n binary matrix A and a small integer k < d. The goal is to find two binary matrices U and V of sizes d × k and k × n respectively, so that the Frobenius norm of A − UV is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: The GF(2) model and the Boolean semiring model. Unlike low rank approximation of a real matrix which can be efficiently solved by Singular Value Decomposition, we show that approximation of a binary matrix is NP-hard, even for k = 1. In this paper, our main concern is the problem of Column Subset Selection (CSS), in which the low rank matrix U must be formed by k columns of the data matrix, and we are interested in the approximation ratio achievable by CSS for binary matrices. For the GF(2) model, we show that CSS has approximation ratio bounded by k 2 + 1 + 2(2k k −1) and this is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of U are generated from Boolean formulas operating bitwise on selected columns of the data matrix. We show that the approximation ratio achieved by GCSS is bounded by 2k−1 + 1, and argue that an exponential dependency on k is seems inherent.",

keywords = "Approximation algorithms, Binary matrices, Low rank approximation",

author = "Chen Dan and Hansen, {Kristoffer Arnsfelt} and He Jiang and Liwei Wang and Yuchen Zhou",

year = "2018",

doi = "10.4230/LIPIcs.MFCS.2018.41",

language = "English",

isbn = "9783959770866",

volume = "117",

series = "Leibniz International Proceedings in Informatics",

publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",

number = "117",

pages = "41:1--41:16",

editor = "Igor Potapov and James Worrell and Paul Spirakis",

booktitle = "43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018",

note = "43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 ; Conference date: 27-08-2018 Through 31-08-2018",

}

TY - GEN

T1 - Low rank approximation of binary matrices

T2 - 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018

AU - Dan, Chen

AU - Hansen, Kristoffer Arnsfelt

AU - Jiang, He

AU - Wang, Liwei

AU - Zhou, Yuchen

PY - 2018

Y1 - 2018

N2 - Low rank approximation of matrices is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors, patterns, and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrices. Here we are given a d × n binary matrix A and a small integer k < d. The goal is to find two binary matrices U and V of sizes d × k and k × n respectively, so that the Frobenius norm of A − UV is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: The GF(2) model and the Boolean semiring model. Unlike low rank approximation of a real matrix which can be efficiently solved by Singular Value Decomposition, we show that approximation of a binary matrix is NP-hard, even for k = 1. In this paper, our main concern is the problem of Column Subset Selection (CSS), in which the low rank matrix U must be formed by k columns of the data matrix, and we are interested in the approximation ratio achievable by CSS for binary matrices. For the GF(2) model, we show that CSS has approximation ratio bounded by k 2 + 1 + 2(2k k −1) and this is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of U are generated from Boolean formulas operating bitwise on selected columns of the data matrix. We show that the approximation ratio achieved by GCSS is bounded by 2k−1 + 1, and argue that an exponential dependency on k is seems inherent.

AB - Low rank approximation of matrices is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors, patterns, and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrices. Here we are given a d × n binary matrix A and a small integer k < d. The goal is to find two binary matrices U and V of sizes d × k and k × n respectively, so that the Frobenius norm of A − UV is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: The GF(2) model and the Boolean semiring model. Unlike low rank approximation of a real matrix which can be efficiently solved by Singular Value Decomposition, we show that approximation of a binary matrix is NP-hard, even for k = 1. In this paper, our main concern is the problem of Column Subset Selection (CSS), in which the low rank matrix U must be formed by k columns of the data matrix, and we are interested in the approximation ratio achievable by CSS for binary matrices. For the GF(2) model, we show that CSS has approximation ratio bounded by k 2 + 1 + 2(2k k −1) and this is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of U are generated from Boolean formulas operating bitwise on selected columns of the data matrix. We show that the approximation ratio achieved by GCSS is bounded by 2k−1 + 1, and argue that an exponential dependency on k is seems inherent.

KW - Approximation algorithms

KW - Binary matrices

KW - Low rank approximation

UR - http://www.scopus.com/inward/record.url?scp=85053213811&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.MFCS.2018.41

DO - 10.4230/LIPIcs.MFCS.2018.41

M3 - Article in proceedings

AN - SCOPUS:85053213811

SN - 9783959770866

VL - 117

T3 - Leibniz International Proceedings in Informatics

SP - 41:1-41:16

BT - 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018

A2 - Potapov, Igor

A2 - Worrell, James

A2 - Spirakis, Paul

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

Y2 - 27 August 2018 through 31 August 2018

ER -