Dynamic dictionary-based compression schemes are the most daily used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, commonly referred to as LZ77. In dynamic setting, LZ77 considers a portion of the previous text as a dictionary and it uses a greedy approach to select, at each step, the longest match between the text and the dictionary. Compression is achieved by replacing matches with encoded dictionary pointers. LZ77 is the base of gzip, zip, rar, 7zip and many others compression software. All these compression schemes use variants of the greedy approach to parse (or factorise) the text into dictionary phrases. Greedy parsing optimality with respect to the number of phrases was proved by Storer et al. (1982) for unbounded LZ77-based dictionaries and by Cohn et al. (1996) for static suffix-closed dictionaries. The optimality of the greedy parsing was never proved for bounded size dictionary which is actually required by all of these practical schemes. In this article, we define the suffix-closed property for dynamic dictionaries, and we show that LZ77-based compression schemes, including the bounded dictionary variants, satisfy this property. Under this condition we prove the optimality of the greedy parsing as a variant of the proof by Cohn et al. © 2014 Elsevier B.V.

Note on the greedy parsing optimality for dictionary-based text compression

Langiu Alessio;
2014

Abstract

Dynamic dictionary-based compression schemes are the most daily used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, commonly referred to as LZ77. In dynamic setting, LZ77 considers a portion of the previous text as a dictionary and it uses a greedy approach to select, at each step, the longest match between the text and the dictionary. Compression is achieved by replacing matches with encoded dictionary pointers. LZ77 is the base of gzip, zip, rar, 7zip and many others compression software. All these compression schemes use variants of the greedy approach to parse (or factorise) the text into dictionary phrases. Greedy parsing optimality with respect to the number of phrases was proved by Storer et al. (1982) for unbounded LZ77-based dictionaries and by Cohn et al. (1996) for static suffix-closed dictionaries. The optimality of the greedy parsing was never proved for bounded size dictionary which is actually required by all of these practical schemes. In this article, we define the suffix-closed property for dynamic dictionaries, and we show that LZ77-based compression schemes, including the bounded dictionary variants, satisfy this property. Under this condition we prove the optimality of the greedy parsing as a variant of the proof by Cohn et al. © 2014 Elsevier B.V.
2014
Data compression
Greedy parsing
Lempel-Ziv factorisation
LZ77 algorithm
Optimal parsing
Text compression
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/269885
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? ND
social impact