Compressed pattern matching

In computer science, compressed pattern matching is the process of searching for patterns in compressed data with little or no decompression. Because the compressed string is shorter than the original, searching it directly can be faster than searching the uncompressed string, and it requires less space.


1. Compressed matching problem

If the compressed file uses a variable-width encoding, a problem can arise. For example, let "100" be the codeword for a and let "110100" be the codeword for b. If we search the compressed text for an occurrence of a, we may also find an occurrence that lies inside the codeword of b; we call this a false match. We therefore have to verify that each detected occurrence is aligned on a codeword boundary. We could always decode the entire text and then apply a classic string matching algorithm, but this usually requires more space and time, and it is often not possible, for example if the compressed file is hosted online. The problem of determining whether a match returned by a compressed pattern matching algorithm is true or false, given that decoding the entire text is not feasible, is called the compressed matching problem.
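The false match described above can be reproduced in a few lines. The following is a minimal sketch using the two codewords from the example. The helper names (`encode`, `find_all`, `codeword_starts`) are hypothetical, the bit stream is modelled as a Python string of "0" and "1" characters, and the codeword boundaries are computed from the original text purely for illustration; a real system would store boundary information alongside the compressed data.

```python
# Codewords taken from the example in the text.
CODE = {"a": "100", "b": "110100"}


def encode(text):
    """Concatenate the codewords of each symbol into one bit string."""
    return "".join(CODE[ch] for ch in text)


def find_all(bits, pattern):
    """Return every start index of pattern in bits, including overlapping ones."""
    hits = []
    pos = bits.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = bits.find(pattern, pos + 1)
    return hits


def codeword_starts(text):
    """Return the bit offset at which each codeword begins."""
    starts, offset = [], 0
    for ch in text:
        starts.append(offset)
        offset += len(CODE[ch])
    return starts


text = "ba"
bits = encode(text)                      # "110100" followed by "100"
hits = find_all(bits, CODE["a"])         # naive search for the codeword of a
boundaries = set(codeword_starts(text))  # offsets where a codeword really starts

# A hit is a true match only if it is aligned on a codeword boundary.
true_matches = [h for h in hits if h in boundaries]
false_matches = [h for h in hits if h not in boundaries]

print(bits, hits, true_matches, false_matches)
```

Here the naive search reports two hits, but only the one at bit offset 6 starts on a codeword boundary; the hit at offset 3 lies inside the codeword of b.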


2. Strategies

Many strategies exist for finding the boundaries of codewords and avoiding full decompression of the text, for example:

  • A list of the indices of the first bit of each codeword, stored with differential coding so that it takes less space within the file;
  • A list of the indices of the first bit of each codeword, on which a binary search can be applied;
  • A subdivision into blocks, allowing partial, targeted decompression;
  • A bit mask, in which a 1 bit marks the starting bit of each codeword.
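Three of these strategies can be sketched as small helpers. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, the bit mask is held as a plain Python list instead of a packed bit vector, and the sample offsets assume three codewords of lengths 6, 3 and 3 bits.

```python
from bisect import bisect_left
from itertools import accumulate


def is_boundary(starts, pos):
    """Binary search a sorted list of codeword start offsets for pos."""
    i = bisect_left(starts, pos)
    return i < len(starts) and starts[i] == pos


def delta_encode(starts):
    """Differential coding: store the gap to the previous offset, not the offset."""
    return [cur - prev for prev, cur in zip([0] + starts[:-1], starts)]


def delta_decode(gaps):
    """Rebuild the absolute offsets with a running sum of the gaps."""
    return list(accumulate(gaps))


def boundary_mask(starts, nbits):
    """Bit mask of length nbits with a 1 at the first bit of each codeword."""
    mask = [0] * nbits
    for s in starts:
        mask[s] = 1
    return mask


# Assumed example: codewords of 6, 3 and 3 bits, 12 bits in total.
starts = [0, 6, 9]
gaps = delta_encode(starts)
mask = boundary_mask(starts, 12)

print(is_boundary(starts, 6), is_boundary(starts, 3))
print(gaps, delta_decode(gaps))
print(mask)
```

The gaps are small numbers bounded by the longest codeword, which is why differential coding tends to take less space than absolute offsets. The trade-off is that a differentially coded list has to be decoded, at least partially, before a binary search can be run on it, whereas the bit mask answers a boundary query with a single lookup at the cost of one extra bit per compressed bit.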