Compressed Data Structures for Strings: On Searching and by Rossano Venturini

By Rossano Venturini

Data compression is obligatory to regulate great datasets, indexing is prime to question them. despite the fact that, their pursuits seem as counterposed: the previous goals at minimizing info redundancies, while the latter augments the dataset with auxiliary details to hurry up the question solution. during this monograph we introduce options that conquer this dichotomy. we commence by way of proposing using optimization suggestions to enhance the compression of classical info compression algorithms, then we movement to the layout of compressed info buildings supplying quickly random entry or effective trend matching queries at the compressed dataset. those theoretical reviews are supported via experimental evidences in their effect in sensible scenarios.

Show description

Read Online or Download Compressed Data Structures for Strings: On Searching and Extracting Strings from Compressed Textual Data PDF

Best logic books

Lectures on Algebraic Model Theory

In recent times, version thought has had outstanding luck in fixing very important difficulties in addition to in laying off new gentle on our figuring out of them. the 3 lectures accrued right here current fresh advancements in 3 such parts: Anand Pillay on differential fields, Patrick Speissegger on o-minimality and Matthias Clasen and Matthew Valeriote on tame congruence idea.

Foundations of Computing

Presents an creation to the idea of computing technology. overlaying the most components of complexity thought, automata and formal languages in a coherent approach, the textual content additionally covers the theoretical points of extra utilized components. The author's procedure is to stimulate scholars' realizing of the relevance of thought to big software parts - for instance, picture processing, verbal exchange networks and cryptography are all mentioned.

Compressed Data Structures for Strings: On Searching and Extracting Strings from Compressed Textual Data

Information compression is crucial to regulate tremendous datasets, indexing is prime to question them. notwithstanding, their objectives look as counterposed: the previous goals at minimizing information redundancies, while the latter augments the dataset with auxiliary details to hurry up the question answer. during this monograph we introduce options that triumph over this dichotomy.

A Boole Anthology: Recent and Classical Studies in the Logic of George Boole

Sleek mathematical common sense wouldn't exist with no the analytical instruments first constructed via George Boole within the Mathematical research of common sense and The legislation of notion. The impression of the Boolean institution at the improvement of common sense, regularly acknowledged yet lengthy underestimated, has lately develop into an important examine subject.

Extra resources for Compressed Data Structures for Strings: On Searching and Extracting Strings from Compressed Textual Data

Example text

Therefore, the latter is ε( √ our proposed √ partition. Since σ ∈ n, the ratio among the two partitions is ε( log n). 4 On k-th Order Compressors In this section we make one step further and consider the more powerful k-th order compressors, for which there exist Hk bounds for estimating the size of their compressed output (see Sect. 1). Here Size(wi ) must compute |C(T [l : ri ])| which is estimated by (ri − l + 1)Hk (T [l : ri ]) + f k (ri − l + 1, |σT [l:ri ] |), where σT [l,ri ] denotes the number of different symbols in T [l : ri ].

Given this, we will call ε-maximal the edges of Gε (T ). Clearly, each vertex of Gε (T ) has at most log1+ε n = O( 1ε log n) outgoing edges, which are ε-maximal by definition. Therefore the total size of Gε (T ) is at most O( nε log n). Hereafter, we will denote with dG (−, −) the shortest path distance between any two nodes in a graph G. 1. For any triple of indices 1 ≤ i ≤ j ≤ q ≤ n + 1 we have: (1) dG(T ) (v j , vq ) ≤ dG(T ) (vi , vq ) (2) dG(T ) (vi , v j ) ≤ dG(T ) (vi , vq ) Proof. We prove just 1, since 2 is symmetric.

Since σ ∈ n, the ratio among the two partitions is ε( log n). 4 On k-th Order Compressors In this section we make one step further and consider the more powerful k-th order compressors, for which there exist Hk bounds for estimating the size of their compressed output (see Sect. 1). Here Size(wi ) must compute |C(T [l : ri ])| which is estimated by (ri − l + 1)Hk (T [l : ri ]) + f k (ri − l + 1, |σT [l:ri ] |), where σT [l,ri ] denotes the number of different symbols in T [l : ri ]. Let us denote with Tq [1 : n − q] the text whose i-th symbol T [i] is equal to the q-gram T [i : i + q − 1].

Download PDF sample

Rated 4.16 of 5 – based on 37 votes