Algorithms for aggregate information extraction from sequences

Sammanfattning: In this thesis, we propose efficient algorithms for aggregate information extraction from sequences and multidimensional arrays. The algorithms proposed are applicable in several important areas, including large databases and DNA sequence segmentation. We first study the problem of efficiently computing, for a given range, the range-sum in a multidimensional array as well as computing the k maximum values, called the top-k values. We design two efficient data structures for these problems. For the range-sum problem, our structure supports fast update while preserving low complexity of range-sum query. The proposed top-k structure provides fast query computation in linear time proportional to the sum of the sizes of a two-dimensional query region. We also study the k maximum sum subsequences problem and develop several efficient algorithms. In this problem, the k subsegments of consecutive elements with largest sum are to be found. The segments can potentially overlap, which allows for a large number of possible candidate segments. Moreover, we design an optimal algorithm for ranking the k maximum sum subsequences. Our solution does not require the value of k to be known a priori. Furthermore, an optimal linear-time algorithm is developed for the maximum cover problem of finding k subsequences of consecutive elements of maximum total element sum.

  KLICKA HÄR FÖR ATT SE AVHANDLINGEN I FULLTEXT. (PDF-format)