PEG ratio: Difference between revisions

fix link
en>Funandtrvl
fx ref
 
'''R*-trees''' are a variant of [[R-tree]]s used for indexing spatial information. R*-trees have a slightly higher construction cost than standard R-trees, because entries may need to be reinserted, but the resulting tree usually has better query performance. Like the standard R-tree, an R*-tree can store point and spatial data at the same time.
The R*-tree was proposed by Norbert Beckmann, [[Hans-Peter Kriegel]], Ralf Schneider, and Bernhard Seeger in 1990.<ref name="rstar">{{Cite doi | 10.1145/93597.98741}}</ref>
 
==Difference between R*-trees and R-trees==
[[Image:RTree 2D.svg|thumb|350px|R*-Tree built by repeated insertion (in [[Environment for DeveLoping KDD-Applications Supported by Index-Structures|ELKI]]). There is little overlap in this tree, resulting in good query performance. Red and blue MBRs are index pages, green MBRs are leaf nodes.]]
Minimization of both coverage and overlap is crucial to the performance of R-trees. Overlap means that, on data query or insertion, more than one branch of the tree needs to be expanded (because the data is split into regions that may overlap). Minimized coverage improves pruning performance, allowing whole pages to be excluded from the search more often, in particular for negative range queries.
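
As a minimal illustration of these two quantities, the following Python sketch computes the coverage (area) of minimum bounding rectangles (MBRs), their pairwise overlap, and the branches a range query would have to expand. The tuple layout <code>(xmin, ymin, xmax, ymax)</code> and all function names are assumptions made for this example and are not taken from any particular implementation.

```python
from itertools import combinations

# Assumption for this sketch: a rectangle is a tuple (xmin, ymin, xmax, ymax).

def area(r):
    """Coverage of a single MBR."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def overlap(a, b):
    """Area of the intersection of two MBRs (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def total_overlap(rects):
    """Sum of pairwise overlap areas among sibling MBRs."""
    return sum(overlap(a, b) for a, b in combinations(rects, 2))

def branches_to_visit(rects, query):
    """Indices of child MBRs that a range query must descend into."""
    return [i for i, r in enumerate(rects) if intersects(r, query)]
```

A query that falls into the overlap region of two sibling MBRs forces both branches to be expanded, whereas a query that hits no MBR at all is pruned immediately.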
 
The R*-tree attempts to reduce both, using a combination of a revised node split algorithm and the concept of forced reinsertion
at node overflow. This is based on the observation that R-tree structures are highly susceptible
to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure
is likely to be sub-optimal. Deletion and reinsertion of entries allows them to "find" a place in the tree
that may be more appropriate than their original location.
 
When a node overflows, a portion of its entries are removed from the node and reinserted into the tree.
(In order to avoid an indefinite cascade of reinsertions caused by subsequent node overflow, the reinsertion
routine may be called only once in each level of the tree when inserting any one new entry.) This has the
effect of producing better-clustered groups of entries in nodes, reducing node coverage. Furthermore,
actual node splits are often postponed, causing average node occupancy to rise.
Re-insertion can be seen as a method of incremental tree optimization triggered on node overflow.
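
The selection of entries for forced reinsertion can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: rectangles are tuples <code>(xmin, ymin, xmax, ymax)</code>, the function names are hypothetical, and the default fraction of 30% is the value commonly cited from the original paper and should be treated as a tunable parameter. The entries whose centers lie farthest from the center of the node's MBR are removed; the helper <code>overflow_action</code> models the rule that reinsertion runs at most once per level for each inserted entry.

```python
def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)

def split_for_reinsert(entries, fraction=0.3):
    """Return (kept, to_reinsert) for an overflowing node.

    The entries farthest from the node's MBR center are chosen
    for reinsertion; 'fraction' is an assumed tuning parameter.
    """
    cx, cy = center(mbr(entries))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(entries, key=dist2)          # closest first
    p = max(1, int(round(fraction * len(entries))))
    return ordered[:-p], ordered[-p:]

def overflow_action(level, levels_reinserted):
    """Reinsert on the first overflow at a level, split afterwards.

    'levels_reinserted' is a set that the caller resets before
    each top-level insertion of a new entry.
    """
    if level not in levels_reinserted:
        levels_reinserted.add(level)
        return "reinsert"
    return "split"
```

The removed entries are then inserted again starting from the root, which gives them the chance to land in a better-fitting node.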
 
==Performance==
*Improved split heuristic produces pages that are more rectangular and thus better for many applications.
*Reinsertion method optimizes the existing tree, but increases complexity.
*Efficiently supports point and spatial data at the same time.
{{clear}}
 
{{Gallery
|title=Effect of different splitting heuristics on a database with Germany postal districts
|width=300 | height=300 | align=center | lines=6
|File:Zipcodes-Germany-GuttmanRTree.svg|R-Tree with Guttman quadratic split.<ref name="guttman">{{cite doi | 10.1145/602259.602266 }}</ref><br /> Many pages extend from east to west all over Germany, and pages overlap a lot. This is not beneficial for most applications, which often need only a small rectangular area that then intersects many slices.
|File:Zipcodes-Germany-AngTanSplit.svg|R-Tree with Ang-Tan linear split.<ref name="ang-tan">{{cite doi | 10.1007/3-540-63238-7_38}}</ref><br /> While the slices do not extend as far as with Guttman, the slicing problem affects almost every leaf page. Leaf pages overlap little, but directory pages do.
|File:Zipcodes-Germany-RStarTree.svg|'''R*-tree''' topological split.<ref name="rstar" /><br /> The pages overlap very little since the R*-tree tries to minimize page overlap, and the reinsertions further optimize the tree. The split strategy also does not prefer slices, so the resulting pages are much more useful for common map applications.
}}
 
==Algorithm and complexity==
* The R*-tree uses the same algorithm as the regular [[R-tree]] for query and delete operations.
* When inserting, the R*-tree uses a combined strategy. For leaf nodes, overlap is minimized, while for inner nodes, enlargement and area are minimized.
* When splitting, the R*-tree uses a topological split that chooses a split axis based on perimeter, then minimizes overlap.
* In addition to an improved split strategy, the R*-tree also tries to avoid splits by reinserting objects and subtrees into the tree, inspired by the concept of balancing a [[B-tree]].
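
The topological split described above can be sketched in Python as follows. This is a minimal two-dimensional illustration under stated assumptions rather than a complete implementation: rectangles are tuples <code>(xmin, ymin, xmax, ymax)</code>, <code>min_fill</code> stands for the minimum number of entries per node, and all function names are hypothetical. The split axis is the one with the smallest sum of margins (half perimeters) over all candidate distributions; along that axis, the distribution with the least overlap is chosen, with ties broken by the smaller total area.

```python
def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def margin(r):
    """Half perimeter of a rectangle."""
    return (r[2] - r[0]) + (r[3] - r[1])

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def distributions(entries, axis, min_fill):
    """Yield candidate (group1, group2) pairs along one axis.

    Entries are sorted once by lower and once by upper bound;
    every cut that leaves at least min_fill entries per group
    is a candidate.
    """
    keys = (lambda r: (r[axis], r[axis + 2]),
            lambda r: (r[axis + 2], r[axis]))
    for key in keys:
        ordered = sorted(entries, key=key)
        for k in range(min_fill, len(ordered) - min_fill + 1):
            yield ordered[:k], ordered[k:]

def rstar_split(entries, min_fill):
    """Split an overflowing node's entries into two groups."""
    # Step 1: choose the axis with the smallest total margin.
    def margin_sum(axis):
        return sum(margin(mbr(a)) + margin(mbr(b))
                   for a, b in distributions(entries, axis, min_fill))
    axis = min((0, 1), key=margin_sum)

    # Step 2: on that axis, minimize overlap, then total area.
    def cost(dist):
        a, b = mbr(dist[0]), mbr(dist[1])
        return (overlap(a, b), area(a) + area(b))
    return min(distributions(entries, axis, min_fill), key=cost)
```

Because each axis requires sorting the entries of one node, the cost of this step is dominated by the sort, which is consistent with a per-node effort of <math>\mathcal{O}(M \log M)</math> for a fixed number of dimensions.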
 
Worst-case query and delete complexity are thus identical to those of the R-tree. The insertion strategy of the R*-tree, at <math>\mathcal{O}(M \log M)</math>, is more complex than the linear split strategy (<math>\mathcal{O}(M)</math>) of the R-tree but less complex than the quadratic split strategy (<math>\mathcal{O}(M^2)</math>) for a page size of <math>M</math> objects, and it has little impact on the total complexity. The total insert complexity is still comparable to that of the R-tree: reinsertions affect at most one branch of the tree, so there are at most <math>\mathcal{O}(\log n)</math> reinsertions, which is comparable to performing a split on a regular R-tree. Overall, the complexity of the R*-tree is therefore the same as that of a regular R-tree.
 
An implementation of the full algorithm must address many corner cases and tie situations not discussed here.
 
==References==
{{reflist}}
 
==External links==
{{commons category}}
*[http://donar.umiacs.umd.edu/quadtree/points/rtrees.html R-tree Demo]
*[http://www.madalgo.au.dk/tpie The TPIE Library contains a C++ R* tree implementation]
*[http://www.virtualroadside.com/blog/index.php/2008/10/04/r-tree-implementation-for-cpp/ A header-only C++ R* Tree Implementation]
*[http://libspatialindex.github.io/ The Spatial Index Library contains a C++ R* tree implementation]
*[http://www.boost.org/doc/libs/release/libs/geometry/doc/html/index.html Boost.Geometry library containing R*-tree implementation]
 
{{CS trees}}
{{Data structures}}
 
{{DEFAULTSORT:R* Tree}}
[[Category:R-tree]]
[[Category:Database index techniques]]
 
[[de:R-Baum]]

Latest revision as of 01:35, 23 December 2014
