Maier's theorem: Difference between revisions

From formulasearchengine
Jump to navigation Jump to search
en>K9re11
mNo edit summary
en>K9re11
 
Line 1: Line 1:
'''BIRCH''' (balanced iterative reducing and clustering using hierarchies) is an unsupervised [[data mining]] algorithm used to perform [[Data clustering|hierarchical clustering]] over particularly large data-sets. An advantage of Birch is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric [[data point]]s in an attempt to produce the best quality clustering for a given set of resources (memory and [[time constraint]]s). In most cases, Birch only requires a single scan of the database. In addition, Birch is recognized<ref>{{Literatur
The types of kayaks in kayak reef fishing may not be substantially distinct from common kayaks. Thus, boat offshore fishing includes involving inflatable kayaks, lay [http://www.youtube.com/watch?v=fv36rb0LK9A best sit on top kayak] [http://www.youtube.com/watch?v=tyeFYMq56x8 best fishing kayak] kayaks and tandem kayaks. Another type of kayak applied entirely concerning offshore fishing may just be the rest angling boat. Fishing originating from a boat also has an good thing about having many of these markets that would not have try to be accessible or else.   <br><br> The organization furthermore possesses a wide array among boat equipment most notably hobby canoe add-on, kayak carts, boat harden handbags, kayak dried out containers, kayak rudders, kayak support, boat flotation, boat armor and weapon upgrades, compasses, shape owners, front connects and more. You can also invest kayak remedy and additionally maintaining equipment like for example adorn rigging products and additionally deplete plug kits. Present, these are actually at marked down pricing. <br><br> Sites giving out Kayaks discounted produce kayaks that can come in all of the different models with 5ft to be able to 12ft, and therefore are manufactured in line with the consumption. The buying price of many of these kayaks moreover fluctuates as per the specific trip stage and also dimension. The minimum worth of a canoe is focused on $300.
| Autor = Tian Zhang, Raghu Ramakrishnan, Miron Livny
| Titel = An Efficient Data Clustering Method for Very Large Databases
}}</ref> as the, "first clustering algorithm proposed in the database area to handle 'noise' (data points that are not part of the underlying pattern) effectively".
 
==Problem with previous methods==
Previous clustering algorithms performed less effectively over very large databases and did not adequately consider the case wherein a data-set was too large to fit in [[Primary storage|main memory]]. As a result, there was a lot of overhead maintaining high clustering quality while minimizing the cost of addition IO (input/output) operations. Furthermore, most of Birch's predecessors inspect all data points (or all currently existing clusters) equally for each 'clustering decision' and do not perform heuristic weighting based on the distance between these data points.
 
==Advantages with BIRCH==
It is local in that each clustering decision is made without scanning all data points and currently existing clusters.
It exploits the observation that data space is not usually uniformly occupied and not every data point is equally important.
It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs.
It is also an incremental method that does not require the whole [[data set]] in advance.
 
==BIRCH Clustering Algorithm==
 
Given a set of N d-dimensional data points, the '''clustering feature''' <math>CF</math> of the set is defined as the triple <math>CF = (N,LS,SS)</math>, where <math>LS</math> is the linear sum and <math>SS</math> is the square sum of data points.
 
Clustering features are organized in a '''CF tree''', which is a height [[Self-balancing binary search tree|balanced tree]] with two parameters: [[branching factor]] <math>B</math> and threshold <math>T</math>. Each non-leaf node contains at most <math>B</math> entries of the form <math>[CF_i,child_i]</math>, where <math>child_i</math> is a pointer to its <math>i</math>th [[Tree (data structure)|child node]] and <math>CF_i</math> the clustering feature representing the associated subcluster. A [[leaf node]] contains at most <math>L</math> entries each of the form <math>[CF_i]</math> . It also has two pointers prev and next which are used to chain all leaf nodes together. The tree size depends on the parameter T. A node is required to fit in a page of size P. B and L are determined by P. So P can be varied for [[performance tuning]]. It is a very compact representation of the dataset because each entry in a leaf node is not a single data point but a subcluster.
 
In the algorithm in the first step it scans all data and builds an initial memory CF tree using the given amount of memory. In the second step it scans all the leaf entries in the initial CF tree to rebuild a smaller CF tree, while removing outliers and grouping crowded subclusters into larger ones. In step three an existing clustering algorithm is used to cluster all leaf entries. Here an agglomerative hierarchical clustering algorithm is applied directly to the subclusters represented by their CF vectors. It also provides the flexibility of allowing the user to specify either the desired number of clusters or the desired diameter threshold for clusters. After this step a set of clusters is obtained that captures major distribution pattern in the data. However there might exist minor and localized inaccuracies which can be handled by an optional step 4. In step 4 the centroids of the clusters produced in step 3 are used as seeds and redistribute the data points to its closest seeds to obtain a new set of clusters. Step 4 also provides us with an option of discarding outliers. That is a point which is too far from its closest seed can be treated as an outlier.
 
==Awards==
It has received
<ref>{{Literatur
| url = http://www.sigmod.org/sigmod-awards/citations/2006-sigmod-test-of-time-award-1
| Titel = http://www.sigmod.org/sigmod-awards/citations/2006-sigmod-test-of-time-award-1
}}</ref> the SIGMOD 10 year test of time award.
 
==External links==
* http://people.cs.ubc.ca/~rap/teaching/504/2005/slides/Birch.pdf
* http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.2504
 
==Notes==
{{Reflist}}
 
{{DEFAULTSORT:Birch (Data Clustering)}}
[[Category:Data clustering algorithms]]

Latest revision as of 18:24, 12 December 2014

The types of kayaks in kayak reef fishing may not be substantially distinct from common kayaks. Thus, boat offshore fishing includes involving inflatable kayaks, lay best sit on top kayak best fishing kayak kayaks and tandem kayaks. Another type of kayak applied entirely concerning offshore fishing may just be the rest angling boat. Fishing originating from a boat also has an good thing about having many of these markets that would not have try to be accessible or else.

The organization furthermore possesses a wide array among boat equipment most notably hobby canoe add-on, kayak carts, boat harden handbags, kayak dried out containers, kayak rudders, kayak support, boat flotation, boat armor and weapon upgrades, compasses, shape owners, front connects and more. You can also invest kayak remedy and additionally maintaining equipment like for example adorn rigging products and additionally deplete plug kits. Present, these are actually at marked down pricing.

Sites giving out Kayaks discounted produce kayaks that can come in all of the different models with 5ft to be able to 12ft, and therefore are manufactured in line with the consumption. The buying price of many of these kayaks moreover fluctuates as per the specific trip stage and also dimension. The minimum worth of a canoe is focused on $300.