Oil in place

2013-12-24T05:13:43Z

106.219.46.22: /* Calculation of STOOIP */

{{other uses2|Gain}}
{{Refimprove|date=December 2009}}
In [[information theory]] and [[machine learning]], '''information gain''' is a synonym for ''[[Kullback–Leibler divergence]]''. In the context of decision trees, however, the term is sometimes used synonymously with [[mutual information]], which is the expected value of the Kullback–Leibler divergence.

In particular, the information gain about a random variable ''X'' obtained from an observation that a random variable ''A'' takes the value ''A'' = ''a'' is the Kullback–Leibler divergence ''D''<sub>KL</sub>(''p''(''x'' | ''a'') || ''p''(''x'' | I)) of the [[prior distribution]] ''p''(''x'' | I) for ''x'' from the [[posterior distribution]] ''p''(''x'' | ''a'') for ''x'' given ''a'', where I denotes the background information available before the observation.

The [[expected value]] of the information gain is the mutual information ''I(X; A)'' of ''X'' and ''A'' – i.e. the reduction in the [[information entropy|entropy]] of ''X'' achieved by learning the state of the random variable ''A''.
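
The relationship between the per-observation information gain and the mutual information can be checked numerically. The following is a minimal sketch, assuming a small made-up joint distribution over two binary variables; the outcome names and probabilities are purely illustrative and do not come from any source.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; p and q are dicts over the same outcomes."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

def entropy(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Hypothetical joint distribution p(x, a) (illustrative numbers only).
joint = {
    ("rain", "cloudy"): 0.30,
    ("rain", "clear"): 0.05,
    ("dry", "cloudy"): 0.20,
    ("dry", "clear"): 0.45,
}
x_values = sorted({x for x, _ in joint})
a_values = sorted({a for _, a in joint})

# Prior p(x), marginal p(a), and posteriors p(x | a).
prior = {x: sum(joint[(x, a)] for a in a_values) for x in x_values}
p_a = {a: sum(joint[(x, a)] for x in x_values) for a in a_values}
posterior = {a: {x: joint[(x, a)] / p_a[a] for x in x_values} for a in a_values}

# Information gain from observing each value a: D_KL(p(x | a) || p(x)).
gain_per_value = {a: kl_divergence(posterior[a], prior) for a in a_values}

# Expected information gain, averaged over the possible observations.
expected_gain = sum(p_a[a] * gain_per_value[a] for a in a_values)

# Mutual information computed as the entropy reduction H(X) - H(X | A).
conditional_entropy = sum(p_a[a] * entropy(posterior[a]) for a in a_values)
mutual_information = entropy(prior) - conditional_entropy

print(gain_per_value)
print(expected_gain, mutual_information)
```

The two printed totals agree up to floating-point error, while the individual gains for each observed value generally differ from one another.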

In machine learning, this concept can be used to define a preferred order in which to investigate attributes so as to narrow down the state of ''X'' as rapidly as possible. Such a sequence (which depends at each stage on the outcome of investigating the previous attributes) is called a [[Decision tree learning|decision tree]]. Usually an attribute with high mutual information should be preferred to other attributes.

==General definition==
In general terms, the [[expectation value|expected]] information gain is the reduction in [[information entropy]] <math>H</math> from a prior state to a state that takes some information as given:

<math> IG(T,a) = H(T) - H(T|a) </math>

==Formal definition==
Let <math>T</math> denote a [[training set|set of training examples]], each of the form <math>(\textbf{x},y) = (x_1, x_2, x_3, \ldots, x_k, y)</math>,
where <math>x_a\in vals(a)</math> is the value of the <math>a</math>th attribute of example <math>\textbf{x}</math> and <math>y</math> is the corresponding class label. The information gain for an attribute <math>a</math> is defined in terms of entropy <math>H</math> as follows:

<math>IG(T,a) = H(T)-\sum_{v\in vals(a)}\frac{|\{\textbf{x}\in T|x_a=v\}|}{|T|} \cdot H(\{\textbf{x}\in T|x_a=v\}) </math>

The information gain for an attribute equals the total entropy <math>H(T)</math> if each value of that attribute determines a unique [[statistical classification|classification]] of the result attribute. In this case, every subset entropy subtracted from the total entropy is 0.
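
The formal definition can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names, the 0-based attribute index, and the toy training set are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, a):
    """IG(T, a) for examples of the form (x, y), where x is a tuple of
    attribute values and a is a 0-based attribute index (an assumption
    of this sketch; the text above counts attributes from 1)."""
    labels = [y for _, y in examples]
    # Group the class labels by the value of attribute a.
    subsets = defaultdict(list)
    for x, y in examples:
        subsets[x[a]].append(y)
    # Weighted average entropy of the subsets {x in T | x_a = v}.
    remainder = sum(
        len(subset) / len(examples) * entropy(subset)
        for subset in subsets.values()
    )
    return entropy(labels) - remainder

# Hypothetical toy training set: (outlook, temperature) -> play?
T = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
    (("rainy", "mild"), "yes"),
]

gain_outlook = information_gain(T, 0)
gain_temperature = information_gain(T, 1)
total_entropy = entropy([y for _, y in T])
print(gain_outlook, gain_temperature, total_entropy)
```

In this toy set the first attribute determines the class label uniquely, so its information gain equals the total entropy of the labels, whereas the second attribute leaves every subset evenly mixed and yields no gain.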

==Drawbacks==
Although information gain is usually a good measure for deciding the [[relevance]] of an attribute, it is not perfect. A notable problem occurs when information gain is applied to attributes that can take on a large number of distinct values. For example, suppose that one is building a decision tree for some data describing the customers of a business. Information gain is often used to decide which of the attributes are the most relevant, so that they can be tested near the root of the tree. One of the input attributes might be the customer's credit card number. This attribute has high mutual information, because it uniquely identifies each customer, but we do ''not'' want to include it in the decision tree: deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we have not seen before ([[overfitting]]).
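
The effect can be reproduced on a small synthetic data set. The sketch below is only an illustration: the customer records, attribute names, and helper functions are all made up for the example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Information gain of `attribute` with respect to `target`;
    each row is a dict mapping attribute names to values."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    remainder = sum(
        len(subset) / len(rows) * entropy(subset) for subset in subsets.values()
    )
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical customer records (synthetic values, for illustration only).
customers = [
    {"card_number": "0001", "income": "high", "defaulted": "no"},
    {"card_number": "0002", "income": "high", "defaulted": "no"},
    {"card_number": "0003", "income": "high", "defaulted": "yes"},
    {"card_number": "0004", "income": "low", "defaulted": "yes"},
    {"card_number": "0005", "income": "low", "defaulted": "yes"},
    {"card_number": "0006", "income": "low", "defaulted": "no"},
]

label_entropy = entropy([c["defaulted"] for c in customers])
gain_card = information_gain(customers, "card_number", "defaulted")
gain_income = information_gain(customers, "income", "defaulted")
print(label_entropy, gain_card, gain_income)
```

Because every card number occurs exactly once, each subset contains a single label and has zero entropy, so the card number attains the maximum possible gain (the full entropy of the labels) even though it says nothing about unseen customers.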

[[Information gain ratio]] is sometimes used instead. This biases the decision tree against considering attributes with a large number of distinct values. However, attributes with very low information values then appear to receive an unfair advantage. In addition, methods such as [[permutation tests]] have been proposed to correct the bias.<ref>{{cite conference
|author=Deng, H.|coauthors=Runger, G.; Tuv, E.
|title=Bias of importance measures for multi-valued attributes and solutions
|conference=Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN2011)
|year=2011|pages=293–300|doi=10.1007/978-3-642-21738-8_38}}</ref>
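
As a rough illustration of how the gain ratio changes the ranking, the sketch below assumes the commonly used definition in which the information gain is divided by the entropy of the attribute's own value distribution (often called the split information). The data and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy of each subset."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    remainder = sum(
        len(subset) / len(rows) * entropy(subset) for subset in subsets.values()
    )
    return entropy([row[target] for row in rows]) - remainder

def gain_ratio(rows, attribute, target):
    """Information gain divided by the split information, i.e. the entropy
    of the attribute's value distribution (assumed definition)."""
    split_information = entropy([row[attribute] for row in rows])
    if split_information == 0:
        return 0.0  # attribute takes a single value, so it cannot split the data
    return information_gain(rows, attribute, target) / split_information

# Hypothetical customer records: (card number, income, defaulted).
records = [
    ("0001", "high", "no"), ("0002", "high", "no"),
    ("0003", "high", "no"), ("0004", "high", "no"),
    ("0005", "low", "no"), ("0006", "low", "yes"),
    ("0007", "low", "yes"), ("0008", "low", "yes"),
]
customers = [
    {"card_number": card, "income": income, "defaulted": label}
    for card, income, label in records
]

for attribute in ("card_number", "income"):
    print(
        attribute,
        information_gain(customers, attribute, "defaulted"),
        gain_ratio(customers, attribute, "defaulted"),
    )
```

On this data the card number has the higher information gain, but its split information is large because it takes eight distinct values, so the income attribute ends up with the higher gain ratio.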

== Notes ==
{{Reflist}}

==References==
* {{cite book|authorlink=Tom Mitchell|author=Mitchell, Tom M.|title=Machine Learning|publisher=The McGraw-Hill Companies, Inc.|year=1997|url=https://www.cs.cmu.edu/~tom/mlbook.html|isbn=0070428077}}

{{DEFAULTSORT:Information Gain In Decision Trees}}
[[Category:Decision trees]]
[[Category:Classification algorithms]]
