{{Notability|date=January 2010}}


'''Linear Belief Function''' is an extension of the [[Dempster-Shafer theory]] of [[belief functions]] to the case when the variables of interest are [[continuous variable|continuous]]. Examples of such variables include financial asset prices, portfolio performance, and other antecedent and consequent variables. The theory was originally proposed by [[Arthur P. Dempster]]<ref>A. P. Dempster, "Normal belief functions and the [[Kalman filter]]," in ''Data Analysis from Statistical Foundations'', A. K. M. E. Saleh, Ed.: Nova Science Publishers, 2001, pp. 65-84.</ref> in the context of Kalman filtering and was later re-elaborated, refined, and applied to knowledge representation in artificial intelligence and to decision making in finance and accounting by Liping Liu.<ref>Liu, Liping, Catherine Shenoy, and Prakash P. Shenoy, "Knowledge Representation and Integration for Portfolio Evaluation Using Linear Belief Functions," ''IEEE Transactions on Systems, Man, and Cybernetics, Series A'', vol. 36 (4), 2006, pp. 774-785.</ref>


== Concept ==
 
A linear belief function is intended to represent our belief regarding the location of the true value as follows: we are certain that the truth lies on a so-called certainty [[hyperplane]], but we do not know its exact location; along some dimensions of the certainty hyperplane, we believe the true value could be anywhere from –∞ to +∞, with the probability of being at a particular location described by a [[normal distribution]]; along other dimensions, our knowledge is [[vacuous]], i.e., the true value lies somewhere between –∞ and +∞ but the associated probability is unknown. A [[belief function]] in general is defined by a [[mass function]] over a class of [[focal elements]], which may have nonempty intersections. A linear belief function is a special type of belief function in the sense that its focal elements are exclusive, parallel sub-hyperplanes over the certainty hyperplane and its mass function is a normal distribution across the sub-hyperplanes.
 
Based on the above geometrical description, Shafer<ref>G. Shafer, "A note on Dempster's Gaussian belief functions," School of Business, University of Kansas, Lawrence, KS, Technical Report 1992.</ref> and Liu<ref>L. Liu, "A theory of Gaussian belief functions," ''International Journal of Approximate Reasoning'', vol. 14, pp. 95-126, 1996</ref> propose two mathematical representations of an LBF: as a wide-sense inner product and a linear functional in the variable space, and as their duals over a hyperplane in the sample space. Monney<ref>P. A. Monney, ''A Mathematical Theory of Arguments for Statistical Evidence''. New York, NY: Springer, 2003.</ref> proposes yet another structure, called Gaussian hints. Although these representations are mathematically neat, they tend to be unsuitable for knowledge representation in expert systems.
 
== Knowledge Representation ==
 
A linear belief function can represent both logical and probabilistic knowledge for three types of variables: deterministic variables, such as observables or controllables; random variables, whose distribution is normal; and vacuous variables, on which no knowledge bears. Logical knowledge is represented by linear equations, or, geometrically, by a certainty hyperplane. Probabilistic knowledge is represented by a normal distribution across all parallel focal elements.
 
In general, assume X is a vector of multiple normal variables with mean μ and covariance Σ. Then, the multivariate normal distribution can be equivalently represented as a moment matrix:
 
: <math>
M(X) = \left( {\begin{array}{*{20}c}
  \mu  \\
  \Sigma  \\
\end{array}} \right)
</math>
 
If the distribution is non-degenerate, i.e., Σ has full rank and its inverse exists, the moment matrix can be fully swept:
 
: <math>
M(\vec X) = \left( {\begin{array}{*{20}c}
  {\mu \Sigma ^{ - 1} }  \\
  { - \Sigma ^{ - 1} }  \\
\end{array}} \right)
</math>
 
Except for a normalization constant, the above equation completely determines the normal density function for ''X''. Therefore, <math>M(\vec X)</math> represents the probability distribution of ''X'' in potential form.
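A minimal numerical sketch of the moment matrix and the full sweeping, assuming NumPy is available; the function names (<code>moment_matrix</code>, <code>full_sweep</code>) are illustrative and not part of any published implementation:

<syntaxhighlight lang="python">
import numpy as np

def moment_matrix(mu, Sigma):
    """Stack the 1 x n mean row on top of the n x n covariance matrix."""
    return np.vstack([np.atleast_2d(mu), Sigma])

def full_sweep(mu, Sigma):
    """Fully sweep a non-degenerate moment matrix: (mu, Sigma) -> (mu Sigma^-1, -Sigma^-1)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return mu @ Sigma_inv, -Sigma_inv

# Example: a bivariate normal belief about X = (X1, X2).
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
mu_sw, Sigma_sw = full_sweep(mu, Sigma)
print(moment_matrix(mu_sw, Sigma_sw))  # potential-form representation of X
</syntaxhighlight>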
 
These two simple matrices allow us to represent three special cases of linear belief functions. First, an ordinary normal probability distribution is represented by M(X) itself. Second, suppose one makes a direct observation on X and obtains a value μ. In this case, since there is no uncertainty, both the variance and the covariance vanish, i.e., Σ = 0. Thus, a direct observation can be represented as:
 
: <math>M(X) = \left( {\begin{array}{*{20}c}
  \mu  \\
  0  \\
\end{array}} \right)
</math>
 
Third, suppose one is completely ignorant about X. This is a very thorny case in Bayesian statistics since the density function does not exist. By using the fully swept moment matrix, we represent the vacuous linear belief function as a zero matrix in the swept form as follows:
 
: <math>M(\vec X) = \left[ {\begin{array}{*{20}c}
  0  \\
  0  \\
\end{array}} \right]
</math>
 
One way to understand the representation is to imagine complete ignorance as the limiting case in which the variance of X approaches ∞; one can then show that Σ<sup>−1</sup> = 0 and hence <math>M(\vec X)</math> vanishes. However, the above equation is not the same as an improper prior or a normal distribution with infinite variance. In fact, it does not correspond to any unique probability distribution. For this reason, a better way is to understand the vacuous linear belief function as the neutral element for combination (see below).
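As a small illustration, the observation and vacuous cases are just constant matrices; a minimal sketch, continuing the NumPy conventions of the earlier sketch (names illustrative):

<syntaxhighlight lang="python">
import numpy as np

n = 2  # dimension of X

# Direct observation X = (1, 2): the mean row equals the observed value, the covariance block is 0.
obs = np.vstack([np.array([1.0, 2.0]), np.zeros((n, n))])

# Vacuous belief about X: the zero matrix, understood in fully swept form.
vacuous = np.zeros((n + 1, n))
</syntaxhighlight>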
 
To represent the remaining three special cases, we need the concept of partial sweeping. Unlike a full sweeping, a partial sweeping is a transformation on a subset of variables. Suppose X and Y are two vectors of normal variables with the joint moment matrix:
 
: <math>M(X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  {\mu _1 }  \\
  {\Sigma _{11} }  \\
  {\Sigma _{21} }  \\
\end{array}} & {\begin{array}{*{20}c}
  {\mu _2 }  \\
  {\Sigma _{12} }  \\
  {\Sigma _{22} }  \\
\end{array}}  \\
\end{array}} \right]</math>
 
Then M(X, Y) may be partially swept. For example, we can define the partial sweeping on X as follows:
 
: <math> M(\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  {\mu _1 (\Sigma _{11} )^{ - 1} }  \\
  { - (\Sigma _{11} )^{ - 1} }  \\
  {\Sigma _{21} (\Sigma _{11} )^{ - 1} }  \\
\end{array}} & {\begin{array}{*{20}c}
  {\mu _2  - \mu _1 (\Sigma _{11} )^{ - 1} \Sigma _{12} }  \\
  {(\Sigma _{11} )^{ - 1} \Sigma _{12} }  \\
  {\Sigma _{22}  - \Sigma _{21} (\Sigma _{11} )^{ - 1} \Sigma _{12} }  \\
\end{array}}  \\
\end{array}} \right]
</math>
 
If X is one-dimensional, a partial sweeping replaces the variance of X by its negative inverse and multiplies the inverse with other elements. If X is multidimensional, the operation involves the inverse of the covariance matrix of X and other multiplications. A swept matrix obtained from a partial sweeping on a subset of variables can be equivalently obtained by a sequence of partial sweepings on each individual variable in the subset and the order of the sequence does not matter. Similarly, a fully swept matrix is the result of partial sweepings on all variables.
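A minimal sketch of a partial sweeping on the X block, following the block formulas above (NumPy assumed; <code>ix</code> and <code>iy</code> are illustrative index arrays for the X and Y variables, listed X-block first in the result):

<syntaxhighlight lang="python">
import numpy as np

def partial_sweep(mu, Sigma, ix, iy):
    """Sweep on the X block of M(X, Y); return the swept mean row and 'covariance' block."""
    mu1, mu2 = mu[ix], mu[iy]
    S11, S12 = Sigma[np.ix_(ix, ix)], Sigma[np.ix_(ix, iy)]
    S21, S22 = Sigma[np.ix_(iy, ix)], Sigma[np.ix_(iy, iy)]
    S11_inv = np.linalg.inv(S11)
    mean_row = np.concatenate([mu1 @ S11_inv, mu2 - mu1 @ S11_inv @ S12])
    X_rows = np.hstack([-S11_inv,      S11_inv @ S12])
    Y_rows = np.hstack([S21 @ S11_inv, S22 - S21 @ S11_inv @ S12])
    return mean_row, np.vstack([X_rows, Y_rows])

# Usage: joint (mu, Sigma) over (X, Y); sweep on X = index 0, keeping Y = index 1 unswept.
# mean_row, block = partial_sweep(mu, Sigma, ix=[0], iy=[1])
</syntaxhighlight>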
 
We can make two observations. First, after the partial sweeping on X, the mean vector and covariance matrix of X are respectively <math> \mu _1 (\Sigma _{11} )^{ - 1} </math> and <math> - (\Sigma _{11} )^{ - 1} </math>, which are the same as those from a full sweeping of the marginal moment matrix of X. Thus, the elements corresponding to X in the above partial sweeping equation represent the marginal distribution of X in potential form. Second, by standard results in statistics, <math> \mu _2  - \mu _1 (\Sigma _{11} )^{ - 1} \Sigma _{12} </math> is the conditional mean of Y given X = 0, <math> \Sigma _{22}  - \Sigma _{21} (\Sigma _{11} )^{ - 1} \Sigma _{12} </math> is the conditional covariance matrix of Y given X = 0, and <math>(\Sigma _{11} )^{ - 1} \Sigma _{12} </math> is the slope of the regression of Y on X. Therefore, the elements corresponding to Y and to the intersection of X and Y in <math> M(\vec X,Y)</math> represent the conditional distribution of Y given X = 0.
 
These semantics render the partial sweeping operation a useful method for manipulating multivariate normal distributions. They also form the basis of the moment matrix representations for the three remaining important cases of linear belief functions, including proper belief functions, linear equations, and linear regression models.
 
=== Proper Linear Belief Functions ===
For variables X and Y, assume there exists a piece of evidence justifying a normal distribution for the variables Y while bearing no opinion on the variables X. Also, assume that X and Y are not perfectly linearly related, i.e., their correlation is less than 1. This case involves a mix of an ordinary normal distribution for Y and a vacuous belief function for X. Thus, we represent it using a partially swept matrix as follows:
 
: <math>M(\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  0  \\
  0  \\
  0  \\
\end{array}} & {\begin{array}{*{20}c}
  {\mu _2 }  \\
  0  \\
  {\Sigma _{22} }  \\
\end{array}}  \\
\end{array}} \right]
</math>
 
This is how we can understand the representation. Since we are ignorant of X, we use its swept form and set <math> \mu _1 (\Sigma _{11} )^{ - 1} = 0</math> and <math> - (\Sigma _{11} )^{ - 1} = 0</math>. Since the correlation between X and Y is less than 1, the regression coefficient of Y on X approaches 0 as the variance of X approaches ∞. Therefore, <math>(\Sigma _{11} )^{ - 1} \Sigma _{12} = 0</math>. Similarly, one can show that <math>\mu _1 (\Sigma _{11} )^{ - 1} \Sigma _{12} = 0</math> and <math> \Sigma _{21} (\Sigma _{11} )^{ - 1} \Sigma _{12} = 0</math>.
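A minimal sketch of this partially swept representation (NumPy assumed; <code>proper_lbf</code> is an illustrative name): the X block is all zeros, while the Y block carries the ordinary normal belief.

<syntaxhighlight lang="python">
import numpy as np

def proper_lbf(p, mu2, Sigma22):
    """Vacuous on a p-dimensional X, normal N(mu2, Sigma22) on Y, as one partially swept matrix."""
    q = len(mu2)
    mean_row = np.concatenate([np.zeros(p), mu2])
    X_rows = np.zeros((p, p + q))
    Y_rows = np.hstack([np.zeros((q, p)), Sigma22])
    return np.vstack([np.atleast_2d(mean_row), X_rows, Y_rows])
</syntaxhighlight>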
 
=== Linear Equations ===
Suppose X and Y are two row vectors, and Y = XA + b, where A is a coefficient matrix and b is a row vector of constants. We represent the equation using a partially swept matrix as follows:
 
: <math>M(\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  0  \\
  0  \\
  {A^T }  \\
\end{array}} & {\begin{array}{*{20}c}
  b  \\
  A  \\
  0  \\
\end{array}} \\
\end{array}} \right]
</math>
 
We can understand the representation based on the fact that a linear equation contains two pieces of knowledge: (1) complete ignorance about all the variables; and (2) a degenerate conditional distribution of the dependent variables given the independent variables. Since X is the independent vector in the equation, we are completely ignorant about it. Thus, <math> \mu _1 (\Sigma _{11} )^{ - 1} = 0</math> and <math> - (\Sigma _{11} )^{ - 1} = 0</math>. Given X = 0, Y is completely determined to be b. Thus, the conditional mean of Y is b and the conditional variance is 0. Also, the regression coefficient matrix is A.
 
Note that the knowledge represented by a linear equation is very close to that of a proper linear belief function, except that the former assumes a perfect correlation between X and Y while the latter does not. This observation is interesting; it characterizes the difference between partial ignorance and linear equations in one parameter: correlation.
 
=== Linear Regression Models ===
A linear regression model is a more general and interesting case than the previous ones. Suppose X and Y are two vectors and Y = XA + b + E, where A and b are the appropriate coefficient matrices and E is an independent, normally distributed noise term with E ~ N(0, Σ). We represent the model as the following partially swept matrix:
 
: <math>
M(\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  0  \\
  0  \\
  {A^T }  \\
\end{array}} & {\begin{array}{*{20}c}
  b  \\
  A  \\
  \Sigma  \\
\end{array}}  \\
\end{array}} \right]
</math>
 
This linear regression model may be considered as the combination of two pieces of knowledge (see below): one is specified by the linear equation involving the three variables X, Y, and E, and the other is the simple normal distribution of E, i.e., E ~ N(0, Σ). Alternatively, one may consider it similar to a linear equation, except that, given X = 0, Y is not completely determined to be b; instead, the conditional mean of Y is b while the conditional variance is Σ. Note that, in this alternative interpretation, a linear regression model forms a basic building block for knowledge representation and is encoded as a single moment matrix. Moreover, the noise term E does not appear in the representation, which makes the representation more compact.
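A minimal sketch that builds this partially swept matrix for Y = XA + b + E (NumPy assumed; <code>regression_lbf</code> is an illustrative name). Setting the noise covariance to zero recovers the linear-equation representation of the previous subsection.

<syntaxhighlight lang="python">
import numpy as np

def regression_lbf(A, b, Sigma_E):
    """Encode Y = X A + b + E, E ~ N(0, Sigma_E), as the matrix [[0, b], [0, A], [A^T, Sigma_E]]."""
    p, q = A.shape                       # X is p-dimensional, Y is q-dimensional
    mean_row = np.concatenate([np.zeros(p), np.asarray(b, dtype=float)])
    X_rows = np.hstack([np.zeros((p, p)), A])
    Y_rows = np.hstack([A.T, Sigma_E])
    return np.vstack([np.atleast_2d(mean_row), X_rows, Y_rows])

# Example: scalar regression Y = 2X + 1 + E with Var(E) = 0.25.
print(regression_lbf(np.array([[2.0]]), [1.0], np.array([[0.25]])))
</syntaxhighlight>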
 
From representing the six special cases, we see a clear advantage of the moment matrix representation: it allows a unified representation for seemingly diverse types of knowledge, including linear equations, joint and conditional distributions, and ignorance. The unification is significant not only for knowledge representation in artificial intelligence but also for statistical analysis and engineering computation. For example, the representation treats the typical logical and probabilistic components in statistics — observations, distributions, improper priors (for Bayesian statistics), and linear equation models — not as separate concepts but as manifestations of a single concept. It allows one to see the inner connections between these concepts, or manifestations, and to exploit their interplay for computational purposes.
 
== Knowledge Operations ==
 
There are two basic operations for making inferences in expert systems using linear belief functions: combination and marginalization. Combination corresponds to the integration of knowledge whereas marginalization corresponds to the coarsening of knowledge. Making an inference involves combining relevant knowledge into a full body of knowledge and then projecting the full body of knowledge to a partial domain, in which an inference question is to be answered.
 
=== Marginalization ===
Marginalization projects a linear belief function into one with fewer variables. Expressed as a moment matrix, it is simply the restriction of a nonswept moment matrix to a submatrix corresponding to the remaining variables. For example, for the joint distribution M(X, Y), its marginal to Y is:
 
: <math>
M^{ \downarrow Y} (X,Y) = \left[ {\begin{array}{*{20}c}
  {\mu _2 }  \\
  {\Sigma _{22} }  \\
\end{array}} \right]
</math>
 
When removing a variable, it is important that the variable has not been swept on in the corresponding moment matrix, i.e., it does not have an arrow sign above the variable. For example, projecting the matrix <math>M(\vec X,Y)</math> to Y produces:
 
: <math> M^{ \downarrow Y} (\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\mu _2  - \mu _1 (\Sigma _{11} )^{ - 1} \Sigma _{12} }  \\
  {\Sigma _{22}  - \Sigma _{21} (\Sigma _{11} )^{ - 1} \Sigma _{12} }  \\
\end{array}} \right]
</math>
 
which is not the marginal linear belief function of Y. However, it is easy to see that removing any or all of the variables in Y from the partially swept matrix will still produce the correct result — a matrix representing the same function for the remaining variables.
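A minimal sketch of marginalization as restriction (NumPy assumed; <code>keep</code> is an illustrative index array of the retained, non-swept variables):

<syntaxhighlight lang="python">
import numpy as np

def marginalize(mu, Sigma, keep):
    """Restrict a non-swept moment matrix to the variables indexed by `keep`."""
    keep = np.asarray(keep)
    return mu[keep], Sigma[np.ix_(keep, keep)]

# Example: marginal of (X, Y) onto Y (index 1).
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
print(marginalize(mu, Sigma, keep=[1]))   # (array([2.]), array([[3.]]))
</syntaxhighlight>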
 
To remove a variable that has been already swept on, we have to reverse the sweeping using partial or full reverse sweepings. Assume  <math>M(\vec X)</math> is a fully swept moment matrix,
 
: <math>
M(\vec X) = \left( {\begin{array}{*{20}c}
  {\bar \mu }  \\
  {\bar \Sigma }  \\
\end{array}} \right)
</math>
 
Then a full reverse sweeping of <math>M(\vec X)</math> will recover the moment matrix M(X) as follows:
 
: <math>
M(X) = \left( {\begin{array}{*{20}c}
  { - \bar \mu \bar \Sigma ^{ - 1} }  \\
  { - \bar \Sigma ^{ - 1} }  \\
\end{array}} \right)
</math>
 
If a moment matrix is in a partially swept form, say
 
: <math>
M(\vec X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  {\bar \mu _1 }  \\
  {\bar \Sigma _{11} }  \\
  {\bar \Sigma _{21} }  \\
\end{array}} & {\begin{array}{*{20}c}
  {\bar \mu _2 }  \\
  {\bar \Sigma _{12} }  \\
  {\bar \Sigma _{22} }  \\
\end{array}}  \\
\end{array}} \right]
</math>
 
its partial reverse sweeping on X is defined as follows:
 
: <math> M(X,Y) = \left[ {\begin{array}{*{20}c}
  {\begin{array}{*{20}c}
  { - \bar \mu _1 (\bar \Sigma _{11} )^{ - 1} }  \\
  { - (\bar \Sigma _{11} )^{ - 1} }  \\
  { - \bar \Sigma _{21} (\bar \Sigma _{11} )^{ - 1} }  \\
\end{array}} & {\begin{array}{*{20}c}
  {\bar \mu _2  - \bar \mu _1 (\bar \Sigma _{11} )^{ - 1} \bar \Sigma _{12} }  \\
  { - (\bar \Sigma _{11} )^{ - 1} \bar \Sigma _{12} }  \\
  {\bar \Sigma _{22}  - \bar \Sigma _{21} (\bar \Sigma _{11} )^{ - 1} \bar \Sigma _{12} }  \\
\end{array}}  \\
\end{array}} \right]
</math>
 
Reverse sweepings are similar to forward ones, except for a sign difference in some of the multiplications; forward and reverse sweepings are opposite operations. It can be easily shown that applying the full reverse sweeping to <math>M(\vec X)</math> will recover the initial moment matrix M(X). It can also be proved that applying a partial reverse sweeping on X to the matrix <math> M(\vec X,Y)</math> will recover the moment matrix M(X,Y). In fact, Liu<ref>L. Liu, "Local Computation of Gaussian Belief Functions," ''International Journal of Approximate Reasoning'', vol. 22, pp. 217-248, 1999</ref> proves that a moment matrix will be recovered through a reverse sweeping after a forward sweeping on the same set of variables, and that it can also be recovered through a forward sweeping after a reverse sweeping. Intuitively, a partial forward sweeping factorizes a joint distribution into a marginal and a conditional, whereas a partial reverse sweeping multiplies them back into a joint distribution.
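A minimal sketch of the full reverse sweeping and of the round trip it forms with the full sweeping from the earlier sketch (NumPy assumed; names illustrative):

<syntaxhighlight lang="python">
import numpy as np

def full_sweep(mu, Sigma):
    """(mu, Sigma) -> (mu Sigma^-1, -Sigma^-1)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return mu @ Sigma_inv, -Sigma_inv

def full_reverse_sweep(mu_bar, Sigma_bar):
    """Undo a full sweeping: (mu_bar, Sigma_bar) -> (-mu_bar Sigma_bar^-1, -Sigma_bar^-1)."""
    Sigma_bar_inv = np.linalg.inv(Sigma_bar)
    return -mu_bar @ Sigma_bar_inv, -Sigma_bar_inv

# Round trip: sweeping and then reverse sweeping recovers the original moment matrix.
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
mu_back, Sigma_back = full_reverse_sweep(*full_sweep(mu, Sigma))
assert np.allclose(mu_back, mu) and np.allclose(Sigma_back, Sigma)
</syntaxhighlight>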
 
=== Combination ===
According to [[Dempster’s rule]], the combination of belief functions may be expressed as the intersection of focal elements and the multiplication of probability density functions. [[Liping Liu]] applies the rule to linear belief functions in particular and obtains a combination formula in terms of density functions. He later proves a claim by [[Arthur P. Dempster]] and re-expresses the formula as the sum of two fully swept matrices. Mathematically, assume <math>M_1 (\vec X) = \left( {\begin{array}{*{20}c}
  {\bar \mu _1 }  \\
  {\bar \Sigma _1 }  \\
\end{array}} \right)
</math>  and <math> M_2 (\vec X) = \left( {\begin{array}{*{20}c}
  {\bar \mu _2 }  \\
  {\bar \Sigma _2 }  \\
\end{array}} \right)
</math>  are two LBFs for the same vector of variables X. Then their combination is a fully swept matrix:
 
: <math> M(\vec X) = \left( {\begin{array}{*{20}c}
  {\bar \mu _1  + \bar \mu _2 }  \\
  {\bar \Sigma _1  + \bar \Sigma _2 }  \\
\end{array}} \right)
</math>
 
The above equation is often used for multiplying two normal distributions. Here we use it to define the combination of two linear belief functions, which include normal distributions as a special case. Also, note that a vacuous linear belief function (a zero swept matrix) is the neutral element for combination. When applying the equation, we need to consider two special cases. First, if the two matrices to be combined have different dimensions, then one or both matrices must be vacuously extended, i.e., we assume ignorance about the variables that are not present in each matrix. For example, if M<sub>1</sub>(X,Y) and M<sub>2</sub>(X,Z) are to be combined, we first extend them into <math> M_1 (X,Y,\vec Z)</math> and <math> M_2 (X,\vec Y,Z)</math> respectively, such that <math> M_1 (X,Y,\vec Z)</math> is ignorant about Z and <math> M_2 (X,\vec Y,Z)</math> is ignorant about Y. Vacuous extension was initially proposed by Kong<ref>A. Kong, "Multivariate belief functions and graphical models," in Department of Statistics. Cambridge, MA: Harvard University, 1986</ref> for discrete belief functions. Second, if a variable has zero variance, it will not permit a sweeping operation. In this case, we can pretend the variance is an extremely small number, say ε, perform the desired sweeping and combination, and then apply a reverse sweeping to the combined matrix on the same variable and let ε approach 0. Since zero variance means complete certainty about a variable, this ε-procedure makes the ε terms vanish in the final result.
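A minimal sketch of combination for two LBFs over the same variables (NumPy assumed; names illustrative): both matrices are fully swept, and the combination is simply their sum, with the vacuous LBF acting as the neutral element.

<syntaxhighlight lang="python">
import numpy as np

def combine(lbf1, lbf2):
    """Combine two fully swept LBFs (mean row, swept covariance block) by addition."""
    (m1, S1), (m2, S2) = lbf1, lbf2
    return m1 + m2, S1 + S2

# The vacuous LBF (all zeros in swept form) leaves any LBF unchanged.
vacuous = (np.zeros(2), np.zeros((2, 2)))
swept = (np.array([0.5, -0.25]), np.array([[-0.30, 0.10],
                                            [0.10, -0.40]]))
combined = combine(swept, vacuous)
assert np.allclose(combined[0], swept[0]) and np.allclose(combined[1], swept[1])
</syntaxhighlight>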
 
In general, to combine two linear belief functions, their moment matrices must be fully swept. However, one may combine a fully swept matrix with a partially swept one directly if the variables of the former have all been swept on in the latter. We can use the linear regression model — Y = XA + b + E — to illustrate this property. As mentioned above, the regression model may be considered as the combination of two pieces of knowledge: one is specified by the linear equation involving the three variables X, Y, and E, and the other is the simple normal distribution of E, i.e., E ~ N(0, Σ). Let <math>M_1 (\vec X,\vec {\rm E},Y) = \left[ {\begin{array}{*{20}c}
  0 & 0 & b  \\
  0 & 0 & A  \\
  0 & 0 & I  \\
  {A^T } & I & 0  \\
\end{array}} \right]
</math> and <math> M_2 (\vec {\rm E}) = \left[ {\begin{array}{*{20}c}
  0  \\
  { - \Sigma ^{ - 1} }  \\
\end{array}} \right]
</math>  be their moment matrices respectively. Then the two matrices can be combined directly without sweeping <math> M_1 (\vec X,\vec {\rm E},Y)
</math> on Y first. The result of the combination is a partially swept matrix as follows:
 
: <math> M(\vec X,\vec {\rm E},Y) = \left[ {\begin{array}{*{20}c}
  0 & 0 & b  \\
  0 & 0 & A  \\
  0 & { - \Sigma ^{ - 1} } & I \\
  {A^T } & I & 0  \\
\end{array}} \right]
</math>
 
If we apply a reverse sweeping on E and then remove E from the matrix, we will obtain the same representation of the regression model.
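A minimal numerical sketch of this direct combination for scalar X, E, and Y (NumPy assumed; the values A = 2, b = 1, Σ = 0.25 are illustrative): the fully swept matrix for E is vacuously extended and then added elementwise, reproducing the combined matrix shown above.

<syntaxhighlight lang="python">
import numpy as np

A, b, Sigma = 2.0, 1.0, 0.25

# M1(->X, ->E, Y): the linear equation Y = X*A + b + E with X and E swept on.
M1 = np.array([[0.0, 0.0, b],
               [0.0, 0.0, A],
               [0.0, 0.0, 1.0],
               [A,   1.0, 0.0]])

# M2(->E): the fully swept N(0, Sigma), vacuously extended to the variables (X, E, Y).
M2 = np.zeros_like(M1)
M2[2, 1] = -1.0 / Sigma      # the -Sigma^{-1} entry in the E row and E column

# Direct combination is elementwise addition of the two matrices.
print(M1 + M2)
</syntaxhighlight>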
 
== Applications ==
 
We may use an audit problem to illustrate the three types of variables. Suppose we want to audit the ending balance of accounts receivable (''E''). The ending balance is equal to the beginning balance (''B'') plus the sales (''S'') for the period, minus the cash receipts (''C'') on the sales, plus a residual (''R'') that represents insignificant sales returns and cash discounts. Thus, we can represent the logical relation as a linear equation:
 
: <math>E=B+S-C+R</math>
Furthermore, if the auditor believes that ''E'' and ''B'' are 100 thousand dollars on average, with a standard deviation of 5 and a covariance of 15, we can represent the belief as a multivariate normal distribution. If historical data indicate that the residual ''R'' is zero on average with a standard deviation of 0.5 thousand dollars, we can summarize the historical data by the normal distribution R ~ N(0, 0.5<sup>2</sup>). If there is a direct observation on cash receipts, we can represent the evidence as an equation, say C = 50 (thousand dollars). If the auditor knows nothing about the beginning balance of accounts receivable, we can represent his or her ignorance by a vacuous LBF. Finally, if historical data suggest that, given the cash receipts C, the sales S are on average 8C + 4 with a standard deviation of 4 thousand dollars, we can represent the knowledge as the linear regression model S ~ N(4 + 8C, 16).
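A minimal sketch of how each piece of audit evidence above could be encoded as a moment matrix before combination (NumPy assumed; the numeric values come from the narrative, and the variable names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Normal belief about (E, B): mean 100 each, standard deviation 5 (variance 25), covariance 15.
M_EB = np.vstack([np.array([100.0, 100.0]),
                  np.array([[25.0, 15.0],
                            [15.0, 25.0]])])

# Historical residual R ~ N(0, 0.5^2).
M_R = np.vstack([np.array([0.0]), np.array([[0.25]])])

# Direct observation C = 50 (thousand dollars): zero variance.
M_C = np.vstack([np.array([50.0]), np.array([[0.0]])])

# Regression S ~ N(4 + 8C, 16), encoded as the partially swept matrix [[0, b], [0, A], [A^T, Sigma]].
M_CS = np.array([[0.0,  4.0],
                 [0.0,  8.0],
                 [8.0, 16.0]])
</syntaxhighlight>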
 
==References==
<references/>
 
{{DEFAULTSORT:Linear Belief Function}}
[[Category:Knowledge representation]]
