DATA MINING FINAL TEST
Examples of questions
A clustroid is the point of a cluster:
that maximizes the sum of squares of distances to other points in the cluster,
which is not necessarily an element of the dataset,
which is the mean of the cluster,
that minimizes the sum of squares of distances to other points in the dataset.
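Study note: the sketch below illustrates one common definition of a clustroid, assumed here to be the cluster member with the smallest sum of squared Euclidean distances to the other members of the same cluster. The function name `clustroid` and the sample points are illustrative assumptions, not part of the test.

```python
import math

def clustroid(points):
    """Return the cluster member with the smallest sum of squared distances to the others."""
    def cost(p):
        # The distance from p to itself is 0, so including it does not change the ranking.
        return sum(math.dist(p, q) ** 2 for q in points)
    return min(points, key=cost)

# Hypothetical one-dimensional cluster embedded in the plane.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(clustroid(cluster))  # (2.0, 0.0)
```

In this sample the mean of the cluster is (3.25, 0.0), which is not one of the points, while the clustroid is always chosen from the points themselves.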
During construction of a decision tree, the test chosen is the one with
minimal entropy,
maximal entropy,
minimal gain of information,
minimal number of outcomes.
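Study note: a minimal sketch of how the entropy of a test and its information gain can be computed, assuming the usual definitions (the entropy of a test is the weighted average entropy of the subsets it produces, and the gain is the entropy of the whole set minus the entropy of the test). The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(partition):
    """Weighted average entropy of the subsets produced by a test."""
    total = sum(len(part) for part in partition)
    return sum(len(part) / total * entropy(part) for part in partition)

def information_gain(labels, partition):
    """Entropy of the whole set minus the entropy of the test."""
    return entropy(labels) - split_entropy(partition)

labels = ["yes", "yes", "no", "no"]
test_a = [["yes", "yes"], ["no", "no"]]   # separates the classes perfectly
test_b = [["yes", "no"], ["yes", "no"]]   # leaves both subsets mixed

print(split_entropy(test_a), information_gain(labels, test_a))
print(split_entropy(test_b), information_gain(labels, test_b))
```

Because the entropy of the fixed set is the same for every candidate test, ranking tests by entropy and ranking them by gain give opposite orders.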
The minimum support has been set to 40%. The maximal frequent sets in the following dataset { {a,b,c}, {b,c}, {a,b}, {a,b}, {c} } are:
{a,b}, {c}
{a,b}, {a}, {b}, {c}
{a,b}, {a}, {b}
{a}, {b}
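Study note: a brute-force sketch for enumerating maximal frequent itemsets in a small dataset such as the one above. Whether an itemset whose support is exactly equal to the threshold counts as frequent is a matter of convention, so the sketch exposes it as a hypothetical `strict` flag. The function name and parameters are assumptions made for illustration.

```python
from itertools import combinations

def maximal_frequent(transactions, min_percent, strict=False):
    """Return the frequent itemsets that have no frequent proper superset.

    strict=False: support >= min_percent counts as frequent.
    strict=True:  support must exceed min_percent.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            # Integer comparison of count/n against min_percent/100 avoids rounding issues.
            lhs, rhs = count * 100, min_percent * n
            if lhs > rhs or (lhs == rhs and not strict):
                frequent.append(candidate)
    return [f for f in frequent if not any(f < g for g in frequent)]

data = [{"a", "b", "c"}, {"b", "c"}, {"a", "b"}, {"a", "b"}, {"c"}]
print(maximal_frequent(data, 40))
print(maximal_frequent(data, 40, strict=True))
```

In this dataset {b,c} occurs in exactly 2 of the 5 transactions, so its support is exactly 40% and the result depends on which convention is used.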
In the agglomerative clustering algorithm, at each step we select:
the 2 nearest clusters,
the 2 farthest clusters,
the 2 nearest cases,
one case and the nearest cluster.
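Study note: a sketch of a single merge step of agglomerative clustering, assuming single linkage (the distance between two clusters is the distance between their closest pair of points) and Euclidean distance. Other linkages exist, and the names `single_link` and `merge_step` are illustrative assumptions.

```python
import math
from itertools import combinations

def single_link(c1, c2):
    """Single-linkage distance: the smallest point-to-point distance between two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def merge_step(clusters):
    """Merge the two nearest clusters and return the new list of clusters."""
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merged = clusters[i] + clusters[j]
    # Keep every cluster except the two that were merged, then append the merged one.
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

# Start with every case in its own cluster.
clusters = [[(0.0, 0.0)], [(0.0, 1.0)], [(5.0, 5.0)], [(9.0, 9.0)]]
step1 = merge_step(clusters)
step2 = merge_step(step1)
print(step1)
print(step2)
```

Repeating `merge_step` until one cluster remains (or until a stopping criterion is met) yields the full hierarchy.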
For a fixed set X, if the entropy E1 of test t1 is smaller than the entropy E2 of test t2,
then the information gains g1 and g2 of t1 and t2 satisfy the following condition:
g1>g2
g1>=g2
g1