Next: RANK, Previous: ONEWAY, Up: Statistics [Contents][Index]

    QUICK CLUSTER var_list
          [/CRITERIA=CLUSTERS(k) [MXITER(max_iter)] CONVERGE(epsilon) [NOINITIAL]]
          [/MISSING={EXCLUDE,INCLUDE} {LISTWISE, PAIRWISE}]
          [/PRINT={INITIAL} {CLUSTER}]
          [/SAVE[=[CLUSTER[(membership_var)]] [DISTANCE[(distance_var)]]]]

The `QUICK CLUSTER` command performs k-means clustering on the
dataset. This is useful when you wish to allocate cases into clusters
of similar values and you already know the number of clusters.

The minimum specification is `QUICK CLUSTER` followed by the names
of the variables which contain the cluster data. Normally you will also
want to specify `/CRITERIA=CLUSTERS(k)`, where `k` is the
number of clusters. If this is not specified, then `k` defaults to 2.
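
As a minimal sketch of the specification described above, the following syntax reads a small inline dataset and clusters it twice, first with the default `k` and then with `k` given explicitly. The variable names `x` and `y` and the data values are hypothetical, chosen only for illustration.

```
* Hypothetical data: six cases measured on two variables.
DATA LIST LIST /x y.
BEGIN DATA.
1.0 1.1
1.2 0.9
0.8 1.0
5.0 5.2
5.1 4.9
4.9 5.0
END DATA.

* Minimum specification, so k takes its default of 2.
QUICK CLUSTER x y.

* The same analysis with the number of clusters stated explicitly.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(2).
```

Stating `CLUSTERS(k)` explicitly is usually worthwhile even when the default would do, since it records the intended number of clusters in the syntax file.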

If you use `/CRITERIA=NOINITIAL`, then a naive algorithm is used to select
the initial clusters. This provides faster execution but
less well-separated initial clusters, and hence possibly an inferior final
result.

`QUICK CLUSTER` uses an iterative algorithm to select the cluster centers.
The subcommand `/CRITERIA=MXITER(max_iter)` sets the maximum number of iterations.
During classification, PSPP will continue iterating until `max_iter`
iterations have been done or the convergence criterion (see below) is fulfilled.
The default value of `max_iter` is 2.

If, however, you specify `/CRITERIA=NOUPDATE`, then after selecting the initial centers,
no further update to the cluster centers is done. In this case, `max_iter`, if specified,
is ignored.
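
The two keywords `NOINITIAL` and `NOUPDATE` can be illustrated with the sketch below. It assumes an active dataset that already contains numeric variables named `x` and `y` (hypothetical names), and it writes each keyword as a plain option inside `/CRITERIA`, as the text above describes.

```
* Use the naive selection of initial clusters, then iterate as usual.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(3) MXITER(10) NOINITIAL.

* Classify cases against the initial centers only, with no updates.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(3) NOUPDATE.
```

In the second command, adding `MXITER` would have no effect, because the centers are never updated after the initial selection.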

The subcommand `/CRITERIA=CONVERGE(epsilon)` is used
to set the convergence criterion. The value of the convergence criterion is `epsilon`
times the minimum distance between the *initial* cluster centers. Iteration stops when
the mean cluster distance between one iteration and the next
is less than the convergence criterion. The default value of `epsilon` is zero.
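
The following sketch shows how the iteration limit and the convergence criterion might be set together. It assumes an active dataset with numeric variables `x` and `y` (hypothetical names), and the values 50 and 0.01 are arbitrary illustrations rather than recommendations.

```
* Allow up to 50 iterations, stopping early once the centers settle.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(3) MXITER(50) CONVERGE(0.01).
```

As a worked illustration with made-up numbers: if the two closest initial cluster centers are 40 units apart, then `CONVERGE(0.01)` gives a convergence criterion of 0.01 × 40 = 0.4. Because the default `max_iter` is only 2, raising `MXITER` is generally needed for a non-zero `epsilon` to have any practical effect.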

The `MISSING` subcommand determines the handling of missing values.
If `INCLUDE` is set, then user-missing values are considered at their face
value and not as missing values.
If `EXCLUDE` is set, which is the default, user-missing
values are excluded as well as system-missing values.

If `LISTWISE` is set, then the entire case is excluded from the analysis
whenever any of the clustering variables contains a missing value.
If `PAIRWISE` is set, then a case is considered missing only if all the
clustering variables contain missing values. Otherwise it is clustered
on the basis of the non-missing values.
The default is `LISTWISE`.
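
The combinations of `MISSING` keywords described above can be sketched as follows. The example assumes an active dataset with numeric variables `x` and `y`, and it assumes that the value -9 has been used as a user-missing code. Both the names and the coding are hypothetical.

```
* Declare -9 as a user-missing value for both variables.
MISSING VALUES x y (-9).

* Default behavior: drop any case with a missing value in x or y.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(2)
  /MISSING=EXCLUDE LISTWISE.

* Keep partially observed cases and cluster them on the values present.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(2)
  /MISSING=EXCLUDE PAIRWISE.

* Take the -9 codes at face value instead of treating them as missing.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(2)
  /MISSING=INCLUDE LISTWISE.
```

The third form is rarely what is wanted for sentinel codes such as -9, since those values would then pull the cluster centers toward them.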

The `PRINT` subcommand requests additional output to be printed.
If `INITIAL` is set, then the initial cluster memberships will
be printed.
If `CLUSTER` is set, the cluster memberships of the individual
cases will be displayed (potentially generating lengthy output).

You can specify the subcommand `SAVE` to ask that each case’s cluster membership
and the Euclidean distance between the case and its cluster center be saved to
new variables in the active dataset. To save the cluster membership, use the
`CLUSTER` keyword; to save the distance, use the `DISTANCE` keyword.
Each keyword may optionally be followed by a variable name in parentheses to specify
the new variable which is to contain the saved parameter. If no variable name is specified,
then PSPP will create one.
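
A sketch combining the `PRINT` and `SAVE` subcommands is shown below. It assumes an active dataset with numeric variables `x` and `y`, and the new variable names `grp` and `dist` are hypothetical choices. The `LIST` command is included only as one way to inspect the result.

```
* Print initial and per-case memberships, and save both results.
QUICK CLUSTER x y
  /CRITERIA=CLUSTERS(2)
  /PRINT=INITIAL CLUSTER
  /SAVE=CLUSTER(grp) DISTANCE(dist).

* Inspect the saved variables alongside the inputs.
LIST /VARIABLES=x y grp dist.
```

Naming the saved variables explicitly makes later syntax easier to read than relying on names that PSPP generates.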
