Categorical Variables in K Nearest Neighbours

Atai Winkler

Hi

I am going through the k-Nearest Neighbours example in the Case Studies section of the Help (Help/Topics/Case Studies/Statistics Base Edition/Nearest Neighbor Analysis).

I can reproduce the results in the example, but not when I add a categorical variable to the list of variables (type, in BY type), as in the syntax below.

NOTE: type is ordinal in the file as supplied, but I changed it to nominal before running the syntax.

DATASET DECLARE NNs_3_with_type.
*Nearest Neighbor Analysis.
KNN BY type WITH price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /FOCALCASES VARIABLE=focal
  /CASELABELS VARIABLE=model
  /RESCALE COVARIATE=ADJNORMALIZED
  /MODEL NEIGHBORS=FIXED(K=3) METRIC=EUCLID FEATURES=ALL
  /CRITERIA WEIGHTFEATURES=NO
  /PARTITION VARIABLE=partition
  /PRINT CPS
  /VIEWMODEL DISPLAY=YES
  /OUTFILE FOCALCASES='NNs_3_with_type'
  /MISSING USERMISSING=EXCLUDE.

My question is: why is the output dataset (NNs_3_with_type) empty? It has the correct field names but there are no records; there should be two records, one for each focal record.

Thank you in advance for your help.

Dr Atai Winkler
PAM Analytics
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Categorical Variables in K Nearest Neighbours

Jon Peck
You could get that result if none of your focal variable values are positive.

On Tue, Aug 27, 2019 at 6:32 AM Atai Winkler <[hidden email]> wrote:

--
Jon K Peck
[hidden email]
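
The point in Jon Peck's reply can be made concrete with a small illustration. This is a Python sketch, not SPSS syntax: it only mimics the rule stated in the reply, that a case counts as focal when its value on the FOCALCASES variable is positive. The helper `focal_cases`, the sample records and the model name `other_model` are hypothetical.

```python
# Illustrative only: mimics the "positive value means focal" rule from the reply.
# It is not how SPSS implements the KNN procedure.

def focal_cases(cases, focal_key="focal"):
    """Return the cases whose focal flag is a positive number."""
    return [
        c for c in cases
        if isinstance(c.get(focal_key), (int, float)) and c[focal_key] > 0
    ]

# Hypothetical records; only the flag values matter here.
cases = [
    {"model": "new_car", "focal": 1},
    {"model": "new_truck", "focal": 1},
    {"model": "other_model", "focal": 0},
]

selected = [c["model"] for c in focal_cases(cases)]

# If no flag is positive, nothing is selected, which would match an
# output dataset that has field names but no records.
none_selected = focal_cases([{"model": "new_car", "focal": 0}])
```

Under this rule, a quick frequency check of the focal variable before running KNN would show whether any case qualifies.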


Re: Categorical Variables in K Nearest Neighbours

Atai Winkler

Thank you, Jon.

I understand your reply and will implement it tomorrow and let you know how I get on.

Atai


From: Jon Peck <[hidden email]>
Sent: 27 August 2019 17:23
To: Atai Winkler <[hidden email]>
Cc: SPSS List <[hidden email]>
Subject: Re: [SPSSX-L] Categorical Variables in K Nearest Neighbours

 


Re: Categorical Variables in K Nearest Neighbours

Atai Winkler
In reply to this post by Jon Peck

Hi Jon

I have implemented your suggestion.

1. The procedure works as expected when there aren’t any categorical variables.
2. When I introduce a categorical variable, for example type, the procedure fails whatever the value of partition (focal is always positive, as you suggested and as shown in the syntax below).

** File car_sales.sav in example files.

ALTER TYPE type (A12).

DO IF ANY(model, 'new_car', 'new_truck').
COMPUTE focal = 1.
COMPUTE partition = 0.
ELSE.
COMPUTE focal = 2.
COMPUTE partition = 1.
END IF.
EXECUTE.

*Nearest Neighbor Analysis.
DATASET DECLARE NNs_3_with_type.
KNN BY type WITH price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /FOCALCASES VARIABLE=focal
  /CASELABELS VARIABLE=model
  /RESCALE COVARIATE=ADJNORMALIZED
  /MODEL NEIGHBORS=FIXED(K=3) METRIC=EUCLID FEATURES=ALL
  /CRITERIA WEIGHTFEATURES=NO
  /PARTITION VARIABLE=partition
  /PRINT CPS
  /VIEWMODEL DISPLAY=YES
  /OUTFILE FOCALCASES='NNs_3_with_type'
  /MISSING USERMISSING=EXCLUDE.

How are categorical variables to be used in the procedure?

 

Thank you.

Atai

[hidden email]
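
The question of how a categorical variable can enter a nearest-neighbour distance is not answered in the thread. As a rough illustration only, the Python sketch below assumes the factor is recoded with one-of-c (indicator) coding and that covariates are rescaled with the adjusted-normalized formula 2*(x - min)/(max - min) - 1 before a Euclidean distance is taken. That is my understanding of what the IBM documentation describes for KNN, but it is an assumption here, not the procedure's actual implementation, and it does not explain the empty output dataset. All data values, the category labels `car` and `truck`, and the helper names are made up.

```python
import math

def adj_normalize(x, lo, hi):
    """Adjusted normalization: 2*(x - min)/(max - min) - 1, giving values in [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def one_of_c(value, categories):
    """One-of-c coding: one 0/1 indicator per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode(case, categories, ranges):
    """Build the feature vector: indicators for type, then rescaled covariates."""
    feats = one_of_c(case["type"], categories)
    for name, (lo, hi) in ranges.items():
        feats.append(adj_normalize(case[name], lo, hi))
    return feats

def euclid(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(focal, others, k, categories, ranges):
    """Return the labels of the k cases closest to the focal case."""
    f = encode(focal, categories, ranges)
    ranked = sorted(others, key=lambda c: euclid(f, encode(c, categories, ranges)))
    return [c["model"] for c in ranked[:k]]

# Made-up data: B has the same covariates as A but a different type.
categories = ["car", "truck"]
ranges = {"price": (10.0, 50.0), "mpg": (15.0, 35.0)}  # assumed min and max per covariate
others = [
    {"model": "A", "type": "car", "price": 20.0, "mpg": 30.0},
    {"model": "B", "type": "truck", "price": 20.0, "mpg": 30.0},
    {"model": "C", "type": "car", "price": 40.0, "mpg": 20.0},
]
focal = {"model": "new_car", "type": "car", "price": 21.0, "mpg": 29.0}

result = nearest(focal, others, k=2, categories=categories, ranges=ranges)
```

In this toy data, a mismatch on type adds 2 to the squared distance, so B ranks behind C even though its covariates are closer to the focal case.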

 

 

 
