Find the mode value for a record across 5 variables

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Find the mode value for a record across 5 variables

Snuffy Dog
Dear all,
 
I'm back again looking for more help, given the following data file:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.

 

I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:

 

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.

LIST.
 
Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.
 
Kind regards,
 
Jack Thomas
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Art Kendall
I don't have the time right now to work out finding the mode, since this is a tricky thing.  I'll be back home in a few hours and will look to see if you have received a workaround.

Try FLIPing the file and use FREQUENCIES and OMS.

Or
make a vector of 5 new variables for counts and 5 new variables for the results.
 
vector kounts (5,f2).
vector results(5,f4).
for each variable in v1-v5 create kounts.
loop through kounts to find the modes etc.

missing values results1 to results 5 (lo thru -1).
value labels results1 to results5
   -1 'suppressed'
  -2 'missing on input'.


However, two things to keep in mind
1) good programming rarely writes over input values.  Today's computers make it usually unnecessary to conserve memory and storage that much.
  I rarely gets it right on the first try, so  I keep it possible to go back to the beginning. 
2) Since it is known why the results that are suppressed are "missing", they should be user missing.

Art Kendall
Social Research Consultants

On 12/11/2010 11:41 PM, Snuffy Dog wrote:
Dear all,
 
I'm back again looking for more help, given the following data file:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.

 

I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:

 

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.

LIST.
 
Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.
 
Kind regards,
 
Jack Thomas
 
 
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Jon K Peck
In reply to this post by Snuffy Dog
This problem seems a little weird - why wouldn't you just create a new variable that is the modal value, but this code should do it.

begin program.
def mode(*args):
  counts={}  #key is var value, value is the count
  for v in args:
    if not v is None:
      counts[v] = counts.get(v, 0) + 1
  themax = max([v for k,v in counts.items()])
  modes = [k for k,v in counts.items() if v == themax]
  if len(modes) == 0:
    return
  indexes = sorted([args.index(item) for item in modes])
  themode = args[indexes[0]]
  args = list(args)
  for i in range(len(args)):
    if args[i] != themode:
      args[i] = None
  return args
end program.

spssinc trans result=v1 to v5
/formula mode(v1, v2,v3,v4,v5).


Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Snuffy Dog <[hidden email]>
To:        [hidden email]
Date:        12/11/2010 09:53 PM
Subject:        [SPSSX-L] Find the mode value for a record across 5 variables
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Dear all,
 
I'm back again looking for more help, given the following data file:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1             45           23           22           45           23
2             28           1000      1000      1000      1000
3             2000      2000      66           .              .
4             33           33           23           45           1000
5             1             1             2             1             1
6             5             5             5             5             5
7             .              3             2             3             2
8             .              .              .              47           14
END DATA.
LIST.
 
I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1             45             .              .             45          .
2             .             1000      1000      1000      1000
3             2000      2000      .             .              .
4             33           33           .             .              .
5             1             1             .             1             1
6             5             5             5             5             5
7             .              3             .              3             .
8             .              .              .              47           .
END DATA.
LIST.
 
Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.
 
Kind regards,
 
Jack Thomas
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Jon K Peck
If you want to follow Art's methodology, the following will do the job.  Just add a missing values declaration for 99999 for the new variables.

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1             45           23           22           45           23
2             28           1000      1000      1000      1000
3             2000      2000      66           .              .
4             33           33           23           45           1000
5             1             1             2             1             1
6             5             5             5             5             5
7             .              3             2             3             2
8             .              .              .              47           14
END DATA.

begin program.
def mode(*args):
  counts={}  #key is var value, value is the count
  for v in args:
    if not v is None:
      counts[v] = counts.get(v, 0) + 1
  themax = max([v for k,v in counts.items()])
  modes = [k for k,v in counts.items() if v == themax]
  if len(modes) == 0:
    return len(args) * [99999]
  indexes = sorted([args.index(item) for item in modes])
  themode = args[indexes[0]]
  print themode
  args = list(args)
  for i in range(len(args)):
    if args[i] != themode:
      args[i] = 99999
  return args
end program.

spssinc trans result=w1 to w10
/formula mode(v1, v2,v3,v4,v5).



Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Art Kendall <[hidden email]>
To:        Jon K Peck/Chicago/IBM@IBMUS
Date:        12/12/2010 09:12 AM
Subject:        Re: [SPSSX-L] Find the mode value for a record across 5 variables




Jon
given my soapboxes, could I ask you to repost modifying the syntax to take into account the two programming points I made in my reply.
I don't know whether you agree with them but in my experience writing over variables has frequently led to problems especially for beginners.
Also in my experience the distinction between user and system missing is one of the strength of SPSS.  Occurrence of system missing codes as a debugging aid has been very useful over the years.

P.S. I also find the task a little weird and hope the OP will give more explanation.

Art

On 12/12/2010 10:56 AM, Jon K Peck wrote:

This problem seems a little weird - why wouldn't you just create a new variable that is the modal value, but this code should do it.

begin program.

def mode(*args):

 counts={}  #key is var value, value is the count

 for v in args:

   if not v is None:

     counts[v] = counts.get(v, 0) + 1

 themax = max([v for k,v in counts.items()])

 modes = [k for k,v in counts.items() if v == themax]

 if len(modes) == 0:

   return

 indexes = sorted([args.index(item) for item in modes])

 themode = args[indexes[0]]

 args = list(args)

 for i in range(len(args)):

   if args[i] != themode:

     args[i] = None

 return args

end program.


spssinc trans result=v1 to v5

/formula mode(v1, v2,v3,v4,v5).



Jon Peck
Senior Software Engineer, IBM

peck@...
312-651-3435




From:        
Snuffy Dog <snuffythespssdog@...>
To:        
[hidden email]
Date:        
12/11/2010 09:53 PM
Subject:        
[SPSSX-L] Find the mode value for a record across 5 variables
Sent by:        
"SPSSX(r) Discussion" [hidden email]




Dear all,

 

I'm back again looking for more help, given the following data file:

 

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.

 
I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:

 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45             .              .             45          .

2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .
8             .              .              .              47           .
END DATA.

LIST.

 

Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.

 

Kind regards,

 

Jack Thomas

 
 

Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

hillel vardi
In reply to this post by Snuffy Dog
Shalom

here is a way to do what you asked for using syntax .

Hillel Vardi
bgu

DATA LIST fixed /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
   1  45  23  22  45  23
   2  281000100010001000
   320002000  66    
   4  33  33  23  451000
   5   1   1   2   1   1
   6   5   5   5   5   5
   7       3   2   3   2
   8              47  14
END DATA.
sort cases by id .
dataset name orig .
DATASET COPY agg .
DATASET ACTIVATE agg .
VARSTOCASES
  /MAKE val FROM v1 v2 v3 v4 v5
  /INDEX=seq(5)
  /KEEP=ID
  /NULL=KEEP.

AGGREGATE
  /OUTFILE=*
  /BREAK=ID val
  /seq_n=N(val)
  /seq_min=MIN(seq).

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /seq_n_max=MAX(seq_n).
select if   seq_n =seq_n_max .
sort cases by id seq_min .
add files file=* / by id  / first=first.
select if first eq 1.
execute .
match file file=orig/ table=* / by id / keep=id to v5 val.
do repeat vv=v1 to v5.
if vv ne val   vv=$sysmis.
end repeat .
execute .


On 12/12/2010 06:41, Snuffy Dog wrote:
Dear all,
 
I'm back again looking for more help, given the following data file:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.

 

I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:

 

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.

LIST.
 
Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.
 
Kind regards,
 
Jack Thomas
 
 

Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Albert-Jan Roskam
In reply to this post by Jon K Peck
Hi Jon,

Interesting way of keeping a count. Until recently, I usually did:
counts = {}
for v in args:
    try:
        counts[v] += 1
    except KeyError:
        counts[v] = 0
 
Until I discovered this:

import collections
counts = collections.defaultdict(int)
for v in args:
    counts[v] += 1

But your solution is better in terms of backward compatibility (Python < 2.6).
 
Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



From: Jon K Peck <[hidden email]>
To: [hidden email]
Sent: Sun, December 12, 2010 4:56:12 PM
Subject: Re: [SPSSX-L] Find the mode value for a record across 5 variables

This problem seems a little weird - why wouldn't you just create a new variable that is the modal value, but this code should do it.

begin program.
def mode(*args):
  counts={}  #key is var value, value is the count
  for v in args:
    if not v is None:
      counts[v] = counts.get(v, 0) + 1
  themax = max([v for k,v in counts.items()])
  modes = [k for k,v in counts.items() if v == themax]
  if len(modes) == 0:
    return
  indexes = sorted([args.index(item) for item in modes])
  themode = args[indexes[0]]
  args = list(args)
  for i in range(len(args)):
    if args[i] != themode:
      args[i] = None
  return args
end program.

spssinc trans result=v1 to v5
/formula mode(v1, v2,v3,v4,v5).


Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Snuffy Dog <[hidden email]>
To:        [hidden email]
Date:        12/11/2010 09:53 PM
Subject:        [SPSSX-L] Find the mode value for a record across 5 variables
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Dear all,
 
I'm back again looking for more help, given the following data file:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1             45           23           22           45           23
2             28           1000      1000      1000      1000
3             2000      2000      66           .              .
4             33           33           23           45           1000
5             1             1             2             1             1
6             5             5             5             5             5
7             .              3             2             3             2
8             .              .              .              47           14
END DATA.
LIST.
 
I want to work out the mode value for columns V1-V5.  For example the mode for record 1 is 45 and 22.  Any value within columns v1-v5 that does not equal the mode value should then be deleted.  If there are two or more mode values for a record (eg record 1, 7 and 8) then the mode value that occurs first (moving left to right from v1-v5) should be given preference and again any value that does not equal this mode value should be deleted. The resulting datafile should look like this:
 
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1             45             .              .             45          .
2             .             1000      1000      1000      1000
3             2000      2000      .             .              .
4             33           33           .             .              .
5             1             1             .             1             1
6             5             5             5             5             5
7             .              3             .              3             .
8             .              .              .              47           .
END DATA.
LIST.
 
Anybody have an idea how to do this.  I had hoped aggregate would help but spss does not aggregate in a way that lets you find the mode.
 
Kind regards,
 
Jack Thomas
 
 

Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

David Marso
Administrator
In reply to this post by Snuffy Dog
Going at it old school ;-)
This works in ALL versions of SPSS (at least back to 11.5).  
---
"Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode."
----
Sure it does see below ;-)
---
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1 45 23 22 45 23
2 28 1000 1000 1000 1000
3 2000 2000 66 . .
4 33 33 23 45 1000
5 1 1 2 1 1
6 5 5 5 5 5
7 . 3 2 3 2
8 . . . 47 14
END DATA.
SAVE OUTFILE "TMP.SAV".
VARSTOCASES  /MAKE value FROM v1 v2 v3 v4 v5
 /INDEX = Index1(5)
 /KEEP =  id
 /NULL = DROP.

SORT CASES BY ID value .
SAVE OUTFILE "LONG.SAV".

AGGREGATE OUTFILE *
        / PRESORTED
        / BREAK ID value
        / N=N.
MATCH FILES
    / FILE "LONG.SAV"
    / TABLE= *
    / BY ID value .
SORT CASES BY ID (A) N (D) Index1 (A).
MATCH FILES / FILE * / BY ID /FIRST=TOP.
SELECT IF TOP.
MATCH FILES / FILE * / FILE "TMP.SAV" / BY ID.
DO REPEAT V=V1 TO V5.
IF (V NE value ) V=$SYSMIS.
END REPEAT.
MATCH FILES / FILE * / KEEP ID V1 TO V5.
LIST.

Snuffy Dog wrote
Dear all,

I'm back again looking for more help, given the following data file:

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.



I want to work out the mode value for columns V1-V5.  For example the mode
for record 1 is 45 and 22.  Any value within columns v1-v5 that does not
equal the mode value should then be deleted.  If there are two or more mode
values for a record (eg record 1, 7 and 8) then the mode value that occurs
first (moving left to right from v1-v5) should be given preference and again
any value that does not equal this mode value should be deleted. The
resulting datafile should look like this:



DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA
1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.
LIST.

Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode.

Kind regards,

Jack Thomas
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Bruce Weaver
Administrator
Very nice, David.  You'll have to get that new-fangled VARSTOCASES out of there if you want it to work for people with REALLY old versions though.  I'm sure there are some version 6 die-hards still kicking around somewhere.  ;-)


David Marso wrote
Going at it old school ;-)
This works in ALL versions of SPSS (at least back to 11.5).  
---
"Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode."
----
Sure it does see below ;-)
---
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1 45 23 22 45 23
2 28 1000 1000 1000 1000
3 2000 2000 66 . .
4 33 33 23 45 1000
5 1 1 2 1 1
6 5 5 5 5 5
7 . 3 2 3 2
8 . . . 47 14
END DATA.
SAVE OUTFILE "TMP.SAV".
VARSTOCASES  /MAKE value FROM v1 v2 v3 v4 v5
 /INDEX = Index1(5)
 /KEEP =  id
 /NULL = DROP.

SORT CASES BY ID value .
SAVE OUTFILE "LONG.SAV".

AGGREGATE OUTFILE *
        / PRESORTED
        / BREAK ID value
        / N=N.
MATCH FILES
    / FILE "LONG.SAV"
    / TABLE= *
    / BY ID value .
SORT CASES BY ID (A) N (D) Index1 (A).
MATCH FILES / FILE * / BY ID /FIRST=TOP.
SELECT IF TOP.
MATCH FILES / FILE * / FILE "TMP.SAV" / BY ID.
DO REPEAT V=V1 TO V5.
IF (V NE value ) V=$SYSMIS.
END REPEAT.
MATCH FILES / FILE * / KEEP ID V1 TO V5.
LIST.

Snuffy Dog wrote
Dear all,

I'm back again looking for more help, given the following data file:

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.



I want to work out the mode value for columns V1-V5.  For example the mode
for record 1 is 45 and 22.  Any value within columns v1-v5 that does not
equal the mode value should then be deleted.  If there are two or more mode
values for a record (eg record 1, 7 and 8) then the mode value that occurs
first (moving left to right from v1-v5) should be given preference and again
any value that does not equal this mode value should be deleted. The
resulting datafile should look like this:



DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA
1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.
LIST.

Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode.

Kind regards,

Jack Thomas
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

David Marso
Administrator
VECTOR... LOOP... XSAVE... GET blahblahblah ;-)))
I tend to use the XSAVE rather than VARSTOCASES.
Here's a version for all the fossils ;-)

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1 45 23 22 45 23
2 28 1000 1000 1000 1000
3 2000 2000 66 . .
4 33 33 23 45 1000
5 1 1 2 1 1
6 5 5 5 5 5
7 . 3 2 3 2
8 . . . 47 14
END DATA.
SAVE OUTFILE "TMP.SAV".

VECTOR V=V1 TO V5.
LOOP INDEX1=1 TO 5.
+  COMPUTE VALUE=V(INDEX1).
+  DO IF NOT MISSING(VALUE).
+    XSAVE OUTFILE "LONG.SAV" / KEEP ID INDEX1 VALUE.
+  END IF.
END LOOP.
EXECUTE.
GET FILE "LONG.SAV" .
SORT CASES BY ID VALUE.
XSAVE OUTFILE "LONG.SAV" .

AGGREGATE OUTFILE * / PRESORTED / BREAK ID VALUE  / N=N.
MATCH FILES  / FILE "LONG.SAV" / TABLE= * / BY ID VALUE.
SORT CASES BY ID (A) N (D) Index1 (A).

*SELECT based on $CASENUM or LAGS will frequently bite you.  This however works.
SELECT IF $CASENUM=1 OR ID NE LAG(ID).
MATCH FILES / FILE * / FILE "TMP.SAV" / BY ID.
DO REPEAT V=V1 TO V5.
IF (V NE VALUE) V=$SYSMIS.
END REPEAT.
MATCH FILES / FILE * / KEEP ID V1 TO V5.

list.

<quote author="Bruce Weaver">
Very nice, David.  You'll have to get that new-fangled VARSTOCASES out of there if you want it to work for people with REALLY old versions though.  I'm sure there are some version 6 die-hards still kicking around somewhere.  ;-)


David Marso wrote
Going at it old school ;-)
This works in ALL versions of SPSS (at least back to 11.5).  
---
"Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode."
----
Sure it does see below ;-)
.... (SNIP)
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

David Marso
Administrator
This post was updated on .
In reply to this post by Snuffy Dog
Here is a solution which doesn't require slaughtering the data file and no data pass.
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1 45 23 22 45 23
2 28 1000 1000 1000 1000
3 2000 2000 66 . .
4 33 33 23 45 1000
5 1 1 2 1 1
6 5 5 5 5 5
7 . 3 2 3 2
8 . . . 47 14
END DATA.
VECTOR V=V1 TO V5 /#C(5)/#VL(5).
COMPUTE #NV=0.
* Build Values Collection and counters *.
LOOP #=1 TO 5.
+  DO IF NOT MISSING(V(#)).
+    COMPUTE #FOUND=0.
+    LOOP ##=1 TO #NV.
*      Check existing value collection (#VL) for current value *.
+      DO IF #VL(##)=V(#).
*        Increment counter for current value and set #FOUND flag* .
+        COMPUTE #C(##)=#C(##)+1.
+        COMPUTE #FOUND=1.
+      END IF.
+    END LOOP IF #FOUND.
*    Current value NOT located so allocate and initialize new item in collection *.
+    DO IF #FOUND=0.
+      COMPUTE #NV=#NV+1.
+      COMPUTE #VL(#NV)=V(#).
+      COMPUTE #C(#NV)=1.
+    END IF.
+  END IF.
END LOOP.

* Determine max count  and set Mode.
COMPUTE #MX=MAX(#C1 TO #C5).
COMPUTE #FOUND=0.
LOOP #=1 TO 5.
+  DO IF #C(#)=#MX.
+    COMPUTE MODE=#VL(#).
+    COMPUTE #FOUND=1.
+  END IF.
END LOOP IF #FOUND.
*ZAP all non mode values *.
LOOP #=1 to 5.
IF V(#) NE MODE V(#)=$SYSMIS.
END LOOP.
LIST.


Snuffy Dog wrote
Dear all,

I'm back again looking for more help, given the following data file:

DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA

1             45           23           22           45           23

2             28           1000      1000      1000      1000

3             2000      2000      66           .              .

4             33           33           23           45           1000

5             1             1             2             1             1

6             5             5             5             5             5

7             .              3             2             3             2

8             .              .              .              47           14

END DATA.

LIST.



I want to work out the mode value for columns V1-V5.  For example the mode
for record 1 is 45 and 22.  Any value within columns v1-v5 that does not
equal the mode value should then be deleted.  If there are two or more mode
values for a record (eg record 1, 7 and 8) then the mode value that occurs
first (moving left to right from v1-v5) should be given preference and again
any value that does not equal this mode value should be deleted. The
resulting datafile should look like this:



DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).

BEGIN DATA
1             45             .              .             45          .
2             .             1000      1000      1000      1000

3             2000      2000      .             .              .

4             33           33           .             .              .

5             1             1             .             1             1

6             5             5             5             5             5

7             .              3             .              3             .

8             .              .              .              47           .

END DATA.
LIST.

Anybody have an idea how to do this.  I had hoped aggregate would help
but spss does not aggregate in a way that lets you find the mode.

Kind regards,

Jack Thomas
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

GPL, parallel coordinates and same axes?

Robert L
I have got this small dataset with values before and after treatment. I have set up a line chart with parallel coordinates, and almost everything works fine. However, the two axes are slightly shifted, i e the tick marks at the right axis are placed a bit lower than their counterparts at the left axis. I want them to be on the same level. Functions tested have included min(), max(), origin(), include(), but nothing seems to work. Any suggestions?

Furthermore, any suggestions if I want to add 1) the mean for each variable on each axis, and 2) possibly even a line between the two means?

GGRAPH
 /GRAPHDATASET NAME="DataSet1" VARIABLES=before after patient
 /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s = userSource(id("DataSet1"))
DATA: before=col(source(s), name("before"))
DATA: after=col(source(s), name("after"))
DATA: patient=col(source(s), name("patient"), unit.category())
TRANS: caseid = index()
COORD: parallel()
SCALE: y1=linear(dim(1))
SCALE: y2=linear(dim(2))
GUIDE: axis(scale(y1), label("Before"))
GUIDE: axis(scale(y2), label("After"))
ELEMENT: line(position(before*after),
              split(caseid), color(patient),
              transparency(transparency."0.5"))
END GPL.

Robert
****************************
Robert Lundqvist
Norrbotten regional council
Lulea
Sweden

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Robert Lundqvist
Reply | Threaded
Open this post in threaded view
|

Re: GPL, parallel coordinates and same axes?

Jon K Peck
Two possibilities come to mind.  For parallel coordinates, sometimes the default chart template interferes with the alignment.  Turn this off in Edit>Options>Charts or via the SET command.  Second, try generating this plot with Graphs>Graphboard Template Chooser.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Robert Lundqvist <[hidden email]>
To:        [hidden email]
Date:        12/15/2010 03:13 AM
Subject:        [SPSSX-L] GPL, parallel coordinates and same axes?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I have got this small dataset with values before and after treatment. I have set up a line chart with parallel coordinates, and almost everything works fine. However, the two axes are slightly shifted, i e the tick marks at the right axis are placed a bit lower than their counterparts at the left axis. I want them to be on the same level. Functions tested have included min(), max(), origin(), include(), but nothing seems to work. Any suggestions?

Furthermore, any suggestions if I want to add 1) the mean for each variable on each axis, and 2) possibly even a line between the two means?

GGRAPH
/GRAPHDATASET NAME="DataSet1" VARIABLES=before after patient
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s = userSource(id("DataSet1"))
DATA: before=col(source(s), name("before"))
DATA: after=col(source(s), name("after"))
DATA: patient=col(source(s), name("patient"), unit.category())
TRANS: caseid = index()
COORD: parallel()
SCALE: y1=linear(dim(1))
SCALE: y2=linear(dim(2))
GUIDE: axis(scale(y1), label("Before"))
GUIDE: axis(scale(y2), label("After"))
ELEMENT: line(position(before*after),
             split(caseid), color(patient),
             transparency(transparency."0.5"))
END GPL.

Robert
****************************
Robert Lundqvist
Norrbotten regional council
Lulea
Sweden

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: GPL, parallel coordinates and same axes?

ViAnn Beadle

I’ve run into the exact same problem—the lower margin for y is set differently for the axes on the right. You can create a chart template that fixes this via the chart editor by changing the properties of the left y axis. This will set the margin for all axes. Save your template and then specify that in the GGRAPH command. Here’s the main set of elements to do it:

 

                <setAxisMargin categorical="false" lowerMargin="5%" role="x" upperMargin="5%"/>

                <setAxisMargin categorical="false" lowerMargin="5%" role="y" upperMargin="5%"/>

 

Look at the template in a text editor and if it doesn’t contain this, just copy it between the start of the <template> tag and the end (</template>).

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Wednesday, December 15, 2010 7:03 AM
To: [hidden email]
Subject: Re: GPL, parallel coordinates and same axes?

 

Two possibilities come to mind.  For parallel coordinates, sometimes the default chart template interferes with the alignment.  Turn this off in Edit>Options>Charts or via the SET command.  Second, try generating this plot with Graphs>Graphboard Template Chooser.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Robert Lundqvist <[hidden email]>
To:        [hidden email]
Date:        12/15/2010 03:13 AM
Subject:        [SPSSX-L] GPL, parallel coordinates and same axes?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





I have got this small dataset with values before and after treatment. I have set up a line chart with parallel coordinates, and almost everything works fine. However, the two axes are slightly shifted, i e the tick marks at the right axis are placed a bit lower than their counterparts at the left axis. I want them to be on the same level. Functions tested have included min(), max(), origin(), include(), but nothing seems to work. Any suggestions?

Furthermore, any suggestions if I want to add 1) the mean for each variable on each axis, and 2) possibly even a line between the two means?

GGRAPH
/GRAPHDATASET NAME="DataSet1" VARIABLES=before after patient
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s = userSource(id("DataSet1"))
DATA: before=col(source(s), name("before"))
DATA: after=col(source(s), name("after"))
DATA: patient=col(source(s), name("patient"), unit.category())
TRANS: caseid = index()
COORD: parallel()
SCALE: y1=linear(dim(1))
SCALE: y2=linear(dim(2))
GUIDE: axis(scale(y1), label("Before"))
GUIDE: axis(scale(y2), label("After"))
ELEMENT: line(position(before*after),
             split(caseid), color(patient),
             transparency(transparency."0.5"))
END GPL.

Robert
****************************
Robert Lundqvist
Norrbotten regional council
Lulea
Sweden

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Find the mode value for a record across 5 variables

Snuffy Dog
In reply to this post by David Marso
Thank you Hillel, David and Bruce
 
I think you are all brilliant and generous with your ideas and with your help I have been able to make good progress preparing my PhD data for analysis
 
I wish you all a lovely and hopefully relaxing holiday season
 
Jack Thomas

On Wed, Dec 15, 2010 at 12:54 PM, David Marso <[hidden email]> wrote:
Here is a solution which doesn't require slaughtering the data file and no
data pass.
DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
BEGIN DATA
1 45 23 22 45 23
2 28 1000 1000 1000 1000
3 2000 2000 66 . .
4 33 33 23 45 1000
5 1 1 2 1 1
6 5 5 5 5 5
7 . 3 2 3 2
8 . . . 47 14
END DATA.
VECTOR V=V1 TO V5 /#C(5)/#VL(5).
COMPUTE #NV=0.
LOOP #=1 TO 5.
*  +  COMPUTE #VI=V(#).
+  DO IF NOT MISSING(V(#)).
+    COMPUTE #FOUND=0.
+    LOOP ##=1 TO #NV.
+      DO IF #VL(##)=V(#).
+        COMPUTE #C(##)=#C(##)+1.
+        COMPUTE #FOUND=1.
+      END IF.
+    END LOOP IF #FOUND.
+    DO IF #FOUND=0.
+      COMPUTE #NV=#NV+1.
+      COMPUTE #VL(#NV)=V(#).
+      COMPUTE #C(#NV)=1.
+    END IF.
+  END IF.
END LOOP.
COMPUTE #MX=MAX(#C1 TO #C5).
COMPUTE #FOUND=0.
LOOP #=1 TO 5.
+  DO IF #C(#)=#MX.
+    COMPUTE MODE=#VL(#).
+    COMPUTE #FOUND=1.
END IF.
END LOOP IF #FOUND.
LOOP #=1 to 5.
IF V(#) NE MODE V(#)=$SYSMIS.
END LOOP.
LIST.



Snuffy Dog wrote:
>
> Dear all,
>
> I'm back again looking for more help, given the following data file:
>
> DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
>
> BEGIN DATA
>
> 1             45           23           22           45           23
>
> 2             28           1000      1000      1000      1000
>
> 3             2000      2000      66           .              .
>
> 4             33           33           23           45           1000
>
> 5             1             1             2             1             1
>
> 6             5             5             5             5             5
>
> 7             .              3             2             3             2
>
> 8             .              .              .              47           14
>
> END DATA.
>
> LIST.
>
>
>
> I want to work out the mode value for columns V1-V5.  For example the mode
> for record 1 is 45 and 22.  Any value within columns v1-v5 that does not
> equal the mode value should then be deleted.  If there are two or more
> mode
> values for a record (eg record 1, 7 and 8) then the mode value that occurs
> first (moving left to right from v1-v5) should be given preference and
> again
> any value that does not equal this mode value should be deleted. The
> resulting datafile should look like this:
>
>
>
> DATA LIST LIST /ID v1 v2 v3 v4 v5(F4.0, F4.0, F4.0, F4.0, F4.0, F4.0).
>
> BEGIN DATA
> 1             45             .              .             45          .
> 2             .             1000      1000      1000      1000
>
> 3             2000      2000      .             .              .
>
> 4             33           33           .             .              .
>
> 5             1             1             .             1             1
>
> 6             5             5             5             5             5
>
> 7             .              3             .              3             .
>
> 8             .              .              .              47           .
>
> END DATA.
> LIST.
>
> Anybody have an idea how to do this.  I had hoped aggregate would help
> but spss does not aggregate in a way that lets you find the mode.
>
> Kind regards,
>
> Jack Thomas
>
>

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Find-the-mode-value-for-a-record-across-5-variables-tp3301962p3305641.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD