Maximum number of cases/variables in DISTANCE procedure

Luís Faísca

Can anyone kindly tell me the maximum number of cases or variables that can be used in the SPSS DISTANCES procedure (that is, the maximum size SPSS allows for a distance/similarity matrix)?

Thank you very much

 

 

Re: Maximum number of cases/variables in DISTANCE procedure

David Marso
Administrator
It probably depends on the amount of available memory.
Just try it and see what happens!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: Maximum number of cases/variables in DISTANCE procedure

Art Kendall
In reply to this post by Luís Faísca
As David said, it depends on the available memory (mostly RAM; if it spills into virtual memory, that could really eat time).

What size data matrix were you thinking of?
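
To make the memory point concrete, here is a minimal sketch in Python (not SPSS syntax) of how the number of distances grows with the number of items. With /VIEW=CASE the items are cases, so the matrix is cases by cases and the number of variables hardly affects its size; with /VIEW=VARIABLE the items would be the variables instead. The function name and the figure of 8 bytes per value are assumptions for illustration only, and the workspace SPSS actually reports will differ.

```python
BYTES_PER_VALUE = 8  # assumption: one double-precision number per distance


def distance_matrix_cells(n_items):
    """Return (distinct pairs, cells in the full square matrix) for n_items."""
    pairs = n_items * (n_items - 1) // 2
    full = n_items * n_items
    return pairs, full


for n in (3500, 10000, 30000):
    pairs, full = distance_matrix_cells(n)
    megabytes = full * BYTES_PER_VALUE / 1e6
    print(f"{n:>6} items: {pairs:>12,} pairs, about {megabytes:,.0f} MB for the full matrix")
```

Doubling the number of items roughly quadruples both counts, which is why the answer depends so heavily on memory.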

You can adapt the simulation below to match what you are interested in by changing these three lines.
   vector x (25,f3).
   loop id = 1 to 3500.
      loop #p = 1 to 25.


I got these time and memory figures from the syntax below.
Processor Time    00:00:01.66
Elapsed Time    00:00:01.69
Workspace Bytes    49742400

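* Generate 3500 cases of 25 rounded normal(50,10) variables.
new file.
input program.
   vector x (25,f3).
   loop id = 1 to 3500.
      loop #p = 1 to 25.
         compute x(#p) = rnd(rv.normal(50,10)).
      end loop.
      end case.
   end loop.
   end file.
end input program.
* Build a 4-character case label for the ID subcommand.
string casename (a4).
compute casename = string(Id,N4).
* Each PRINT below writes a timestamp when the next data pass reads case 1.
DO IF $CASENUM=1.
   PRINT /"'start generation'"   $time (time20.3).
END IF.
execute.
DO IF $CASENUM=1.
   PRINT /"'start Proximities 1'"   $time (time20.3).
END IF.
* Case-by-case Euclidean distances, written to the dataset DistMatrix.
dataset declare DistMatrix.
PROXIMITIES x1 to x25
  /ID=casename
  /VIEW=CASE
  /MEASURE=EUCLID
  /STANDARDIZE=NONE
  /print = none
  /matrix = OUT (DistMatrix).
* This timestamp prints during the DESCRIPTIVES pass, after PROXIMITIES has finished.
DO IF $CASENUM=1.
   PRINT /"'start descriptives'"   $time (time20.3).
END IF.
descriptives variables = x1  /statistics=all.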
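
For anyone who wants to see the same scaling without SPSS, here is a rough Python stand-in for the simulation above. It is a sketch under assumptions, not a reproduction of PROXIMITIES: `simulate_cases` and `euclidean_matrix` are names invented here, the data are random rounded normal(50,10) values as in the input program, and the timings say nothing about SPSS itself. The case counts are kept small because pure Python is slow.

```python
import math
import random
import time


def simulate_cases(n_cases, n_vars, seed=0):
    """Random cases: each variable is a rounded normal(50, 10) draw."""
    rng = random.Random(seed)
    return [[round(rng.gauss(50, 10)) for _ in range(n_vars)]
            for _ in range(n_cases)]


def euclidean_matrix(cases):
    """Full symmetric case-by-case Euclidean distance matrix."""
    n = len(cases)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(cases[i], cases[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Time a few small runs; each doubling of cases means about four times the pairs.
for n_cases in (100, 200, 400):
    cases = simulate_cases(n_cases, 25)
    start = time.perf_counter()
    euclidean_matrix(cases)
    print(n_cases, "cases:", round(time.perf_counter() - start, 3), "seconds")
```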


Art Kendall
Social Research Consultants
On 1/12/2013 12:25 AM, Luís Faísca wrote:

Can anyone kindly inform me about the maximum number of cases or variables that can be used in SPSS DISTANCES procedure (the maximum size allowed in SPSS for a distance/similarity matrix)?

Thank you very much

 

 


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Re: Maximum number of cases/variables in DISTANCE procedure

David Marso
Administrator
FWIW: Your exact numbers will vary, as I have an OLD system.
Take several reasonable values for N, run Art's input program / PROXIMITIES syntax for each, and record the workspace values.
Use CURVEFIT or REGRESSION to estimate a quadratic.
Use COMPUTE to calculate the estimate for any N (CURVEFIT will also predict the values left missing).
DATA LIST LIST / N M .
BEGIN DATA
3500 4972830
4500 8193630
5500 12214430
6500 17035230
10000 40208030
15000 .
30000 .
END DATA.
CURVEFIT /VARIABLES=m  WITH n
  /CONSTANT
  /MODEL=QUADRATIC
  /PLOT FIT.

COMPUTE Est=30.0000 + 20.8 * N + .4 * N**2 .
FORMAT est (F12.0).
LIST.

       N        M          EST

 3500.00  4972830      4972830
 4500.00  8193630      8193630
 5500.00 12214430     12214430
 6500.00 17035230     17035230
10000.00 40208030     40208030
15000.00      .       90312030
30000.00      .      360624030
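
The same fit can be checked outside SPSS. Below is a minimal Python sketch, assuming only the five (N, M) pairs listed above; `fit_quadratic`, `estimate` and `max_cases` are names made up for illustration. It solves the least-squares normal equations with exact fractions (sums of N**4 are awkward in floating point) and then inverts the curve to get at the original question: the largest N whose estimated workspace fits a given budget, in whatever units M is reported. The result only describes the system the measurements came from.

```python
from fractions import Fraction

# (N cases, workspace value M) pairs copied from the DATA LIST block above.
observed = [(3500, 4972830), (4500, 8193630), (5500, 12214430),
            (6500, 17035230), (10000, 40208030)]


def fit_quadratic(points):
    """Least-squares fit of M = a + b*N + c*N**2; returns (a, b, c)."""
    # Augmented 3x3 normal-equation system, kept exact with Fractions.
    rows = []
    for i in range(3):
        row = [Fraction(sum(n ** (i + j) for n, _ in points)) for j in range(3)]
        row.append(Fraction(sum(m * n ** i for n, m in points)))
        rows.append(row)
    # Gauss-Jordan elimination.
    for col in range(3):
        pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(3):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return tuple(row[3] for row in rows)


def estimate(coeffs, n):
    """Estimated workspace value for n cases."""
    a, b, c = coeffs
    return a + b * n + c * n * n


def max_cases(coeffs, budget):
    """Largest n with estimate <= budget (assumes the curve rises with n)."""
    if estimate(coeffs, 0) > budget:
        return 0
    lo, hi = 0, 1
    while estimate(coeffs, hi) <= budget:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if estimate(coeffs, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


coeffs = fit_quadratic(observed)
print("a, b, c =", [float(v) for v in coeffs])
for n in (15000, 30000):
    print(n, int(estimate(coeffs, n)))
```

Calling `max_cases(coeffs, budget)` with the workspace you can actually spare gives a rough ceiling on N; rerun Art's syntax near that value before trusting it.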

