Fw: [SPSSX-L] Frequencies for subgroups

Jon K Peck


Sorting is actually a slow process: it may take many data passes, and those passes have to write intermediate datasets, so splitting up the dataset may actually be faster.  I have not done any timing tests, though.  SPSSINC SPLIT DATASET relies on XSAVE, so there is some transformation overhead, too.  But the big savings come when you want to use the different splits repeatedly, or just some of them, since you then pass only the data you need.  Sorting and using SPLIT FILE has to pass all the data every time, although for a single command all the splits are calculated in a single pass.
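
As a rough illustration of that trade-off, the sketch below counts rows read under each approach. It is a back-of-the-envelope model, not a timing test: the function names and the example figures are hypothetical, and it deliberately ignores the cost of the sort itself, the XSAVE overhead, and the disk writes.

```python
def rows_read_split_file(n_rows, n_runs):
    """Rows read when every procedure runs on the full sorted file.

    Assumption: each procedure passes all n_rows once, whatever
    subset of the splits is actually of interest.
    """
    return n_rows * n_runs


def rows_read_split_datasets(n_rows, rows_needed_per_run, n_runs):
    """Rows read when the file is first split into per-group datasets.

    Assumption: two full passes up front (one to find the split
    values, one to write the per-group files), after which each run
    reads only the rows in the groups it needs.
    """
    return 2 * n_rows + rows_needed_per_run * n_runs


# Hypothetical figures: a 10-million-row file where each run only
# needs groups holding 1 million rows in total.
print(rows_read_split_file(10_000_000, 5))                 # 50000000
print(rows_read_split_datasets(10_000_000, 1_000_000, 5))  # 25000000
```

Under these assumptions a single run favors the full-file approach (10 million rows read versus 21 million), and the split datasets only pay off with repeated or partial use.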

You would never, never use a bubble sort in Statistics: it's just about the slowest sorting algorithm known.  Statistics is already tuned to sort reasonably efficiently, although it has to write scratch files that can get very large sometimes.

As for the URL, as soon as the lawyers clear it, we will turn on the new group on myDeveloperWorks.  It actually has a short, friendly url, which I will announce once the group goes live.  DevCentral will have a permanent redirect, though.


Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435



From: Albert-Jan Roskam <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS, [hidden email]
Date: 08/31/2010 12:39 PM
Subject: Re: [SPSSX-L] Frequencies for subgroups





Hi Jon,

Isn't it very slow to write all those datasets? Or is it faster than sorting and then using SPLIT FILE? Related to this: is it possible to use different sorting algorithms in SPSS (bubble or quick sort, etc)?

 
Btw, so www.spss.com/devcentral will remain a valid link? I was already worried I had to memorize that very long ibm link.
Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




From: Jon K Peck <[hidden email]>
To: [hidden email]
Sent: Tue, August 31, 2010 6:48:53 PM
Subject: Re: [SPSSX-L] Frequencies for subgroups



One possibility would be to use the SPSSINC SPLIT DATASET and SPSSINC PROCESS FILES extension commands.  The first one creates a dataset for each split value, and the second processes each of those datasets.


Sorting is not required for the split: the command does one data pass to determine the distribution of the split values and a second to construct the separate files.  SPSSINC PROCESS FILES can then process each of these datasets in turn using a file list constructed by the first command.  However, disk space will be required to hold the various split datasets, which could be an issue for very large files.  
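
The two-pass idea can be sketched in plain Python. This is only an illustration of splitting without sorting, not the extension command's actual implementation: `split_rows`, `frequencies`, the CSV format, and the `split_<value>.csv` file names are all hypothetical, and the variable names are borrowed from the original question.

```python
import csv
import os
import tempfile
from collections import Counter


def split_rows(rows, key, out_dir):
    """Write one CSV per distinct value of `key`, without sorting."""
    if not rows:
        return Counter(), {}
    # Pass 1: find the distribution of the split values.
    counts = Counter(row[key] for row in rows)
    # Pass 2: route every row to the file for its split value.
    fieldnames = list(rows[0].keys())
    handles, writers, paths = {}, {}, {}
    try:
        for value in counts:
            paths[value] = os.path.join(out_dir, f"split_{value}.csv")
            handles[value] = open(paths[value], "w", newline="")
            writers[value] = csv.DictWriter(handles[value], fieldnames=fieldnames)
            writers[value].writeheader()
        for row in rows:
            writers[row[key]].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return counts, paths


def frequencies(path, column):
    """Tally the values of one column in a single split file."""
    with open(path, newline="") as f:
        return Counter(record[column] for record in csv.DictReader(f))


# Made-up data in the shape of the original question.
data = [
    {"Month1stFIC": 1, "Sessions": 2},
    {"Month1stFIC": 2, "Sessions": 5},
    {"Month1stFIC": 1, "Sessions": 2},
    {"Month1stFIC": 3, "Sessions": 1},
]
with tempfile.TemporaryDirectory() as out_dir:
    counts, paths = split_rows(data, "Month1stFIC", out_dir)
    # Process each split file in turn, reading only that file's rows.
    for value in sorted(paths):
        print(value, dict(frequencies(paths[value], "Sessions")))
```

Each row is read twice and written once, and a new month needs no change to the code because the split values are discovered in the first pass.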


Using this structure, you can get the output from multiple commands grouped together for each split, in contrast to SPLIT FILE, where each command reports all of the splits before the next command runs.  There are options to produce a single Viewer file across the datasets or to produce separate files.  I wrote about this on my blog, insideout.spss.com.


These extension commands require the Python programmability plugin and can be downloaded from SPSS Developer Central (www.spss.com/devcentral).

HTH,


Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435


From: Mark Vande Kamp <[hidden email]>
To: [hidden email]
Date: 08/31/2010 10:31 AM
Subject: [SPSSX-L] Frequencies for subgroups
Sent by: "SPSSX(r) Discussion" <[hidden email]>






I am currently using the code below to get frequency outputs for specified subgroups:

TEMPORARY.
SELECT IF (Month1stFIC = 1).
FREQUENCIES Sessions.
TEMPORARY.
SELECT IF (Month1stFIC = 2).
FREQUENCIES Sessions.
TEMPORARY.
SELECT IF (Month1stFIC = 3).
FREQUENCIES Sessions.

I'd like to make the syntax work for added months without repeating the blocks of code as above.

If I could avoid the SPLIT FILE command (and its required sort), that would be good too -- the sort takes forever because it's a huge file.

Is there another way?

Mark
