Ordering of Variables

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Ordering of Variables

Jean Hanson
Hello Listers,
Is it possible through syntax to sort the variables in a particular order? If so, could someone point me in the right direction?
Thank you,
Jean Hanson
Reply | Threaded
Open this post in threaded view
|

Re: Ordering of Variables

Richard Ristow
At 10:18 PM 8/23/2007, Jean Hanson wrote:

>Is it possible through syntax to sort the variables in a particular
>order? If so, could someone point me in the right direction?

It takes a pass through the data, but you can do it with ADD FILES:

ADD FILES
    /FILE=*
    /KEEP= <and list of variables> ALL.

Your new file will have "<and list of variables>" at the beginning, in
the order you listed them; keyword ALL means that variables not on the
list will follow, in their original relative order. (It's always a good
idea to end with ALL, even if you think you have listed all the
variables, so you don't lose any variables by mistake.)
Reply | Threaded
Open this post in threaded view
|

AW: Re: Ordering of Variables

Georg Maubach
Dear Listers,

The ordering of variables is a common task. We have developed a macro - based on the know-how of Raynald Levesque's Data Management Book - that can identify the variables that are present in a dataset and sorts them according to their type (numeric or string).

We share this routine with you as a thank you for the help we got from this list during the last three years.

If you have any questions please feel free to contact us over the list.

Best regards

Georg Maubach
Research Manager

---- cut ----
/*--------------------------------------------------------*/.
/* sortvar.mac (SORT VARiable List)
/* -------------------------------------------------------*/.
/* Author   : Georg Maubach
/* Date     : 04-May-2005
/* Status   : released
/* Version  : 3
/* Release  : 1.0
/*--------------------------------------------------------*/.
/* Source System : SPSS (Win), Rel.13.0.1 (20 Nov 2004) .
/* Target System : SPSS (Win), Rel.12 .
/*--------------------------------------------------------*/.
/* Parameter:
/* pinctmp  (quoted)   = path for include file
/* finctmp  (quoted)   = name of include file
/*                       containing the macro
/*                       definition
/* pdattmp  (quoted)   = path for data set to
/*                       be saved temporarily
/* fdattmp  (quoted)   = file name for temporarily
/*                       saved data set
/* lstname  (unquoted) = name of macro to contain variable
/*                       names
/* varlst   (unquoted) = list of variable to be sorted
/*--------------------------------------------------------*/.
/* Result: Variables sorted alphabetical in data set.
/*--------------------------------------------------------*/.
/* This macro sorts the variables alphabetical in the data
/* set.
/*--------------------------------------------------------*/.
/* Notes:
/* The list of variables can be given explicit like
/* 'var1 var2 var3' or as a range using TO like
/* 'var1 TO var3'. TO can be written in lower or upper case.
/*--------1---------2---------3---------4---------5-------*/.

DEFINE @sortvar (pdattmp  !TOKENS(1) !DEFAULT('c:\temp\')
                /fdattmp  !TOKENS(1) !DEFAULT('tmp.sav')
                /pinctmp  !TOKENS(1) !DEFAULT('c:\temp\')
                /finctmp  !TOKENS(1) !DEFAULT('dynamic.inc')
                /mode     !TOKENS(1) !DEFAULT(ascending)
                /varlst   !CMDEND    !DEFAULT(ALL)) .

/* @trace '*===== [ Begin @sortvar ] =====*' . */ .

* Save current data set to temporary file.
SAVE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pdattmp)),!UNQUOTE(!EVAL(!fdattmp)))).

* Keep only variables to be sorted.
!IF (!INDEX(!UPCASE(!varlst),' TO ') > 0) !THEN
!LET !firstv = !HEAD(!varlst).
!LET !lastv  = !TAIL(!varlst).
!LET !lastv  = !TAIL(!lastv).
MATCH FILES FILE=* /KEEP !firstv TO !lastv.
!ELSE
MATCH FILES FILE=* /KEEP !varlst.
!IFEND.

* Transpose the data set to get the variable names.
N OF CASES 1.
FLIP.

* Mark first und last variable in data set.
COMPUTE nobreak=1.
MATCH FILES FILE=*
 /BY nobreak
 /FIRST=first
 /LAST=last.

* Create a data set derived from the list of variables.
DO IF first.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / 'DATA LIST FIXED /variable (A64).'.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / 'BEGIN DATA'.
END IF.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / case_lbl.
DO IF last.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / 'END DATA.'.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / 'EXECUTE.'.
END IF.
EXECUTE.

* Call include file to create data set of variable list.
INCLUDE FILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)),!UNQUOTE(!EVAL(!finctmp)))).

* Sort the variables in the data set.
!IF (!UPCASE(!mode) = 'ASCENDING'  !OR !UPCASE(!mode) = 'A')  !THEN SORT CASES BY variable (A) !IFEND.
!IF (!UPCASE(!mode) = 'DESCENDING' !OR !UPCASE(!mode) = 'D') !THEN SORT CASES BY variable (D) !IFEND.

* Mark first und last variable in data set.
COMPUTE nobreak=1.
MATCH FILES FILE=*
 /BY nobreak
 /FIRST=first
 /LAST=last.

* Write macro definition for sorted variable list in include file.
DO IF first.
- WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp)))) / 'DEFINE @tmpmac ()' .
END IF.
WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp))))   / " "variable.
DO IF last.
- WRITE OUTFILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)), !UNQUOTE(!EVAL(!finctmp)))) / '!ENDDEFINE.'.
END IF.
EXECUTE.

* Call include file to call the macro and define the list.
INCLUDE FILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pinctmp)),!UNQUOTE(!EVAL(!finctmp)))).

* Restore data set.
GET FILE = !QUOTE(!CONCAT(!UNQUOTE(!EVAL(!pdattmp)),!UNQUOTE(!EVAL(!fdattmp)))).
EXECUTE.

* Sort variables of variable list using the sorted macro.
MATCH FILES FILE=* /KEEP @tmpmac.

EXECUTE.

/* @trace '*===== [ End @sortvar ] =====*' . */ .

!ENDDEFINE.

---- cut ----


-----Ursprüngliche Nachricht-----
Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Richard Ristow
Gesendet: Freitag, 24. August 2007 05:29
An: [hidden email]
Betreff: Re: Ordering of Variables

At 10:18 PM 8/23/2007, Jean Hanson wrote:

>Is it possible through syntax to sort the variables in a particular
>order? If so, could someone point me in the right direction?

It takes a pass through the data, but you can do it with ADD FILES:

ADD FILES
    /FILE=*
    /KEEP= <and list of variables> ALL.

Your new file will have "<and list of variables>" at the beginning, in the order you listed them; keyword ALL means that variables not on the list will follow, in their original relative order. (It's always a good idea to end with ALL, even if you think you have listed all the variables, so you don't lose any variables by mistake.)
Reply | Threaded
Open this post in threaded view
|

By-pass "MATCH FILES" with SQL?

Albert-Jan Roskam
Hi list,

I regularly work with very large files and I am trying
to find a faster alternative to MATCH FILES. I am
assuming that our databaseserver is a multi-cored
beast (multiple CPU cores can be effectively used by
sql server, not by spss) with lots of RAM (more than
my own desktop has).

In the example below I am linking a relatively small
codelist to a  100M+ table in order to retrieve a
table with only those codes in the list (perhaps I
could also have used an inner join or an equi join for
this, I am not 100% sure, but I guess so).

* get the codelist and save it to the DB.
get file = 'd:\temp\codelist.sav'.

save translate /type=odbc /connect=
 'driver=sql server;'
  'server=myserver;'
  'database=mydb;'
  'trusted_connection=yes'
  / table = 'spss2sql' / keep mycode / replace  .
* -- Question: is it possible/advisable to use a
temporary table here?. I.e. "#spss2sql".

* get the actual data, but only those records that
have values of variable 'mycode' in common with those
in table spss2sql.
get data /type=ODBC /connect=
  'driver=SQL Server;'
  'server=myserver;'
  'database = mydb;'
  'Trusted_Connection=Yes'
 /SQL =
 'SELECT * FROM some_very_large_table '
 ' WHERE code IN (SELECT mycode FROM spss2sql) '
 /ASSUMEDSTRWIDTH=255

save outfile 'd:\temp\matched.sav'.

Does this approach make sense? Remember that my main
objective was to *save computer time*. I find that
SORTing and MATCHing such large tables is really too
time-consuming. But I also prefer to use (and switch
between) as few applications as possible, so it's
nicer if I can do everything from SPSS.

One additional (small) question. If I used "ORDER BY
myvar" in the GET DATA--SQL statement, why does SPSS
give an error when I use AGGREGATE--PRESORTED? Do spss
files have some kind of header info about the 'sorting
status' (comparable to SAS tables)? Can I just ignore
that error message when it occurs in cases like this?

Thanks in advance for your replies!

Cheers!!
Albert-Jan

Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


      ____________________________________________________________________________________
Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center.
http://autos.yahoo.com/green_center/
Reply | Threaded
Open this post in threaded view
|

Re: By-pass "MATCH FILES" with SQL?

ViAnn Beadle
If your data are already stored in SPSS save files, MATCH FILES should be
faster than SQL via ODBC. I've seen benchmarks in which case processing of
SPSS data files is 40 times as fast as ODBC. YMMV, so do your benchmarks.

In some organizations where data are already stored in a warehouse, a
scheduled extract is done daily to create an analysis "data mart" comprised
of SPSS data files.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Albert-jan Roskam
Sent: Friday, August 24, 2007 6:35 AM
To: [hidden email]
Subject: By-pass "MATCH FILES" with SQL?

Hi list,

I regularly work with very large files and I am trying
to find a faster alternative to MATCH FILES. I am
assuming that our databaseserver is a multi-cored
beast (multiple CPU cores can be effectively used by
sql server, not by spss) with lots of RAM (more than
my own desktop has).

In the example below I am linking a relatively small
codelist to a  100M+ table in order to retrieve a
table with only those codes in the list (perhaps I
could also have used an inner join or an equi join for
this, I am not 100% sure, but I guess so).

* get the codelist and save it to the DB.
get file = 'd:\temp\codelist.sav'.

save translate /type=odbc /connect=
 'driver=sql server;'
  'server=myserver;'
  'database=mydb;'
  'trusted_connection=yes'
  / table = 'spss2sql' / keep mycode / replace  .
* -- Question: is it possible/advisable to use a
temporary table here?. I.e. "#spss2sql".

* get the actual data, but only those records that
have values of variable 'mycode' in common with those
in table spss2sql.
get data /type=ODBC /connect=
  'driver=SQL Server;'
  'server=myserver;'
  'database = mydb;'
  'Trusted_Connection=Yes'
 /SQL =
 'SELECT * FROM some_very_large_table '
 ' WHERE code IN (SELECT mycode FROM spss2sql) '
 /ASSUMEDSTRWIDTH=255

save outfile 'd:\temp\matched.sav'.

Does this approach make sense? Remember that my main
objective was to *save computer time*. I find that
SORTing and MATCHing such large tables is really too
time-consuming. But I also prefer to use (and switch
between) as few applications as possible, so it's
nicer if I can do everything from SPSS.

One additional (small) question. If I used "ORDER BY
myvar" in the GET DATA--SQL statement, why does SPSS
give an error when I use AGGREGATE--PRESORTED? Do spss
files have some kind of header info about the 'sorting
status' (comparable to SAS tables)? Can I just ignore
that error message when it occurs in cases like this?

Thanks in advance for your replies!

Cheers!!
Albert-Jan

Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results
that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



____________________________________________________________________________
________
Park yourself in front of a world of choices in alternative vehicles. Visit
the Yahoo! Auto Green Center.
http://autos.yahoo.com/green_center/