request for PYTHON to stack (pool) variable views from a list of files into a system files


Art Kendall
Earlier I posted a similar request and received from users a macro/OMS
approach and an Excel macro approach, which I will try this weekend if
I cannot get a PYTHON approach.
However, I thought this would be a natural fit for PYTHON and might be
useful for many users, so I am making another request.

I realized that part of my problem might be that I asked for all of the
columns from the variable view, which might be problematic for missing
values and value labels.
Now I am making a simpler request, asking only for a 4-column data file:
source_file position variable_name variable_label
I could get by with just 3 columns and later use LAG to get the position:
source_file variable_name variable_label


*One approach* would be to use PYTHON to
read a list of filenames from a file
open a target SPSS file
for each file
        bring it into SPSS
        put the needed columns from the variable view into the target file
next file.
save the target file.
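
The first approach might be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not a tested solution: it assumes the spss module from the SPSS Python plug-in is installed (it uses spss.Submit, spss.GetVariableCount, spss.GetVariableName and spss.GetVariableLabel), that file paths contain no apostrophes, and that Python 3 is in use. To stay short it writes a CSV file instead of a .sav file. The function names are made up for illustration.

```python
import csv


def read_dictionary_via_spss(path):
    """Open one SAS file in SPSS and return [(name, label), ...].

    Assumes the SPSS Python plug-in is installed; the import is kept
    inside the function so the rest of the sketch runs without it.
    """
    import spss  # only available where the SPSS Python plug-in is installed
    spss.Submit("GET SAS DATA='%s'." % path)
    return [(spss.GetVariableName(i), spss.GetVariableLabel(i))
            for i in range(spss.GetVariableCount())]


def stack_variable_views(paths, read_dictionary=read_dictionary_via_spss):
    """Return one row per variable: (source_file, position, name, label)."""
    rows = []
    for path in paths:
        pairs = read_dictionary(path)
        for position, (name, label) in enumerate(pairs, start=1):
            rows.append((path, position, name, label))
    return rows


def save_rows(rows, out_path):
    """Write the stacked rows to a CSV file that SPSS can read back in."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source_file", "position",
                         "variable_name", "variable_label"])
        writer.writerows(rows)
```

The list of filenames could come from a plain text file with one path per line. Passing a different read_dictionary function makes it possible to try the stacking logic without SPSS.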



*Another approach* would be to write syntax, either directly or via a
macro, that
opens the target file
then for each file
     brings the file in
     then does some PYTHON magic to write the info to the target file
next file.
saves the target file.

This is an idea of that approach. *Then the question is what to put into
the file* 'd:\project\python_stuff.abc'.

dataset declare target.
GET SAS DATA='D:\project\2004\Alabama.sas7bdat'.
dataset name source.
string source_file (a60).
compute source_file = 'D:\project\2004\Alabama.sas7bdat'.
insert 'd:\project\python_stuff.abc'.
dataset close source.

GET SAS DATA='D:\project\2004\Alaska.sas7bdat'.
dataset name source.
string source_file (a60).
compute source_file = 'D:\project\2004\Alaska.sas7bdat'.
insert 'd:\project\python_stuff.abc'.
dataset close source.

. . .

GET SAS DATA='D:\project\2004\West Virginia.sas7bdat'.
dataset name source.
string source_file (a60).
compute source_file = 'D:\project\2004\West Virginia.sas7bdat'.
insert 'd:\project\python_stuff.abc'.
dataset close source.

dataset activate target.
save outfile= 'd:\project\big dictionary.sav'.
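
Since the block of syntax is the same for every file apart from the path, it could be generated rather than typed. The following Python is a minimal sketch that only builds the text of the syntax shown above from a list of paths. It does not answer what goes into 'd:\project\python_stuff.abc', and the function names are hypothetical.

```python
def syntax_for_file(path, helper=r"d:\project\python_stuff.abc"):
    """Return the block of SPSS syntax for one source file."""
    return "\n".join([
        "GET SAS DATA='%s'." % path,
        "dataset name source.",
        "string source_file (a60).",
        "compute source_file = '%s'." % path,
        "insert '%s'." % helper,
        "dataset close source.",
    ])


def build_syntax(paths, outfile=r"d:\project\big dictionary.sav"):
    """Return the whole job: declare target, one block per file, save."""
    parts = ["dataset declare target."]
    parts.extend(syntax_for_file(path) for path in paths)
    parts.append("dataset activate target.\nsave outfile= '%s'." % outfile)
    return "\n\n".join(parts)
```

Printing build_syntax(list_of_paths) gives text that can be pasted into a syntax window. Generating the paths from one list also avoids typing each path twice per block.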


- - - -
In the long run it would be handy if SPSS were enhanced so that in
variable view one could do <file> <save as>.
It would be even better if there were a syntax way to save the variable
view to a data file.

Art Kendall
Social Research Consultants
Celebrating the 60th Anniversary of the UN's Universal Declaration of
Human Rights

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: request for PYTHON to stack (pool) variable views from a list of files into a system files

Peck, Jon
Art,

Take a look at APPLY DICTIONARY.  It allows you to copy a variable dictionary – selected properties or all – to a new data file.  We enhanced this command a few releases back in part to make it easy to use a sav file as a template for a new file.

That does not deal with the looping part, but if you loop over a bunch of files and just keep using APPLY DICTIONARY to transfer the variable information to the same dataset, you are done.

Regards,

Jon

________________________________

From: Art Kendall [mailto:[hidden email]]
Sent: Friday, May 23, 2008 9:51 AM
To: SPSSX-L post
Cc: Peck, Jon
Subject: request for PYTHON to stack (pool) variable views from a list of files into a system files

 

Re: request for PYTHON to stack (pool) variable views from a list of files into a system files

Art Kendall
Let me try to be a little clearer.

"APPLY DICTIONARY can apply variable and file-based dictionary
information from an external SPSS-format data file or open dataset to
the current active dataset. Variable-based dictionary information in the
current active dataset can be applied to other variables in the current
active dataset."

Unless there is a trick I don't see at first, that does not do what I am
looking for.  Wouldn't that give me a very wide file, with only UNIQUE
variable names, so that if two or more files use the same name for a
variable it will occur only once and not be tied to which States used
that name?

*Background.*
States are not consistent from state to state, nor from year to year, in
what info they make available, nor in the names that they apply to
variables, nor in which grades they report about.  However, the labels
do mention critical info such as the grade and subject. For all subjects
reported, for all grades reported, by gender, race, poverty, and
disability status, there end up being several tens of thousands of variables.

Some examples:
one state may report
  c3g3r 'percent students making pass score in reading for grade 3'.
another may report
   c2rg3 'reading grade3 percent not passing all students'
or
    c4.3.r  'all students passing or above in grade 3 reading as percent.'
or
    c4.3.r  'all students at passing level in grade 3 reading as percent.'

*Goal.*

What I want to end up with is a data file with 3 columns (source_file,
v_name, v_label) and, say, 50,000 rows,
where the columns are
source_file    the name of the file, part of which is the name of the State.
v_name         the cell contents for a row in the variable view in the
variable name column.
v_label        the cell contents for a row in the variable view in the
variable label column.

*Why I want such a file.*
Then I can eliminate all rows from this list that are about science,
math, history, etc.; all grades that are outside the range I am
interested in; all subgroups I am not interested in; etc.
Then I will be able to sort and eyeball groups of similar
variables so that I can use transformations to create in each state new
variables with consistent names that represent what I am looking for.
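
Once the 3-column list exists, the elimination step could also be done with a little Python. This is only an illustrative sketch: it assumes the rows are (source_file, v_name, v_label) tuples, that a subject shows up in the label as a plain word such as "reading", and that grades are written as "grade 3" or "grade3". Real labels will likely need more patterns. The function names are made up.

```python
import re


def keep_row(label, subject="reading", grades=(3,)):
    """True if the label mentions the subject and one of the wanted grades."""
    text = label.lower()
    if subject not in text:
        return False
    # Accept both "grade 3" and "grade3".
    found = {int(g) for g in re.findall(r"grade\s*(\d+)", text)}
    return bool(found & set(grades))


def filter_rows(rows, **criteria):
    """Keep rows whose label passes keep_row, sorted by label for eyeballing."""
    kept = (row for row in rows if keep_row(row[2], **criteria))
    return sorted(kept, key=lambda row: row[2])
```

Sorting by label puts similarly worded variables next to each other, which is the "eyeball" step described above.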

Art Kendall
Social Research Consultants
Celebrating the 60th Anniversary of the UN's Universal Declaration of
Human Rights


