SPSSX Discussion

Constructing tables

Staffan Lindberg

Constructing tables

Dear list!

I'm constructing a set of tables (approx. 50 variables) comparing each specific
treatment facility (approx. 150) with the total for all facilities (including
the specific facility). My approach would be to use two files (the
total file and a selection of the specific treatment facility). Then I would
add the two files together, setting a marker variable to 1 for the second
instance of the specific file and to 2 for the total file. These selections
and merges would then have to be done about 150 times.

I have a sneaking suspicion that there must be a more elegant and
time-saving solution than this. Some things I ponder about:

1. Is it possible to duplicate the values for a facility directly into the
active file using syntax?
2. Is it possible to save outfiles by another variable (e.g. Save
outfile='Unit' by Treatfac, thus automatically generating Unit1, Unit2
.......UnitN as outfiles)?
3. Am I not seeing the forest for the trees? (Not an uncommon
occurrence!) There is of course a much simpler solution than mine, but
which?
4. Python would as usual solve all these problems (although I haven't
started with it yet but will attend Ray's course later)?

Any hints would be very much appreciated

best

Staffan Lindberg
Sweden
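
Question 2 above (one output file per facility) can be sketched in plain Python. This is only an illustration under assumptions, not SPSS syntax and not a method proposed in the thread: the function name `split_by_key`, the field names `treatfac` and `educ`, the CSV output format and the sample records are all hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_by_key(rows, key, out_dir, prefix="Unit"):
    """Write one CSV file per distinct value of `key`; return {value: path}."""
    # Group the records by the splitting variable.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Write each group to its own file, named prefix + key value.
    paths = {}
    for value, members in groups.items():
        path = os.path.join(out_dir, f"{prefix}{value}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(members[0].keys()))
            writer.writeheader()
            writer.writerows(members)
        paths[value] = path
    return paths


# Hypothetical sample data: three patients in two facilities.
patients = [
    {"treatfac": 5, "educ": "primary"},
    {"treatfac": 5, "educ": "secondary"},
    {"treatfac": 8, "educ": "primary"},
]
out_dir = tempfile.mkdtemp()
unit_files = split_by_key(patients, "treatfac", out_dir)
```

Because the file name is built from the facility number itself, non-consecutive numbering needs no special handling.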

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

ViAnn Beadle

Re: Constructing tables

I'm a bit unclear as to what is in these tables, but it seems to me that
AGGREGATE with MODE=ADDVARIABLES could be used to add total data to facility
data. This assumes that your facility data are in a single case. Perhaps you
could provide the list with more information on your file structure. Also, what
reporting procedure within SPSS are you using?
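
The idea of adding aggregated values to every case, rather than collapsing the file, can be illustrated in plain Python. This is a rough analogue only, not SPSS syntax: the function name `add_group_counts`, the new fields `n_group` and `n_total`, and the sample records are assumptions made for the example.

```python
from collections import Counter


def add_group_counts(rows, break_var):
    """Return copies of rows with group size and overall size attached."""
    # Count cases per break group (for example, per facility).
    group_sizes = Counter(row[break_var] for row in rows)
    total = len(rows)
    # Attach both figures to a copy of every case; the input is not modified.
    return [
        {**row, "n_group": group_sizes[row[break_var]], "n_total": total}
        for row in rows
    ]


# Hypothetical sample data: three patients in two facilities.
patients = [
    {"treatfac": 5, "educ": "primary"},
    {"treatfac": 5, "educ": "secondary"},
    {"treatfac": 8, "educ": "primary"},
]
augmented = add_group_counts(patients, "treatfac")
```

Each case keeps its own values and gains the group-level and file-level figures, so no second file has to be merged back in.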

Staffan Lindberg

Re: Constructing tables

Hi ViAnn!

Thanks for your input. I probably was not quite clear on how my material is
organized. I am using Custom Tables. My file has approx. 20,000 individual
cases (patients) with approx. 50 variables on each record. There are approx. 150
treatment facilities with a varying number of patients. These facilities are
numbered, but not consecutively (e.g. 5, 8, 11, ..., 2000), with the name of
the facility as the value label. The variables are all nominal or ordinal.

The tables I will produce contain 6 columns. Every facility has 4 sets of
tables with different titles.

1. An abbreviated variable name (e.g. Educational level), only for the first
level.
2. The value labels within this variable (i.e. the different levels)
3. Count for the different levels for the specific treatment facility
4. Column percent for the different levels for the specific treatment
facility

Columns 3 and 4 have the name of the specific facility as a subheading.

5. Count for the total material (including the specific facility)
6. Column percent for the total material

Columns 5 and 6 have "Total material" as a subheading.

After constructing the tables I export them to Excel. This is in order to
get the horizontal lines separating the variables from each other. After
that I run some macros in Excel to get the right page breaks and make a
lot of layout changes.

I can do this by creating a marker variable for the total file and setting it
to 2 (Status=2). This will be the same for all tables. Then I select the
specific treatment facility and save it as a separate file with the marker
variable set to 1 (Status=1). Then I merge the files (Add Cases) and
give the marker variable the value labels (1 'Name of the specific
facility', 2 "Total material"). After that I run Custom Tables, export the
table in the Output Viewer to Excel and run the macros. Maybe I should also
explain that the purpose of all this is to provide each facility with tables
that make it easy for them to see their own values and compare them with
the total material.

Following this procedure 150 times seems extremely tedious, and I'm looking
for ways to simplify the process. I will look at Aggregate and see
where it leads. Thanks again ViAnn!

Hoping this explanation does not make it any more confusing.

best

Staffan
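
The six-column layout described above (levels, facility count and column percent, total count and column percent) can be computed in one pass per facility without physically stacking two files. The following is a minimal Python sketch under assumptions, not the Custom Tables procedure itself: the function name `compare_with_total`, the field names and the sample records are hypothetical.

```python
from collections import Counter


def compare_with_total(rows, facility_var, facility, row_var):
    """Return [(level, fac_n, fac_pct, tot_n, tot_pct), ...] sorted by level."""
    # Counts over the whole material (which includes the facility itself).
    total = Counter(row[row_var] for row in rows)
    # Counts for the one facility being reported on.
    own = Counter(row[row_var] for row in rows if row[facility_var] == facility)
    own_n = sum(own.values())
    total_n = sum(total.values())

    table = []
    for level in sorted(total):
        fac_pct = 100.0 * own[level] / own_n if own_n else 0.0
        tot_pct = 100.0 * total[level] / total_n
        table.append((level, own[level], fac_pct, total[level], tot_pct))
    return table


# Hypothetical sample data: four patients in two facilities.
patients = [
    {"treatfac": 5, "educ": "primary"},
    {"treatfac": 5, "educ": "secondary"},
    {"treatfac": 8, "educ": "primary"},
    {"treatfac": 8, "educ": "primary"},
]

# One table per facility, keyed by facility number.
facilities = sorted({row["treatfac"] for row in patients})
tables = {
    fac: compare_with_total(patients, "treatfac", fac, "educ")
    for fac in facilities
}
```

Looping over the distinct facility numbers replaces the 150 manual select-and-merge rounds; the total columns are recomputed here for simplicity, although they are identical in every table.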

Staffan Lindberg

Reposting?

In reply to this post by ViAnn Beadle

Apologies for this possible reposting. Ordinarily when I post to the list I
get an acknowledgement that it has been distributed to so and so many
recipients, followed by a stream of "Out of Office" postings. This time
none of this happened, which makes me suspect that it was not posted
properly. I'm sitting on a somewhat shaky line. Apologies if you have seen
this before.

best

Staffan Lindberg
Sweden

ViAnn Beadle

Re: Reposting?

Your original post came through yesterday. You're not likely to get many
responses on a Sunday.

Here are some things to think about. Why not put the facility variable in the
layer and ask for stats on that layer? One table specification will do it,
and you can print it directly from within SPSS, getting one table per layer
value.

Also, you can use a table look to specify only a border between the column
labels and the table body.
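
The layer idea, where a single table specification yields one table per value of the layer variable, can be pictured with a short Python sketch. It only illustrates the grouping logic and is not Custom Tables syntax; `tables_by_layer`, the field names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def tables_by_layer(rows, layer_var, row_var):
    """Return {layer value: Counter of row_var levels} from a single pass."""
    layers = defaultdict(Counter)
    for row in rows:
        # Each layer value (facility) accumulates its own frequency table.
        layers[row[layer_var]][row[row_var]] += 1
    return dict(layers)


# Hypothetical sample data: four patients in two facilities.
patients = [
    {"treatfac": 5, "educ": "primary"},
    {"treatfac": 5, "educ": "secondary"},
    {"treatfac": 8, "educ": "primary"},
    {"treatfac": 8, "educ": "primary"},
]
layered = tables_by_layer(patients, "treatfac", "educ")
```

One specification (the pair of variable names) is applied once, and the result contains a separate frequency table for each facility.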

Peck, Jon

Case-Control Matching

In reply to this post by Staffan Lindberg

I have posted a beta version of a case-control matching module to SPSS Developer Central (www.spss.com/devcentral). It is implemented as an SPSS Extension command, for those not familiar with Python, but its main function can be used directly in Python code as well.

The syntax description is below.

Comments and bug reports should be sent to [hidden email]

I hope you will find this useful.

Regards,
Jon Peck

CASECTRL
Match cases from two datasets drawing randomly from matching cases.

CASECTRL DEMANDERDS=dsname SUPPLIERDS=dsname BY=list of keys
SUPPLIERID=varname NEWDEMANDERIDVARS=list of variable names

Optional parameters:
COPYTODEMANDER=list of supplier variable names
MATCHGROUPVAR = variable name (default is "matchgroup")
DEMANDERID = variable name
DS3 = dataset name
/OPTIONS
SAMPLEWITHREPLACEMENT=TRUE or FALSE (default)
MINIMIZEMEMORY = TRUE (default) or FALSE
SHUFFLE = TRUE or FALSE (default)
SEED = number

/HELP.

Example:
CASECTRL DEMANDERDS=demand SUPPLIERDS = supply BY=agegroup gender
SUPPLIERID = id NEWDEMANDERIDVARS=supplierId.

CASECTRL takes two datasets, a demander and a supplier. It attempts to find a match for each
demander case from the supplier dataset based on the variables named in BY. If more than one
candidate matches, it picks randomly. No sorting of either dataset is required.

This procedure builds some possibly large tables in memory, so it may not be appropriate for very
large datasets.

There are several output options.
The ID or IDs of matching cases are appended to the demander dataset variables. The number of
variables listed as NEWDEMANDERIDVARS determines how many matches are attempted. These variables
must not already exist in the demander dataset.

The variables in the supplier dataset that are listed in COPYTODEMANDER are copied to the
demander dataset as new variables or replacement values. For existing variables, the types
must agree. If no match is found, existing demander dataset values are not changed and new
variable values will be sysmis or blank.
Only one MATCHGROUPVAR may be specified if COPYTODEMANDER is used.
None of the metadata such as variable and value labels is copied. Use APPLY DICTIONARY
to bring over variable properties.

If DS3 is specified, a new dataset is created containing the cases in the supplier dataset actually
used for the matches. It will be the active dataset after the command is run.
(This implies that any unnamed dataset will be closed.)
It contains all the variables from the supplier dataset plus the MATCHGROUPVAR.
If DEMANDERID is specified, it also contains the ID variable from the demander dataset. These
variable names must all be unique.
DS3 is only a dataset: use the SAVE command to turn it into a file.

By default, sampling from the supplier dataset is done without replacement. Specify
SAMPLEWITHREPLACEMENT=TRUE to sample with replacement.

By default, memory usage is minimized in picking supplier dataset candidate match cases
(all eligible cases have an equal probability of selection). This requires an extra data pass. If
the possible number of matching cases for a demander case is small or the supplier dataset is
not large, specifying
MINIMIZEMEMORY=FALSE may improve performance by eliminating the extra data pass. In the
case of 1-1 matching, this is recommended.

By default, cases in the demander dataset are processed in case order. If there are insufficient
supplier cases, you may specify SHUFFLE=TRUE to process the demander cases in random order.
This ensures that earlier cases do not have an advantage over later ones in matching.
SHUFFLE increases the memory requirement and will take longer to execute.

Use SEED=number to set the random number generator to a known state for repeatability.

CASECTRL /HELP.
prints this output and does nothing else.

Example:
CASECTRL DEMANDERDS=demander SUPPLIERDS=supplier
BY=origin cylinder SUPPLIERID=id
NEWDEMANDERIDVARS=matchedcaseid
COPYTODEMANDER=mpg randomnumber randomstring
DS3=dsextra DEMANDERID=demanderid.
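
The matching behaviour described above (random choice among supplier cases with equal BY keys, sampling without replacement by default, optional shuffling of demander order, a seed for repeatability) can be sketched in plain Python. This is not the CASECTRL implementation and does not reproduce its memory strategy or multiple matches per case; the function `match_cases`, its parameter names and the sample data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def match_cases(demanders, suppliers, by, supplier_id, seed=None,
                with_replacement=False, shuffle=False):
    """Return a list of matched supplier IDs (or None), aligned with demanders."""
    rng = random.Random(seed)

    # Pool the supplier IDs by their tuple of BY-key values.
    pools = defaultdict(list)
    for case in suppliers:
        pools[tuple(case[k] for k in by)].append(case[supplier_id])

    # Process demanders in case order, or in random order if requested.
    order = list(range(len(demanders)))
    if shuffle:
        rng.shuffle(order)

    matches = [None] * len(demanders)
    for index in order:
        pool = pools.get(tuple(demanders[index][k] for k in by))
        if not pool:
            continue  # no candidate (left): the match stays missing
        pick = rng.randrange(len(pool))
        # Without replacement, a used supplier is removed from its pool.
        matches[index] = pool[pick] if with_replacement else pool.pop(pick)
    return matches


# Hypothetical sample data.
demand = [
    {"agegroup": 1, "gender": "f"},
    {"agegroup": 1, "gender": "f"},
    {"agegroup": 2, "gender": "m"},
]
supply = [
    {"id": 101, "agegroup": 1, "gender": "f"},
    {"id": 102, "agegroup": 1, "gender": "f"},
    {"id": 103, "agegroup": 3, "gender": "m"},
]
matched = match_cases(demand, supply, by=["agegroup", "gender"],
                      supplier_id="id", seed=1)
```

In this sketch the first two demander cases each receive one of the two eligible supplier IDs, and the third has no candidate and stays unmatched. Passing the same seed again reproduces the same assignment.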
