|
Dear list!
I'm constructing a set of tables (approx. 50 variables) comparing each specific treatment facility (approx. 150) with the total for all facilities (including the specific facility). My approach would be to use two files: the total file and a selection of the specific treatment facility. I would then add the two files together, setting a marker variable to 1 for the second instance of the specific facility's cases and to 2 for the total file. These selections and merges would then have to be done about 150 times. I have a sneaking suspicion that there must be a more elegant and time-saving solution than this.

Some things I am pondering:

1. Is it possible to duplicate the values for a facility directly in the active file through syntax?
2. Is it possible to save outfiles split by another variable (e.g. Save outfile='Unit' by Treatfac, thus automatically generating Unit1, Unit2, ..., UnitN as outfiles)?
3. Am I not seeing the forest for the trees (not an uncommon occurrence!)? There is surely a much simpler solution than mine, but which?
4. Would Python, as usual, solve all these problems? (I haven't started with it yet, but will attend Ray's course later.)

Any hints would be very much appreciated.

Best,
Staffan Lindberg
Sweden

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
|
I'm a bit unclear as to what is in these tables but it seems to me that
AGGREGATE with MODE=ADDVARIABLES could be used to add total data to facility data. This assumes that your facility data are in a single case. Perhaps you could provide the list with more information on your file structure. Also, what reporting procedure within SPSS are you using?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Staffan Lindberg
Sent: Sunday, June 01, 2008 4:33 AM
To: [hidden email]
Subject: Constructing tables |
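For readers more at home in Python than in SPSS syntax, the idea behind AGGREGATE with MODE=ADDVARIABLES can be sketched roughly as follows: a summary is computed per break group and written back onto every case in that group, instead of collapsing the file to one case per group. This is plain Python, not SPSS code, and the record layout and the names `treatfac`, `n_facility`, and `n_total` are invented for illustration.

```python
from collections import Counter


def add_group_counts(cases, break_var="treatfac"):
    """Return copies of the cases with group and overall counts attached.

    Nothing is collapsed: every input case comes back, carrying the
    size of its own break group and the size of the whole file.
    """
    per_group = Counter(case[break_var] for case in cases)
    total = len(cases)
    return [
        {**case, "n_facility": per_group[case[break_var]], "n_total": total}
        for case in cases
    ]


# Hypothetical toy data: three patients in two facilities.
cases = [
    {"id": 1, "treatfac": 5},
    {"id": 2, "treatfac": 5},
    {"id": 3, "treatfac": 8},
]
augmented = add_group_counts(cases)
```

With the totals sitting on each case, a facility's own figures and the overall figures are available in the same file without any merging.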
|
Hi ViAnn!
Thanks for your input. I was probably not quite clear about how my material is organized. I am using Custom Tables. My file has approx. 20,000 individual cases (patients) with approx. 50 variables on each record. There are approx. 150 treatment facilities with a varying number of patients. The facilities are numbered, but not consecutively (e.g. 5, 8, 11, ..., 2000), with the name of the facility as the value label. The variables are all nominal or ordinal.

The tables I will produce contain 6 columns. Every facility has 4 sets of tables with different titles.

1. An abbreviated variable name (e.g. Educational level), shown only for the first level.
2. The value labels within this variable (i.e. the different levels).
3. Count for the different levels for the specific treatment facility.
4. Column percent for the different levels for the specific treatment facility.
   Columns 3 and 4 have the name of the specific facility as a subheading.
5. Count for the total material (including the specific facility).
6. Column percent for the total material.
   Columns 5 and 6 have "Total material" as a subheading.

After constructing the tables I export them to Excel. This is in order to get the horizontal lines separating the variables from each other. After that I run some macros in Excel to get the right page breaks and make a lot of layout changes.

I can do this by creating a marker variable for the total file and setting it to 2 (Status=2). This will be the same for all tables. Then I select the specific treatment facility and make a separate file of it with the marker variable set to 1 (Status=1). Then I merge the files together (Add Cases) and give the marker variable the value labels (1 'Name of the specific facility', 2 "Total material"). After that I run Custom Tables, export the table in the output viewer to Excel, and run the macros.

Maybe I should also explain that the purpose of all this is to provide each facility with tables that make it easy for them to see their own values and compare them with the total material.

Following this procedure 150 times seems extremely tedious, and I'm looking for ways to simplify the process. I will look at AGGREGATE and see where it leads.

Thanks again, ViAnn! I hope this explanation does not make things more confusing.

Best,
Staffan

-----Original Message-----
From: ViAnn Beadle [mailto:[hidden email]]
Sent: 1 June 2008 14:48
To: 'Staffan Lindberg'; [hidden email]
Subject: RE: Constructing tables |
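Since Python came up in the original question, here is a minimal sketch of how the six-column layout described above could be produced for every facility in a loop, without building a merged file per facility. It is only an illustration in plain Python, not Custom Tables output: the variable names `treatfac` and `educ`, the helper names, and the toy data are all assumptions, and missing values and value labels are ignored.

```python
from collections import Counter


def column_stats(values):
    """Return {level: (count, column percent)} for a list of category values."""
    counts = Counter(values)
    n = len(values)
    return {level: (c, 100.0 * c / n) for level, c in counts.items()}


def facility_vs_total(cases, facility, variable, fac_var="treatfac"):
    """Build rows of (level, facility count, facility %, total count, total %)."""
    total = column_stats([c[variable] for c in cases])
    own = column_stats([c[variable] for c in cases if c[fac_var] == facility])
    rows = []
    for level in sorted(total):
        # A level nobody at this facility has still gets a row, with zeros.
        f_count, f_pct = own.get(level, (0, 0.0))
        t_count, t_pct = total[level]
        rows.append((level, f_count, f_pct, t_count, t_pct))
    return rows


# Hypothetical toy data: four patients, two facilities, one variable.
cases = [
    {"treatfac": 5, "educ": 1},
    {"treatfac": 5, "educ": 2},
    {"treatfac": 8, "educ": 1},
    {"treatfac": 8, "educ": 1},
]

# One comparison table per facility; the "total" columns include the facility itself.
tables = {
    fac: facility_vs_total(cases, fac, "educ")
    for fac in sorted({c["treatfac"] for c in cases})
}
```

The total columns are recomputed for each facility here for simplicity; with 150 facilities and 50 variables it would be natural to compute them once and reuse them.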
|
In reply to this post by ViAnn Beadle
Apologies for this possible reposting. Ordinarily, when I post to the list I get an acknowledgement that it has been distributed to so-and-so many recipients, followed by a stream of "Out of Office" postings. This time none of this happened, which makes me suspect that it was not posted properly. I'm on a somewhat shaky line. Apologies if you have seen this before.

Best,
Staffan Lindberg
Sweden |
|
Your original post came through yesterday. You're not likely to get many
responses on a Sunday. Here are some things to think about. Why not put the facility variable in the layer and ask for stats on that layer? One table specification will do it, and you can print it directly from within SPSS, getting one table per layer value. Also, you can use a table look to specify only a border between the column labels and the table body.

-----Original Message-----
From: Staffan Lindberg [mailto:[hidden email]]
Sent: Monday, June 02, 2008 8:30 AM
To: 'ViAnn Beadle'; [hidden email]
Subject: Reposting? |
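As a rough analogy to the layer suggestion, one specification can tally a row variable within every facility in a single pass and then emit one small table per facility value. The sketch below is plain Python rather than SPSS, and the names `treatfac`, `educ`, and `layered_counts` are invented for illustration.

```python
from collections import Counter, defaultdict


def layered_counts(cases, layer_var, row_var):
    """Tally row_var within each value of layer_var in a single pass."""
    layers = defaultdict(Counter)
    for case in cases:
        layers[case[layer_var]][case[row_var]] += 1
    return layers


# Hypothetical toy data.
cases = [
    {"treatfac": 5, "educ": 1},
    {"treatfac": 5, "educ": 2},
    {"treatfac": 8, "educ": 1},
    {"treatfac": 8, "educ": 1},
]

layers = layered_counts(cases, "treatfac", "educ")

# One block of output per layer value, analogous to one table per facility.
for facility in sorted(layers):
    print(f"Facility {facility}")
    for level, count in sorted(layers[facility].items()):
        print(f"  level {level}: {count}")
```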
|
In reply to this post by Staffan Lindberg
I have posted a beta version of a case-control matching module to SPSS Developer Central (www.spss.com/devcentral). It is implemented as an SPSS Extension command, for those not familiar with Python, but its main function can be used directly in Python code as well.
The syntax description is below. Comments and bug reports should be sent to [hidden email].

I hope you will find this useful.

Regards,
Jon Peck

CASECTRL

Match cases from two datasets drawing randomly from matching cases.

    CASECTRL DEMANDERDS=dsname SUPPLIERDS=dsname BY=list of keys
        SUPPLIERID=varname NEWDEMANDERIDVARS=list of variable names

Optional parameters:

        COPYTODEMANDER=list of supplier variable names
        MATCHGROUPVAR = variable name (default is "matchgroup")
        DEMANDERID = variable name
        DS3 = dataset name
        /OPTIONS SAMPLEWITHREPLACEMENT=TRUE or FALSE (default)
            MINIMIZEMEMORY = TRUE (default) or FALSE
            SHUFFLE = TRUE or FALSE (default)
            SEED = number
        /HELP.

Example:

    CASECTRL DEMANDERDS=demand SUPPLIERDS = supply BY=agegroup gender
        SUPPLIERID = id NEWDEMANDERIDVARS=supplierId.

CASECTRL takes two datasets, a demander and a supplier. It attempts to find a match for each demander case from the supplier dataset based on the variables named in BY. If more than one candidate matches, it picks randomly. No sorting of either dataset is required. This procedure builds some possibly large tables in memory, so it may not be appropriate for very large datasets.

There are several output options. The ID or IDs of matching cases are appended to the demander dataset variables. The number of variables listed as NEWDEMANDERIDVARS determines how many matches are attempted. These variables must not already exist in the demander dataset.

The variables in the supplier dataset that are listed in COPYTODEMANDER are copied to the demander dataset as new variables or replacement values. For existing variables, the types must agree. If no match is found, existing demander dataset values are not changed and new variable values will be sysmis or blank. Only one MATCHGROUPVAR may be specified if COPYTODEMANDER is used. None of the metadata such as variable and value labels is copied. Use APPLY DICTIONARY to bring over variable properties.

If DS3 is specified, a new dataset is created containing the cases in the supplier dataset actually used for the matches. It will be the active dataset after the command is run. (This implies that any unnamed dataset will be closed.) It contains all the variables from the supplier dataset plus the MATCHGROUPVAR. If DEMANDERID is specified, it also contains the ID variable from the demander dataset. These variable names must all be unique. DS3 is only a dataset: use the SAVE command to turn it into a file.

By default, sampling from the supplier dataset is done without replacement. Specify SAMPLEWITHREPLACEMENT=TRUE to sample with replacement.

By default, memory usage is minimized in picking supplier dataset candidate match cases (all eligible cases have an equal probability of selection). This requires an extra data pass. If the possible number of matching cases for a demander case is small or the supplier dataset is not large, specifying MINIMIZEMEMORY=FALSE may improve performance by eliminating the extra data pass. In the case of 1-1 matching, this is recommended.

By default, cases in the demander dataset are processed in case order. If there are insufficient supplier cases, you may specify SHUFFLE=TRUE to process the demander cases in random order. This ensures that earlier cases do not have an advantage over later ones in matching. SHUFFLE increases the memory requirement and will take longer to execute.

Use SEED=number to set the random number generator to a known state for repeatability.

    CASECTRL /HELP.

prints this output and does nothing else.

Example:

    CASECTRL DEMANDERDS=demander SUPPLIERDS=supplier BY=origin cylinder
        SUPPLIERID=id NEWDEMANDERIDVARS=matchedcaseid
        COPYTODEMANDER=mpg randomnumber randomstring
        DS3=dsextra DEMANDERID=demanderid. |
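To make the matching idea concrete, here is a minimal sketch in plain Python of one-to-one matching on BY keys with random selection and no replacement, processing demanders in case order as the description says the default does. It is not the CASECTRL module's code and makes no claim about how that module works internally; the function name `match_cases`, its parameters, and the toy data are all invented for illustration.

```python
import random
from collections import defaultdict


def match_cases(demanders, suppliers, by, supplier_id="id", seed=None):
    """Pair each demander with one randomly chosen supplier sharing the BY keys.

    Sampling is without replacement: a supplier is used at most once.
    Returns a list of supplier ids in demander order, with None where
    no unused supplier with the same key values was left.
    """
    rng = random.Random(seed)

    # Index supplier ids by their BY-key values; no sorting is needed.
    pools = defaultdict(list)
    for s in suppliers:
        pools[tuple(s[k] for k in by)].append(s[supplier_id])

    matches = []
    for d in demanders:
        pool = pools.get(tuple(d[k] for k in by), [])
        if pool:
            # Move a random candidate to the end and pop it so it cannot be reused.
            i = rng.randrange(len(pool))
            pool[i], pool[-1] = pool[-1], pool[i]
            matches.append(pool.pop())
        else:
            matches.append(None)
    return matches


# Hypothetical toy data: two demanders share a key, one has no candidate.
demanders = [
    {"agegroup": 1, "gender": "f"},
    {"agegroup": 1, "gender": "f"},
    {"agegroup": 2, "gender": "m"},
]
suppliers = [
    {"id": 101, "agegroup": 1, "gender": "f"},
    {"id": 102, "agegroup": 1, "gender": "f"},
    {"id": 103, "agegroup": 3, "gender": "m"},
]
matched = match_cases(demanders, suppliers, by=("agegroup", "gender"), seed=42)
```

Because demanders are handled in order, earlier cases get first pick when suppliers run short, which is the situation the SHUFFLE option is described as addressing.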
