|
Hello,
I have inherited a dataset with ID codes that were poorly constructed. I would like to run some syntax to reorder the information in the string, and also create id codes that are the same length. In the "ID" field I have a string variable (width 10) which contains three pieces of information 1. the year - as two digits (for example 2001-2002 is "0102") 2. the program (training is "T", and advanced training is "AT") 3. the participant # (01, 02, thru about 44) Right now the order is participant#, program, year. This is really bad for sorting. Here is an example of two codes. 02T0102 (second participant in the basic training which took place 2001- 2002). 02AT0102 (second participant in the advanced training which took place 2001-2002). The first problem is that the ID codes are different lengths because for one of the programs "training" they used only 1 character "T" and for the advanced training they used an "AT". The second problem is the order of the information. I would really like them to read: A010202 (program, year, participant) T010202 (for basic training) And THEN I want to be able to split the program and the years out into separate variables. Many thanks for any advice. Diane |
|
At 12:59 PM 4/4/2007, Diane Bartlett wrote:
>I have inherited a dataset with ID codes that were poorly constructed. >In the "ID" field I have a string variable (width 10) which contains >three pieces of information > >1. the year - as two digits (for example 2001-2002 is "0102") That's four digits, and it does matter. Do you want your year codes like this? 0001 -> 2000-2001 0102 -> 2001-2002 0203 -> 2002-2003 For what it's worth, there's a useful convention for fiscal years, which also span calendar years: name them by the highest calendar year. That would assign codes 2001, 2002, and 2003, for the years above, and I think that's a good deal clearer. (I don't advocate two-digit year codes like '01', '02'.) >2. the program (training is "T", and advanced training is "AT"). [A] >problem is that the ID codes are different lengths because "training" >they used only 1 character "T" and for advanced training they used >"AT". I take it, you want to use "T" and "A". Makes sense. >3. the participant # (01, 02, thru about 44) If you're making a string, allow three digits for participant number. '44' is within a factor of two of overflowing a two-digit field, and that's dangerously close. >Here is an example of two codes. >02T0102 (second participant in the basic training which took place >2001- >2002). > >02AT0102 (second participant in the advanced training which took place >2001-2002). > >I would really like them to read: >A010202 (program, year, participant) >T010202 (for basic training) OK, except I'd strongly consider, for reasons I gave above, A2002001 T2002001 >And THEN I want to be able to split the program and the years out into >separate variables. Would it be OK just to do this, or do you want the ID string as well? I'll leave this, at least for now. There are lots of other good programmers here. -Best of wishes, Richard |
|
In reply to this post by D Bartlett
Hi Diane, as I thought about a syntax solution to this using Substr(a1,a2,a3) as seen in the syntax manual, it occured to me that with so few cases you could code this by hand faster that with syntax. If this were my task, I would make 3 new variables, year, program, id. Then put your current id field next to the new field and quickly add in the values. I would also recommend the coding changes that Richard mentioned. However, this is a fun syntax problem, so I am sure you might yet get that solution also.
Good luck. meljr
|
|
In reply to this post by D Bartlett
Since the first 2 digits are always the participant number, you could
use a simple substring (as I'm sure you know) for that portion. (TESTED SYNTAX). In the following, you create the individual items, then string them together in the order you desire. string pids (a2). compute pids=Substr(ID,1,2). *(If you want numeric ids instead, use this, but note that these will not be useable in creating newid. * string #pid (A2). * compute #pid=Substr(ID,1,2). * recode #pid (convert) into pid. * freq pid.) **If all of your program ids are present as either A or T and as appears, always begin in col 3--you can use. **If any program ids are missing or something else, un-star the 2 commands before running. string program (A1). do if (substr(ID,3,1)="A"). compute program='A'. else if (substr(ID,3,1)="T"). compute program='T'. *else. * compute program='X'. end if. freq program. *** then based on the value of program, select the years from the right place in the ID string. string years (a4). do if (program="A"). compute years=substr(ID,5,4). else if (program="T"). compute years=substr(ID,4,4). end if. freq years. ** Finally put them back in the order you want. String newid (A8). compute newid=concat(program,years,pids). freq newid. > Here is an example of two codes. > 02T0102 > 02AT0102 > > The second problem is the order of the information. I would really > like them to read: > A010202 (program, year, participant) > T010202 (for basic training) Melissa The bubbling brook would lose its song if you removed the rocks. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of meljr Sent: Wednesday, April 04, 2007 1:43 PM To: [hidden email] Subject: Re: [SPSSX-L] Transforming a poorly created string variable Hi Diane, as I thought about a syntax solution to this using Substr(a1,a2,a3) as seen in the syntax manual, it occured to me that with so few cases you could code this by hand faster that with syntax. If this were my task, I would make 3 new variables, year, program, id. Then put your current id field next to the new field and quickly add in the values. I would also recommend the coding changes that Richard mentioned. However, this is a fun syntax problem, so I am sure you might yet get that solution also. Good luck. meljr Diane Bartlett wrote: > > Hello, > > I have inherited a dataset with ID codes that were poorly constructed. > I would like to run some syntax to reorder the information in the > string, and also create id codes that are the same length. > > In the "ID" field I have a string variable (width 10) which contains > three pieces of information > > 1. the year - as two digits (for example 2001-2002 is "0102") 2. the > program (training is "T", and advanced training is "AT") 3. the > participant # (01, 02, thru about 44) > > Right now the order is participant#, program, year. This is really > bad for sorting. > > Here is an example of two codes. > 02T0102 (second participant in the basic training which took place > 2001- 2002). > > 02AT0102 (second participant in the advanced training which took place > 2001-2002). > > > The first problem is that the ID codes are different lengths because > for one of the programs "training" they used only 1 character "T" and > for the advanced training they used an "AT". > > > The second problem is the order of the information. I would really > like them to read: > A010202 (program, year, participant) > T010202 (for basic training) > > And THEN I want to be able to split the program and the years out into > separate variables. > > Many thanks for any advice. > > Diane > > -- View this message in context: http://www.nabble.com/Transforming-a-poorly-created-string-variable-tf35 27049.html#a9843101 Sent from the SPSSX Discussion mailing list archive at Nabble.com. PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
| Free forum by Nabble | Edit this page |
