|
Can anyone point me to methods, references, etc. for scrambling id numbers, such as 9-digit student id numbers, to create new, nonduplicated id numbers? There are probably a number of methods. I'd like to understand a few of them.

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
|
Check out the extension command SPSSINC ANON, available from SPSS Developer Central (www.spss.com/devcentral). It has several methods of anonymizing variable values, and some of these will guarantee a 1-1 mapping. Available methods are sequential remapping, randomization, and linear transform. It requires at least V17 and the Python programmability plug-in.

HTH,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435
|
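For readers wondering how a linear transform can guarantee a 1-1 mapping, here is a minimal sketch in Python. It is not the SPSSINC ANON implementation, which this thread does not describe. The function names, the modulus of 10**9 (all possible 9-digit values), and the example multiplier and offset are assumptions made for illustration. The mapping is one-to-one whenever the multiplier shares no factor with 10**9, that is, it is not divisible by 2 or 5.

```python
from math import gcd

# Assumption: IDs are 9-digit numbers, so there are 10**9 possible values.
MODULUS = 10**9


def affine_scramble(old_id: int, a: int, b: int) -> int:
    """Return (a * old_id + b) mod 10**9.

    The map is one-to-one only when gcd(a, 10**9) == 1.
    """
    if gcd(a, MODULUS) != 1:
        raise ValueError("multiplier must not be divisible by 2 or 5")
    return (a * old_id + b) % MODULUS


def affine_unscramble(new_id: int, a: int, b: int) -> int:
    """Invert affine_scramble using the modular inverse of a (Python 3.8+)."""
    a_inv = pow(a, -1, MODULUS)
    return ((new_id - b) * a_inv) % MODULUS


if __name__ == "__main__":
    a, b = 387420489, 271828182  # hypothetical secret multiplier and offset
    for old in (123456789, 987654321, 914638527):
        new = affine_scramble(old, a, b)
        # Zero-pad so every new ID still prints as 9 digits.
        print(f"{old:09d} -> {new:09d}")
```

The multiplier and offset play the role of the key file. A transform this simple is weak against anyone who learns a few old/new pairs, since those can be enough to recover both numbers.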
|
In reply to this post by Maguin, Eugene
One method is to do something like this untested example.

compute randomvariable = rv.uniform(0,2e31).
sort cases by randomvariable.
compute newid = $casenum.
*device j: is some removable medium, thumb drive, floppy, etc.
xsave outfile = 'j:\project\idkey.sav' /keep = oldid newid.
save outfile = 'd:\project\newworking.sav' /drop = oldid.

Another is to use the id's as strings (tested).

*use a random set of numbers from 1 to 9 from a table of random numbers, drawing from a hat, etc.
*once someone knows one oldid and the corresponding newid they can derive the order.
data list list/oldid(a9).
begin data.
123456789
987654321
914638527
end data.
string newid (a9).
do repeat oldorder = 1 to 9/ranorder = 9,1,4,6,3,8,5,2,7.
compute substr(newid,oldorder,1) = substr(oldid,ranorder,1).
end repeat.
LIST .
Art Kendall
Social Research Consultants |
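For anyone following along outside SPSS, the two approaches in the post above can be sketched in Python roughly as follows. This is an illustration rather than a translation of the syntax: the function names are made up, and `random.Random` stands in for the random number table or hat (a real project might prefer `random.SystemRandom`).

```python
import random


def random_order_ids(old_ids, seed=None):
    """Give each distinct old ID a new ID 1..n in random order.

    Returns the key (old ID -> new ID), which should be stored separately
    from the working data.
    """
    rng = random.Random(seed)
    distinct = sorted(set(old_ids))
    rng.shuffle(distinct)
    return {old: new for new, old in enumerate(distinct, start=1)}


def permute_digits(old_id: str, order) -> str:
    """Build a new ID whose i-th character is character order[i] of old_id.

    Positions in `order` are 1-based, as in the do repeat example.
    """
    if sorted(order) != list(range(1, len(old_id) + 1)):
        raise ValueError("order must use each position exactly once")
    return "".join(old_id[pos - 1] for pos in order)


if __name__ == "__main__":
    ids = ["123456789", "987654321", "914638527"]
    print(random_order_ids(ids))
    order = [9, 1, 4, 6, 3, 8, 5, 2, 7]
    for old in ids:
        print(old, "->", permute_digits(old, order))
```

Because the digit order uses every position exactly once, two different old IDs can never produce the same new ID, so no duplicates are introduced.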
|
Jon Peck SPSS, an IBM Company [hidden email] 312-651-3435
> one method is to do something like this untested example.

That will work. In fact, sorting might not even be needed. But it has the disadvantage that if new cases are added to the original dataset and the data need to be reanonymized, the IDs for the previous cases will change. If that matters, the old mapping needs to be reapplied first.

Also, this assumes that there are no duplicate records. If the data are, say, transactions and there are multiple entries with the same SSN, then after remapping those entries will get different anonymous IDs. The SPSSINC ANON extension command handles both of these situations.

HTH,
Jon Peck |
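The two concerns raised above (previously assigned IDs changing when cases are added, and repeated IDs ending up with different new values) can both be avoided by keeping the key and only ever adding to it. A minimal Python sketch, assuming the key is held as a dictionary from old ID to new ID; `extend_key` and `anonymize` are hypothetical names, and this is not how SPSSINC ANON is implemented.

```python
import random


def extend_key(key, old_ids, rng=None):
    """Return a copy of `key` with any unseen old IDs added.

    Existing old -> new pairs are never changed; unseen IDs receive the
    next unused integers, handed out in random order.
    """
    rng = rng or random.Random()
    updated = dict(key)
    unseen = sorted(set(old_ids) - set(updated))
    rng.shuffle(unseen)
    next_id = max(updated.values(), default=0) + 1
    for old in unseen:
        updated[old] = next_id
        next_id += 1
    return updated


def anonymize(old_ids, key):
    """Replace each old ID by its new ID; repeated old IDs share one new ID."""
    return [key[old] for old in old_ids]


if __name__ == "__main__":
    first_batch = ["111", "222", "111"]
    key = extend_key({}, first_batch)
    print(anonymize(first_batch, key))

    second_batch = ["333", "111"]
    key = extend_key(key, second_batch)  # "111" keeps its earlier new ID
    print(anonymize(second_batch, key))
```

One trade-off of this design: IDs added in a later batch always get larger numbers than earlier ones, so the new ID reveals which batch a case arrived in.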
|
In reply to this post by Maguin, Eugene
Gene Maguin wrote:
> Can anyone point me to methods, references, etc for scrambling id numbers,
> such as 9-digit student id numbers, to create new, nonduplicated id numbers.
> There's probably a number of methods. I'd like to understand a few of them.

I'm not an expert in the area, but you might want to read up about hash functions. The Wikipedia entry:

http://en.wikipedia.org/wiki/Cryptographic_hash_function

is a good starting point.

--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary. Find out more about the newsletter that dares to call itself "average" at www.pmean.com/news |
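To make the hash-function suggestion concrete, here is a small Python sketch using the standard `hmac` and `hashlib` modules. Two assumptions go beyond the post: a secret key is mixed in (a keyed hash), and the digest is shortened to 16 hex characters for readability. The key matters because there are only 10**9 possible 9-digit IDs, so a plain unkeyed hash could be reversed by hashing every candidate. The function names are hypothetical.

```python
import hashlib
import hmac


def keyed_hash_id(old_id: str, secret_key: bytes, length: int = 16) -> str:
    """Return the first `length` hex characters of HMAC-SHA256(secret_key, old_id)."""
    digest = hmac.new(secret_key, old_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def build_hashed_key(old_ids, secret_key: bytes, length: int = 16):
    """Map each distinct old ID to its keyed hash, refusing to continue on a collision."""
    key = {old: keyed_hash_id(old, secret_key, length) for old in set(old_ids)}
    if len(set(key.values())) != len(key):
        # Shortened hashes are not guaranteed unique, so check explicitly.
        raise ValueError("collision after truncation; use a longer length")
    return key


if __name__ == "__main__":
    secret = b"keep-this-off-the-working-machine"  # hypothetical key
    ids = ["123456789", "987654321", "914638527"]
    for old, new in build_hashed_key(ids, secret).items():
        print(old, "->", new)
```

The same old ID always hashes to the same new ID under a given key, so repeated records stay linked and new cases can be added later without a lookup table. The secret key then needs the same protection a key file would.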
|
How about this possible approach:
Step 1: A rank variable is computed using the ID number:

RANK VARIABLES=ID (A)
  /RANK
  /PRINT=YES
  /TIES=CONDENSE.

The new variable, RID, would have no duplicates across different IDs. (I believe configuring RANK this way would yield one rank number per distinct ID, even with multiple lines of data per ID.)

Step 2: A key file, consisting of the side-by-side SPSS columns ID and RID, is saved.

Step 3: The key file is kept secure and separate from the original data by someone with proper clearance to have access to ID numbers. This key file could be saved in a physically separated location, such as on a thumb drive or CD or DVD.

Step 4: The ID is then deleted from the working analysis file, so the data in the working analysis file are now identified only by RID.

It seems to me that so long as the IDs in the working file are a subset of a relatively large population of possible and actual ID numbers, and so long as the key file is separated from the work file, the list of identifying IDs cannot possibly be reproduced from the work file alone. The data might be identifiable from other clues in the file, but not from RID, unless the physical security of the key file is compromised. Am I correct?

Stevan Lars Nielsen, Ph.D.
Clinical Professor
Clinical Psychologist
1500 WSC, BYU
Provo, UT 84602
801-422-3035; fax 801-422-0175

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Steve Simon, P.Mean Consulting
Sent: Monday, December 07, 2009 1:22 PM
To: [hidden email]
Subject: Re: 'scrambling' id numbers--methods
|
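The rank-based key from Step 1 can be mimicked in a few lines of Python, which may help in seeing what the mapping does and does not hide. This is a sketch of a condensed (dense) ranking written from scratch, not SPSS output, and `dense_rank_key` is a hypothetical name.

```python
def dense_rank_key(ids):
    """Map each distinct ID to its rank in ascending order (1, 2, 3, ...).

    Tied IDs share one rank and no rank numbers are skipped, which is the
    behavior described for /TIES=CONDENSE.
    """
    distinct = sorted(set(ids))
    return {old: rank for rank, old in enumerate(distinct, start=1)}


if __name__ == "__main__":
    ids = [914638527, 123456789, 987654321, 123456789]
    key = dense_rank_key(ids)          # this is the key file from Step 2
    rids = [key[i] for i in ids]       # this is what stays in the work file
    print(rids)
```

Rows that share an ID share a RID, so the mapping is one-to-one at the level of IDs rather than rows. The sketch also makes visible that RID keeps the sort order of the original IDs: the smallest ID always becomes 1 and the largest always gets the highest rank. Whether that leaks anything useful depends on how the original IDs were assigned.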
