SPSSX Discussion

'scrambling' id numbers--methods

Maguin, Eugene

'scrambling' id numbers--methods

Can anyone point me to methods, references, etc. for scrambling ID numbers,
such as 9-digit student ID numbers, to create new, nonduplicated ID numbers?
There are probably a number of methods. I'd like to understand a few of them.

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Jon K Peck

Re: 'scrambling' id numbers--methods

Check out the extension command SPSSINC ANON available from SPSS Developer Central (www.spss.com/devcentral). It has several methods of anonymizing variable values, and some of these will guarantee a 1-1 mapping. Available methods are sequential remapping, randomization, and linear transform.

Requires at least V17 and the Python programmability plug-in.

HTH,

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

From:	Gene Maguin <[hidden email]>
To:	[hidden email]
Date:	12/07/2009 08:37 AM
Subject:	[SPSSX-L] 'scrambling' id numbers--methods
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

> Can anyone point me to methods, references, etc for scrambling id numbers,
> such as 9-digit student id numbers, to create new, nonduplicated id numbers.
> There's probably a number of methods. I'd like to understand a few of them.
>
> Thanks, Gene Maguin

Art Kendall

Re: 'scrambling' id numbers--methods

In reply to this post by Maguin, Eugene

One method is to do something like this untested example.

* Attach a random number to each case and sort on it.
compute randomvariable = rv.uniform(0,2e31).
sort cases by randomvariable.
* The new id is the position of the case in the shuffled file.
compute newid = $casenum.
* Device j: is some removable medium, thumb drive, floppy, etc.
xsave outfile = 'j:\project\idkey.sav' /keep = oldid newid.
save outfile = 'd:\project\newworking.sav' /drop = oldid.
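
For anyone who wants to see the same logic outside SPSS, here is a minimal Python sketch of the shuffle-then-number idea. The function name `shuffle_and_number` and the `seed` parameter are made up for illustration and are not part of any SPSS tool; like the syntax above, it numbers cases (rows), not distinct ids.

```python
import random


def shuffle_and_number(old_ids, seed=None):
    """Return (old_id, new_id) pairs after putting the cases in random order.

    new_id plays the role of $casenum after the random sort.
    """
    rng = random.Random(seed)
    order = list(range(len(old_ids)))   # original row positions
    rng.shuffle(order)                  # random order of the rows
    # The row that lands in position k of the shuffled file gets new id k.
    return [(old_ids[row], new_id) for new_id, row in enumerate(order, start=1)]


# Hypothetical data: the key goes to the secure location,
# only the new ids stay in the working file.
key = shuffle_and_number(["123456789", "987654321", "914638527"], seed=2009)
working_ids = [new_id for _, new_id in key]
```

The list `key` corresponds to idkey.sav and `working_ids` to what remains in the working file.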

Another is to treat the IDs as strings (tested).
* Use a random ordering of the numbers 1 to 9, taken from a table of random numbers,
  drawn from a hat, etc.
* Once someone knows one oldid and the corresponding newid, they can
  derive the order.
data list list/oldid(a9).
begin data.
123456789
987654321
914638527
end data.
string newid (a9).
do repeat oldorder = 1 to 9/ranorder = 9,1,4,6,3,8,5,2,7.
compute substr(newid,oldorder,1) = substr(oldid,ranorder,1).
end repeat.
LIST .

Art Kendall
Social Research Consultants

Gene Maguin wrote:

> Can anyone point me to methods, references, etc for scrambling id numbers,
> such as 9-digit student id numbers, to create new, nonduplicated id numbers.
> There's probably a number of methods. I'd like to understand a few of them.
>
> Thanks, Gene Maguin

Jon K Peck

Re: 'scrambling' id numbers--methods


From:	Art Kendall <[hidden email]>
To:	[hidden email]
Date:	12/07/2009 09:28 AM
Subject:	Re: [SPSSX-L] 'scrambling' id numbers--methods
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

> one method is to do something like this untested example.

That will work. In fact, sorting might not even be needed. The disadvantage is that if new cases are added to the original dataset and the data need to be reanonymized, the IDs for the previous cases will change. If that matters, the old mapping needs to be reapplied first.

Also, this assumes that there are no duplicate records. If the data are, say, transactions and there are multiple entries with the same SSN, those entries will end up with different anonymous IDs after remapping.

The SPSSINC ANON extension command handles both of these situations.
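
To make the two issues concrete, here is a rough Python sketch of a key that is reapplied before new ids are handed out. It is only an illustration of the idea and not how SPSSINC ANON is implemented; `extend_key` and `existing_key` are hypothetical names.

```python
def extend_key(old_ids, existing_key=None):
    """Return an old-id -> new-id key that keeps every earlier assignment.

    Repeated old ids share one new id, and ids not seen before are
    numbered after the largest new id already in the key.
    """
    key = dict(existing_key or {})
    next_id = max(key.values(), default=0) + 1
    for old in old_ids:
        if old not in key:          # only unseen ids get a fresh number
            key[old] = next_id
            next_id += 1
    return key


# First delivery: "111" occurs twice (e.g. two transactions).
first_key = extend_key(["111", "222", "111"])
# Later delivery with one new id; earlier assignments are unchanged.
second_key = extend_key(["333", "111", "222"], existing_key=first_key)
```

This sketch numbers unseen ids in the order they are met. If that order is itself revealing, the unseen ids could be shuffled before numbering.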

HTH,
Jon Peck

Steve Simon, P.Mean Consulting

Re: 'scrambling' id numbers--methods

In reply to this post by Maguin, Eugene

Gene Maguin wrote:

> Can anyone point me to methods, references, etc for scrambling id numbers,
> such as 9-digit student id numbers, to create new, nonduplicated id numbers.
> There's probably a number of methods. I'd like to understand a few of them.

I'm not an expert in the area, but you might want to read up about hash
functions. The Wikipedia entry:

http://en.wikipedia.org/wiki/Cryptographic_hash_function

is a good starting point.
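
As a minimal sketch of the hash idea, assuming Python's standard `hmac` and `hashlib` modules: a plain, unkeyed hash of a 9-digit id can be reversed by hashing all 10^9 possible ids, so this version mixes in a secret key. The names `keyed_hash_id`, `secret_key`, and `length` are made up for illustration.

```python
import hashlib
import hmac


def keyed_hash_id(old_id, secret_key, length=16):
    """Return the first `length` hex characters of HMAC-SHA256(secret_key, old_id).

    The same id and key always give the same result, so repeated ids
    map to the same new id without a stored lookup table.
    """
    digest = hmac.new(secret_key, old_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key; in practice it would be kept as securely as a key file.
demo_key = b"keep-this-somewhere-safe"
hashed = [keyed_hash_id(x, demo_key) for x in ["123456789", "987654321", "123456789"]]
```

Truncating the digest makes a collision between two different ids possible in principle, so it is worth checking the new ids for duplicates after hashing.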
--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary.
Find out more about the newsletter that dares
to call itself "average" at www.pmean.com/news


Stevan Nielsen

Re: 'scrambling' id numbers--methods

How about this possible approach:

Step 1: A rank variable is computed using the ID number:

RANK VARIABLES=ID (A)
/RANK
/PRINT=YES
/TIES=CONDENSE.

The new variable, RID, would contain one distinct value per ID. (I believe that with TIES=CONDENSE, multiple lines of data for the same ID all receive the same rank, and distinct IDs receive distinct, consecutive ranks.)

Step 2: A key file, consisting of the side-by-side SPSS columns ID and RID, is saved.

Step 3: The key file is kept secure and separate from the original data by someone with proper clearance to have access to ID numbers. This key file could be saved in a physically separated location, such as on a thumb drive or CD or DVD.

Step 4: The ID is then deleted from the working analysis file, so the data in the working analysis file are now identified only by RID.
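
The ranking in Step 1 can be sketched in Python as a dense rank, which is my understanding of what TIES=CONDENSE produces. The function name `dense_rank` is hypothetical.

```python
def dense_rank(ids):
    """Give each distinct id a consecutive rank in ascending order.

    Rows that share an id share a rank, so the result has one distinct
    value per id even with several lines of data per id.
    """
    distinct = sorted(set(ids))
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[value] for value in ids]


# Hypothetical ids; 300 appears on two lines of data.
ids = [300, 100, 300, 200]
rid = dense_rank(ids)
key_file = sorted(set(zip(ids, rid)))   # the ID/RID pairs saved in Step 2
```

Note that the ranks keep the sort order of the original ids, so RID still shows which of two cases had the larger ID.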

It seems to me that so long as the IDs in the working file are a subset of a relatively large population of possible and actual ID numbers, and so long as the key file is separated from the work file, the list of identifying IDs cannot be reproduced from the work file alone. The data might be identifiable from other clues in the file, but not from RID, unless the physical security of the key file is compromised.

Am I correct?

Stevan Lars Nielsen, Ph.D.
Clinical Professor
Clinical Psychologist
1500 WSC, BYU
Provo, UT 84602

801-422-3035; fax 801-422-0175

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Steve Simon, P.Mean Consulting
Sent: Monday, December 07, 2009 1:22 PM
To: [hidden email]
Subject: Re: 'scrambling' id numbers--methods

Gene Maguin wrote:

> Can anyone point me to methods, references, etc for scrambling id numbers,
> such as 9-digit student id numbers, to create new, nonduplicated id numbers.
> There's probably a number of methods. I'd like to understand a few of them.

I'm not an expert in the area, but you might want to read up about hash
functions. The Wikipedia entry:

http://en.wikipedia.org/wiki/Cryptographic_hash_function

is a good starting point.