SPSSX Discussion

Correlation between 2 datasets -HELP please!

George-106

Correlation between 2 datasets -HELP please!

There are 2 arrays of corresponding data: A (serial) and B(code)
A B

0000000 3
0000001 9
0000002 4
....... .
9999999 7

Every B (code) is the result of an unknown mathematical operation on the
digits of A (serial), e.g.:

A (serial)=0000000: SN0=0, SN1=0, SN2=0, SN3=0, SN4=0, SN5=0, SN6=0
B (code)=3

The aim is to find out how this calculation is done. That could be very
difficult, so I constructed an algorithm based on a 10x10 Latin square
(elements are 0,1,2,3,4,5,6,7,8,9).

Sample calculation looks like this:

LS - Latin square (array of 100 elements)

Temp_1=10*SN0+SN6
Temp_2=10*LS[Temp_1]+SN5
Temp_3=10*LS[Temp_2]+SN4
........................
Temp_n=10*LS[Temp_n-1]+SN2

Finally, the idea is to construct the proper 10x10 Latin square for which
every Temp_n will be equal to B (code), using the above calculations.
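
As a rough illustration of the chained lookup above, here is a minimal Python
sketch (not SPSS syntax). The function names, the cyclic placeholder square,
and the digit order passed in the example are all assumptions made for
illustration; the post leaves the middle steps of the chain elided, so the
example only feeds in the three digits that are actually shown (SN6, SN5, SN4).

```python
def cyclic_latin_square(n=10):
    """Placeholder square (an assumption, not the unknown real one):
    cell (row, col) holds (row + col) mod n, stored row by row."""
    return [(row + col) % n for row in range(n) for col in range(n)]


def chain_temps(serial, ls, first, order):
    """Return [Temp_1, Temp_2, ...] for one serial number.

    serial: string of digits, so serial[k] is SNk
    ls:     flat list of 100 elements, indexed as LS[10*row + col]
    first:  index of the digit used as the row in Temp_1
    order:  indices of the digits fed in, one per step
    """
    sn = [int(ch) for ch in serial]
    temps = []
    row = sn[first]
    for idx in order:
        temp = 10 * row + sn[idx]   # Temp_k = 10*row + SNidx
        temps.append(temp)
        row = ls[temp]              # LS[Temp_k] becomes the next row
    return temps


ls = cyclic_latin_square()
# Temp_1 = 10*SN0 + SN6, Temp_2 = 10*LS[Temp_1] + SN5, Temp_3 = 10*LS[Temp_2] + SN4
print(chain_temps("1234567", ls, first=0, order=(6, 5, 4)))  # [17, 86, 45]
```

With the real square unknown, the same function could be called with any
candidate square and compared against the known codes.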

The question is: which function/method can I use in SPSS to construct that
10x10 Latin square by analyzing the datasets A (serials) and B (codes) with
the above sample calculations?

Please help; I would appreciate it if somebody could provide me with some
working examples in SPSS...

Thank you in advance,
George

Richard Ristow

Re: Correlation between 2 datasets -HELP please!

This won't give you what you want, I'm afraid. But, at 04:17 AM
4/18/2007, George wrote:

>There are 2 arrays of corresponding data: A (serial) and B(code)
>A B
>
>0000000 3
>0000001 9
>0000002 4
>....... .
>9999999 7

Out of curiosity: do you have all 10**7 instances?

>Every B(code) is a result of unknown mathematical [operation on]
>digits of
>A (serial). The aim is to find out how this mathematical calculation
>is done.
>
>I constructed an algorithm based on latin squares 10x10 calculation
[...]
>Please, help, I would appreciate if somebody could provide me with
>some working examples in SPSS.

I'm out of my area, here; I can't tell you either how to implement such
an algorithm, or whether it's a good way to approach the problem.

SPSS can do much of what general-purpose programming languages can do,
so you might be able to do it. But I suspect you'd need things like
two-dimensional (or higher) arrays, which could only be simulated in
SPSS, and awkwardly at that. (If you have SPSS 14 or 15, Python will
probably be far more suitable. Even then, you may find you're using
almost pure Python, with its connection to SPSS offering you little
benefit.)
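
To make the array point concrete, here is a minimal sketch in plain Python,
assuming the 10x10 square is stored row by row in a flat list of 100 elements
(the same layout the LS[...] indexing in the original post implies). The
helper name is_latin_square is hypothetical, and the cyclic square is only a
stand-in for the unknown one.

```python
def is_latin_square(ls, n=10):
    """True if the flat list ls (row-major, n*n elements) has every
    symbol 0..n-1 exactly once in each row and each column."""
    if len(ls) != n * n:
        return False
    symbols = set(range(n))
    rows = [set(ls[r * n:(r + 1) * n]) for r in range(n)]  # row r
    cols = [set(ls[c::n]) for c in range(n)]               # column c
    return all(line == symbols for line in rows + cols)


# Stand-in square: cell (row, col) holds (row + col) mod 10.
cyclic = [(row + col) % 10 for row in range(10) for col in range(10)]
print(is_latin_square(cyclic))      # True
print(is_latin_square([0] * 100))   # False
```

A check like this would let a search discard candidate squares that break the
row or column rule before any serial numbers are evaluated.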

And you may not find help on this list. This isn't the kind of problem
SPSS is good at. Somebody on the list may have the knowledge to help
you, but if so, it's because of another interest they happen to have.

If I had the problem, I'd hunt up somebody in Computer Science, and
show them. There's not an infinite collection of "checksum" algorithms
in use; somebody may be able to glance at a few cases, and say, "Oh,
that's it." Or, at least to give you more help with selecting and
implementing search algorithms than you may find here - or to know a
colleague who can.

Perhaps somebody here can yet give you more help, but I fear that
neither SPSS nor we SPSS people are the prime tools for this work.

Good luck to you,
Richard

George-106

Re: Correlation between 2 datasets -HELP please!

Dear Richard,

Thank you for your quick response and help.

I had some brief training in SPSS in 1993, and I was surprised by its
ability to process huge amounts of data. Of course I do not have all 10**7
instances; I have just a part of them, and the idea is to find out how the
codes are calculated from the serial numbers by evaluating that part of the
dataset.

I developed a project in VC++ that completes the Latin square to solve this
problem. The code calculation algorithm is based on the Latin square
calculation and it works fine, but the problem is of course the speed of data
processing. There are 13 variables in the calculation algorithm for 1 dataset,
and the range of each variable is 0 to 9. To complete the 10x10 Latin square
(100 elements), at least 40 sample datasets need to be evaluated, i.e. it
gives 13*40 = 520 loops, which is an endless process...

That is why I wanted to combine the data processing speed of SPSS with my
Latin square calculation algorithm.

Thank you and best regards,
George