compute a new variable - combining 2 values - problems

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

compute a new variable - combining 2 values - problems

thara vardhan-2
Hi Richard and Andrew

Thank you so much for helping me with the syntax for combining two
variable values in into a new variable. However when I run the syntax
I get the results as follows:

NUMERIC  Combined  (F17).
COMPUTE  Combined = ERefnum *1E9
                   + Personcni.

Erefnum                Personcni      Combined

32177910        629906838     32177910629906840

32351144        754507633     32351144754507632

32427317        749196914     32427317749196912

32565970        553238422     32565970553238424


As you can see the first value in the variable 'combined' has been rounded
up to 629906840
Likewise the fourth value has been rounded up to 553238424

Can you please tell me where am I going wrong?

As to the solution suggested by you Andrew I get the result as:

COMPUTE newvar=ERefnum*100000000 + Personcni.

Erefnum                Personcni     newvar

32177910        629906838    3217791629906838

32351144        754507633     3235115154507633

32218970        737679628    3221897737679628

32218970        737679628     3221897737679628

32218970        726061072     3221897726061072

Also I tried to make both the variables as string and concatenate but I
get several error messages. Moreever It would be help me in (further
calculations) If I have resultant variable as numeric rather than a
string.

Many thanks

regards
Thara Vardhan
Senior Statistician
Performance Improvement & Planning

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The information contained in this email is intended for the named recipient(s)
only. It may contain private, confidential, copyright or legally privileged
information.  If you are not the intended recipient or you have received this
email by mistake, please reply to the author and delete this email immediately.
You must not copy, print, forward or distribute this email, nor place reliance
on its contents. This email and any attachment have been virus scanned. However,
you are requested to conduct a virus scan as well.  No liability is accepted
for any loss or damage resulting from a computer virus, or resulting from a delay
or defect in transmission of this email or any attached file. This email does not
constitute a representation by the NSW Police Force unless the author is legally
entitled to do so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: compute a new variable - combining 2 values - problems

ViAnn Beadle
This is way too big a number to store with all these exact digits--what is
the point of combining these values? Do you really want to do arithmetic on
this?

Without your exact syntax and error messages, we can only guess at what the
problem is.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Thara Vardhan
Sent: Thursday, December 11, 2008 6:40 PM
To: [hidden email]
Subject: compute a new variable - combining 2 values - problems

Hi Richard and Andrew

Thank you so much for helping me with the syntax for combining two
variable values in into a new variable. However when I run the syntax
I get the results as follows:

NUMERIC  Combined  (F17).
COMPUTE  Combined = ERefnum *1E9
                   + Personcni.

Erefnum                Personcni      Combined

32177910        629906838     32177910629906840

32351144        754507633     32351144754507632

32427317        749196914     32427317749196912

32565970        553238422     32565970553238424


As you can see the first value in the variable 'combined' has been rounded
up to 629906840
Likewise the fourth value has been rounded up to 553238424

Can you please tell me where am I going wrong?

As to the solution suggested by you Andrew I get the result as:

COMPUTE newvar=ERefnum*100000000 + Personcni.

Erefnum                Personcni     newvar

32177910        629906838    3217791629906838

32351144        754507633     3235115154507633

32218970        737679628    3221897737679628

32218970        737679628     3221897737679628

32218970        726061072     3221897726061072

Also I tried to make both the variables as string and concatenate but I
get several error messages. Moreever It would be help me in (further
calculations) If I have resultant variable as numeric rather than a
string.

Many thanks

regards
Thara Vardhan
Senior Statistician
Performance Improvement & Planning

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _

The information contained in this email is intended for the named
recipient(s)
only. It may contain private, confidential, copyright or legally privileged
information.  If you are not the intended recipient or you have received
this
email by mistake, please reply to the author and delete this email
immediately.
You must not copy, print, forward or distribute this email, nor place
reliance
on its contents. This email and any attachment have been virus scanned.
However,
you are requested to conduct a virus scan as well.  No liability is accepted
for any loss or damage resulting from a computer virus, or resulting from a
delay
or defect in transmission of this email or any attached file. This email
does not
constitute a representation by the NSW Police Force unless the author is
legally
entitled to do so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: compute a new variable - combining 2 values - problems

Richard Ristow
In reply to this post by thara vardhan-2
At 08:39 PM 12/11/2008, Thara Vardhan wrote:

>When I run the syntax I get the results as follows:
>
>NUMERIC  Combined  (F17).
>COMPUTE  Combined = ERefnum *1E9
>                    + Personcni.
>
>Erefnum         Personcni     Combined
>
>32177910        629906838     32177910629906840
>32351144        754507633     32351144754507632
>32427317        749196914     32427317749196912
>32565970        553238422     32565970553238424
>
>
>As you can see the first value in the variable 'combined' has been rounded
>up to 629906840 Likewise the fourth value has been rounded up to 553238424

Right. As ViAnn Beadle wrote at 10:51 PM 12/11/2008,
>This is way too big a number to store with all these exact digits

Exactly. SPSS can store integers of up to 15 digits exactly, as well
as 16-digit integers whose first digit is 8 or less. (I'll put the
draft FAQ on SPSS numbers at the end of this posting.) There's no way
SPSS can hold the exact numerical values you want.

>It would be help me in (further calculations) if I have resultant
>variable as numeric rather than a string.

As ViAnn Beadle also said, you may want to post how you want to use
those values as numbers. It can't be done directly, but there are
usually alternative approaches.

>I tried to make both the variables as string and concatenate but I
>get several error messages.

And as ViAnn also said, we'd need the error messages to see what the
problem is.

But catenating seems to work for me. The combined numeric value is
rounded, as you experienced, but the combined string one seems to be right.
|-----------------------------|---------------------------|
|Output Created               |12-DEC-2008 00:38:00       |
|-----------------------------|---------------------------|
  Erefnum Personcni

32177910 629906838
32351144 754507633
32427317 749196914
32565970 553238422

Number of cases read:  4    Number of cases listed:  4


NUMERIC CombNum (F17).
STRING  CombStr (A17).

COMPUTE CombNum   = Erefnum  * 1E9
                   + Personcni.

STRING  #ErefStr   (A8)
         #PersonStr (A9).

COMPUTE #ErefStr   = STRING(Erefnum,F8).
COMPUTE #PersonStr = STRING(Personcni,N9 /* to include leading 0s */ ).

COMPUTE CombStr    = CONCAT(RTRIM(#ErefStr),#PersonStr).

LIST.
List

|-----------------------------|---------------------------|
|Output Created               |12-DEC-2008 00:38:00       |
|-----------------------------|---------------------------|
  Erefnum Personcni           CombNum CombStr

32177910 629906838 32177910629906840 32177910629906838
32351144 754507633 32351144754507630 32351144754507633
32427317 749196914 32427317749196910 32427317749196914
32565970 553238422 32565970553238420 32565970553238422

Number of cases read:  4    Number of cases listed:  4
=================================
APPENDIX   I: Test data, and code
(WRR: Not saved separately)
=================================
DATA LIST LIST/
    Erefnum         Personcni  /* Combined */
   (F8,             F9).
BEGIN DATA
    32177910        629906838
    32351144        754507633     32351144754507632
    32427317        749196914     32427317749196912
    32565970        553238422     32565970553238424
END DATA.

LIST.

NUMERIC CombNum (F17).
STRING  CombStr (A17).

COMPUTE CombNum   = Erefnum  * 1E9
                   + Personcni.

STRING  #ErefStr   (A8)
         #PersonStr (A9).

COMPUTE #ErefStr   = STRING(Erefnum,F8).
COMPUTE #PersonStr = STRING(Personcni,N9 /* to include leading 0s */ ).

COMPUTE CombStr    = CONCAT(RTRIM(#ErefStr),#PersonStr).

LIST.
============================================
APPENDIX  II: Draft FAQ on SPSS numbers
(Revised 12-DEC-2008. I'll welcome comments)
============================================
Draft FAQ: Precision and range of SPSS numbers

In SPSS,

. Values can be between about 10**-308 and 10**308

. The precision of numbers is very nearly equivalent to 16 decimal
digits, i.e. far more than necessary for any data you're likely to encounter

. However, most non-integers, and integers with more than 15 digits,
are not stored exactly; and SPSS may not consider such numbers equal,
although their exact values are equal.

Numbers used as identifiers, for which exact matching is
essential,should be integers of no more than 15 decimal digits.

The safe rule is: There Is No Such Thing as an Exact Fraction.

DETAILS
-------
SPSS numbers use the 64-bit floating-point representation defined by
IEEE standard 754. (Many applications use it. It has a good
reputation, and most modern computers have hardware to do arithmetic
with IEEE 754 numbers.)

The precision of these numbers is 53 bits, which is very close to 16
decimal digits. It is far more than adequate for any measurement
you're likely to encounter.

Values can be between 2**-1024 to 2**1024 (about 10**-308 and 10**308).

All 15-digit integers are stored exactly, as are most 16-digit
integers. (Precisely, it's all integers from 0 through 2**53 --
9,007,199,254,740,992.)

But integers with more than 16 digits, and most non-integers, are
stored approximately.  exactly. Since the numbers are binary, NOT
16-digit decimal, common experience about what numbers are stored
exactly may be wrong: for example, 0.2 is not.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: compute a new variable - combining 2 values - solution - Reply

thara vardhan-2
In reply to this post by ViAnn Beadle
Hi Viann and Richard

Thank you for responding to my problem with combining two variables.

I needed the new variable to identify duplicate records because I have to
first identify the victims in any one event (event ref num).
I have to identify one victim per record in my datafile. Where there are 4
different victims (with distinct personcni) for one eventref number - my
data file will have 4 victim records. Where I have the same victim
personcni/eventrefnum (with 4 records) I need to choose one - the most
serious - based on  2 other variables- weapon rank or IFC rank (most
serious - given a rank of 1,2,3, etc).

My apologies - when I mentioned 'further calculations' in my earlier email
what I meant was calculation - post selection of records using the new
combined variable.  I will not be using the new combined variable to
compute anything.

Also I was under the impression that I am better off to have the combined
variable as a 'numeric' to use aggregate or match file functions. However
I ran some test files and realised that it does not make a difference.

Helmut Vorkauf, Richard Ristow and  Andrew Fenyo - thank you so much for
helping me with syntax to combine it as a string variable.

I finally worked it out. I can use any one of the following:

Option1:

string snew (A19).
compute snew=concat(ltrim(rtrim(Erefnum)),ltrim(rtrim(personcni))).
execute.

Option2:

STRING   Combined (A20).
STRING   #Padded   (A20)  /* Personcni w. leading 0s */.
COMPUTE  #Padded  = LPAD(LTRIM(Personcni),9,'0').
COMPUTE  Combined = CONCAT(RTRIM(ERefnum),#Padded).

Option3:

Option 3: Richard Ristow

STRING  CombStr (A20).
STRING  #ErefStr   (A9)
         #PersonStr (A9).

COMPUTE #ErefStr   = STRING(Erefnum,F9).
COMPUTE #PersonStr = STRING(Personcni,N9 /* to include leading 0s */ ).

COMPUTE CombStr    = CONCAT(RTRIM(#ErefStr),#PersonStr).

 Richard - thank you again for the draft FAQ on SPSS storing integers only
upto 15 digits. This is extremely useful for me.

cheers
thara


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The information contained in this email is intended for the named recipient(s)
only. It may contain private, confidential, copyright or legally privileged
information.  If you are not the intended recipient or you have received this
email by mistake, please reply to the author and delete this email immediately.
You must not copy, print, forward or distribute this email, nor place reliance
on its contents. This email and any attachment have been virus scanned. However,
you are requested to conduct a virus scan as well.  No liability is accepted
for any loss or damage resulting from a computer virus, or resulting from a delay
or defect in transmission of this email or any attached file. This email does not
constitute a representation by the NSW Police Force unless the author is legally
entitled to do so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Problem with Match file after using aggregate function

Richard Ristow
In reply to this post by Richard Ristow
At 09:12 PM 12/15/2008, Thara Vardhan wrote, off-list:

>I have the following variables in my dataset1
>
>ERefNum
>PersonCNI
>Combined
>IFCRANK
>WEAPONRANK
[...]

>I have computed 'combined' = ERefNum + PersonCNI

Interestingly (you asked about this before), there's no need for
variable 'combined'. You can use the two variables together as a key,
just fine.

>The above original file with duplicate records in 'combined' has
>8048 records.
>
>I have ranked Incident further classification and Weapon to pick a
>single victim record (for the duplicates) with the most serious
>incident or weapon. (most serious being 1 and so on).
>
>DATASET ACTIVATE DataSet1.
>DATASET DECLARE sample.
>AGGREGATE
>   /OUTFILE='sample'
>   /BREAK=Combined
>   /WEAPONRANK_min=MIN(WEAPONRANK)
>   /N_BREAK=N.

This isn't about what you were asking, but,
a.) There's no need for the SORT CASES and PRESORTED. AGGREGATE
actually runs faster without it, up to a very large number of break
groups -- at least 100,000.
b.) There's no need for "Combined". I'd suggest,

DATASET ACTIVATE DataSet1.
DATASET DECLARE sample.
AGGREGATE
   /OUTFILE='sample'
   /BREAK=ERefNum PersonCNI
   /WEAPONRANK_min=MIN(WEAPONRANK)
   /N_BREAK=N.


>Aggregate gives me unique records for the variable 'combined' [or
>the pair "ERefNum" and "PersonCNI"]. The total number of unique
>records are 7697.
>
>I then used Match files function to add other variables from
>dataset1 to file2 which is sample
>
>DATASET ACTIVATE sample.
>MATCH FILES /FILE=*
>   /FILE='DataSet1'
>   /BY Combined.
>
>Problem: I get this warning and match files is not working,
>
>>Warning # 5132
>>Duplicate key in a file.  The BY variables do not uniquely identify
>>each case on the indicated file.  Please check the results carefully.

Probably you want "table" for the file ('sample') in which the
records are unique:

DATASET ACTIVATE sample.
MATCH FILES
   /TABLE=*
   /FILE ='DataSet1'
   /BY ERefNum PersonCNI.

>PS: If you need some sample data I will create some dummy data based
>on my original and send it you as an SPSS file.

See if the above works, first.

Finally, you wrote,

>I am writing to you off the list to seek your help ...

It's preferable to ask on-list, though.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD