SPSSX Discussion

merging using string key variable

Classic

List

Threaded

5 messages Options

Rajeshms

merging using string key variable

Hi All,

I have a string variable which contains repeated IDs in a data set1 and I have another string variable with repeated IDs in dataset2 which has lesser cases when compared to dataset1. Now how to merge smaller dataset (other variables) into larger dataset.

I have tried trimming and adding leading blanks zero to the string key variable for both datasets.

Its not merging for me.kindly help.

Regards,

Rajesh M S

David Marso

Re: merging using string key variable

Administrator

Really nothing useful to tell you since you don't provide any useful specifics.
Please define the problem more carefully.
Maybe a small example of the data structures, your syntax and any warnings.
I'm not sure what your trimming, padding etc are intended to achieve.
Critical: the strings MUST have the SAME width or you will get an error.
--

Rajeshms wrote

Hi All,

I have a string variable which contains repeated IDs in a data set1 and I
have another string variable with repeated IDs in dataset2 which has
lesser cases when compared to dataset1. Now how to merge smaller dataset
(other variables) into larger dataset.

I have tried trimming and adding leading blanks zero to the string key
variable for both datasets.
Its not merging for me.kindly help.
--
Regards,

Rajesh M S

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Rajeshms

Re: merging using string key variable

Hi David ,

I have diagnosis codes of string type and I have another set of diagnosis code in set 2. In set1 I have repeated diagnosis codes of string type only and with lots of other data too. In set 2 I have unique diagnosis codes of string type only with other data also.

Note set 2 has less cases and set 1 with repeated diagnosis codes has more cases. Now I need to merge set2 i.e smaller cases into bigger dataset.

I am trying still now got confused for a bit now,I am working will get it....till then I have sent a warnings for you.

Thanks a lot David.

SORT CASES BY DiagnosisCode_New (A).

IF (CHAR.LENGTH(DiagnosisCode_New) = 0) DiagnosisCode_New=DiagnosisCode.

EXECUTE.

DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet1.

DATASET CLOSE DataSet3.

DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

SORT CASES BY slno (A).

DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet6.

SAVE OUTFILE='D:\Dheeraj\ICD9.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet1.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet6.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet1.

MATCH FILES /FILE=*

/TABLE='DataSet6'

/BY ICD9.

EXECUTE.

File #2

KEY: 800

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 19:28:49 have been lost.

The time now is 19:29:41.

DATASET ACTIVATE DataSet6.

SAVE OUTFILE='D:\Dheeraj\ICD9.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet1.

MATCH FILES /FILE=*

/FILE='DataSet6'

/BY ICD9.

EXECUTE.

File #1

KEY: 0

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

File #2

KEY: 800

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

MATCH FILES /FILE=*

/FILE='DataSet6'

/BY ICD9.

EXECUTE.

File #1

KEY: 0

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

File #2

KEY: 800

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

STAR JOIN

/SELECT t0.slno, t0.PERSON_ID, t0.DiagnosisCode, t0.Checkindate_new, t0.Year, t0.Month, t0.Day, t1.Disease_Subcategory, t1.Disease, t1.ServiceLine, t1.Cat

/FROM * AS t0

/JOIN 'DataSet6' AS t1

ON t0.ICD9=t1.ICD9

/OUTFILE FILE=*.

COMPUTE cl=CHAR.LENGTH(ICD9).

EXECUTE.

STRING xx (A8).

COMPUTE xx=LTRIM(ICD9).

EXECUTE.

STRING cl1 (A8).

COMPUTE cl1=CHAR.LENGTH(xx).

>Error # 4309 in column 1024. Text: (End of Command)

>Invalid combination of data types in an assignment. Character strings may

>only be assigned to string variables. Numeric and logical quantities may only

>be assigned to numeric variables. Use the STRING command to declare new

>string variables before assigning values to them or consider using the STRING

>or NUMBER function.

>Execution of this command stops.

EXECUTE.

COMPUTE cc=CHAR.LENGTH(xx).

EXECUTE.

MATCH FILES /FILE=*

/FILE='DataSet6'

/BY ICD9.

EXECUTE.

File #1

KEY: 0

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

File #2

KEY: 800

>Warning # 5132

>Duplicate key in a file. The BY variables do not uniquely identify each case

>on the indicated file. Please check the results carefully.

File #1

KEY: .0

>Error # 5130

>File out of order. All the files in MATCH FILES must be in non-descending

>order on the BY variables. Use SORT CASES to sort the file.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 20:15:33 have been lost.

The time now is 20:16:33.

GET

FILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'.

DATASET NAME DataSet7 WINDOW=FRONT.

STRING ICD9a (A8).

COMPUTE ICD9a=LTRIM(ICD9).

EXECUTE.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet7.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet6.

DATASET CLOSE DataSet1.

DATASET ACTIVATE DataSet7.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet7.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

MATCH FILES /FILE=*

/TABLE='DataSet6'

/BY ICD9.

EXECUTE.

File #2

KEY: 800

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 20:26:05 have been lost.

The time now is 20:28:06.

NEW FILE.

DATASET NAME DataSet8 WINDOW=FRONT.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t1.sav'

/COMPRESSED.

NEW FILE.

DATASET NAME DataSet9 WINDOW=FRONT.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t2.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet9.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t2.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet8.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t1.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet9.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet8.

SORT CASES BY ICD9 (A).

DATASET ACTIVATE DataSet8.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t1.sav'

/COMPRESSED.

DATASET ACTIVATE DataSet9.

SAVE OUTFILE='C:\Users\Administrator\Desktop\t2.sav'

/COMPRESSED.

MATCH FILES /FILE=*

/TABLE='DataSet8'

/BY ICD9.

EXECUTE.

File #2

KEY: 1.00

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 20:32:31 have been lost.

The time now is 20:32:41.

DATASET ACTIVATE DataSet8.

MATCH FILES /FILE=*

/TABLE='DataSet9'

/BY ICD9.

EXECUTE.

File #2

KEY: 1.00

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 20:32:28 have been lost.

The time now is 20:33:12.

DATASET ACTIVATE DataSet6.

DATASET CLOSE DataSet8.

DATASET ACTIVATE DataSet6.

DATASET CLOSE DataSet9.

DATASET ACTIVATE DataSet7.

MATCH FILES /FILE=*

/FILE='DataSet6'

/RENAME (ICD9 = d0)

/IN source01

/DROP= d0.

VARIABLE LABELS source01 'Case source is DataSet6'.

EXECUTE.

MATCH FILES /TABLE=*

/FILE='DataSet6'

/BY ICD9.

EXECUTE.

File #1

KEY: 000

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 21:12:28 have been lost.

The time now is 21:14:39.

DATASET ACTIVATE DataSet7.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'

/COMPRESSED.

MATCH FILES /FILE=*

/TABLE='DataSet6'

/BY ICD9.

EXECUTE.

File #2

KEY: 800

>Error # 5131

>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES must

>be uniquely identified by the BY variables.

>Execution of this command stops.

Any changes made to the working file since 12-NOV-2013 21:16:26 have been lost.

The time now is 21:17:48.

On Tue, Nov 12, 2013 at 9:14 PM, David Marso <[hidden email]> wrote:

Really nothing useful to tell you since you don't provide any useful
specifics.
Please define the problem more carefully.
Maybe a small example of the data structures, your syntax and any warnings.
I'm not sure what your trimming, padding etc are intended to achieve.
Critical: the strings MUST have the SAME width or you will get an error.
--

Rajeshms wrote

> Hi All,
>
> I have a string variable which contains repeated IDs in a data set1 and I
> have another string variable with repeated IDs in dataset2 which has
> lesser cases when compared to dataset1. Now how to merge smaller dataset
> (other variables) into larger dataset.
>
> I have tried trimming and adding leading blanks zero to the string key
> variable for both datasets.
> Its not merging for me.kindly help.
> --
> Regards,
>
> Rajesh M S

-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/merging-using-string-key-variable-tp5722994p5722997.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Regards,

Rajesh M S

David Marso

Re: merging using string key variable

Administrator

" In set 2 I have unique diagnosis ..."
...
">Error # 5131
>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES
must
>be uniquely identified by the BY variables.
>Execution of this command stops."

It seems that SPSS begs to differ ;-(
You need to get rid of the dups in the TABLE.
Be sure both files are SORTED ascending by the Key variable.
HTH, David

Rajeshms wrote

Hi David ,

I have diagnosis codes of string type and I have another set of diagnosis
code in set 2. In set1 I have repeated diagnosis codes of string type only
and with lots of other data too. In set 2 I have unique diagnosis codes of
string type only with other data also.

Note set 2 has less cases and set 1 with repeated diagnosis codes has more
cases. Now I need to merge set2 i.e smaller cases into bigger dataset.

I am trying still now got confused for a bit now,I am working will get
it....till then I have sent a warnings for you.

Thanks a lot David.

SORT CASES BY DiagnosisCode_New (A).
IF (CHAR.LENGTH(DiagnosisCode_New) = 0) DiagnosisCode_New=DiagnosisCode.
EXECUTE.
DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'
/COMPRESSED.
DATASET ACTIVATE DataSet1.
DATASET CLOSE DataSet3.
DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'
/COMPRESSED.
SORT CASES BY slno (A).
DATASET ACTIVATE DataSet1.

SAVE OUTFILE='D:\Dheeraj\PID_Checkindate_DC_deduped.sav'
/COMPRESSED.
DATASET ACTIVATE DataSet6.
DATASET ACTIVATE DataSet6.

SAVE OUTFILE='D:\Dheeraj\ICD9.sav'
/COMPRESSED.
DATASET ACTIVATE DataSet1.
SORT CASES BY ICD9 (A).
DATASET ACTIVATE DataSet6.
SORT CASES BY ICD9 (A).
DATASET ACTIVATE DataSet1.
MATCH FILES /FILE=*
/TABLE='DataSet6'
/BY ICD9.
EXECUTE.
File #2
KEY: 800

>Error # 5131
>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES
must
>be uniquely identified by the BY variables.
>Execution of this command stops.

<SNIP remaining messages >
--
Regards,

Rajesh M S

PRogman

Re: merging using string key variable

In reply to this post by Rajeshms

If I remember correctly the ICD-9 diagnose coding is always 5 characters with *trailing* blanks if the code is shorter than 5 characters.
You could try 'Find duplicate cases' on the Data menu to see if something is wrong in the look-up table.

/PR