min syntax for large num concat

3 messages

min syntax for large num concat

wsu_wright
Art, Jonathan & Richard,

Thanks for the speeding reply.  Your alpha conversion will work.  However, I will need the ID factor to be numeric for later transformations.  Since the actual number of cases is smaller than the actual ID factor, I figured I could aggregate by the new alpha ID & assing $casenum to get a unqiue numeric ID.

Listed below is my current syntax attempt but was wondering if there is a way to avoid having to open & save files, perhaps the lag function?

Thanks again for your assitance.


*create string ID from numeric factors, aggregate, assign unique numeric id & save.
dataset close all.
get file='d:\data\orig.sav'
  /keep=h_idnum1 h_idnum2.
string hh_id_a (A20).
compute hh_id_a=concat(string(h_idnum1,F15),string(h_idnum2,F5)).
sort cases hh_id_a.
exe.
agg
  /outfile=*
  /break=hh_id_a
  /cnt=n.
exe.
compute hh_id=$casenum.
for hh_id (F5.0).
sort cases by hh_id_a.
save outfile='d:\data\hh_id.sav'.

*open original data file, create string ID & merge with numeric ID.
dataset close all.
get file='d:\data\orig.sav'.
string hh_id_a (A20).
compute hh_id_a=concat(string(h_idnum1,F15),string(h_idnum2,F5)).
sort cases hh_id_a.
exe.
match files
  /file=*
  /table='d:\data\hh_id.sav'
  /by hh_id_a.
exe.

Re: min syntax for large num concat

Art Kendall
Using AUTORECODE will sort your hh_id_a, then create a new variable of
sequential numeric values that has the alpha string as a value label.
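
A sketch of that approach with the variable names from your syntax
(not tested):

AUTORECODE VARIABLES=hh_id_a
   /INTO hh_id
   /PRINT.

hh_id gets consecutive values 1, 2, 3, ... in sorted order of
hh_id_a, with each original string attached as a value label.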

Art Kendall
Social Research Consultants



Re: min syntax for large num concat

Richard Ristow
In reply to this post by wsu_wright
At 10:02 AM 9/6/2007, David Wright wrote:

>Thanks for the speedy reply.  Your alpha conversion will work.

OK, so far, so good.

>However, I will need the ID factor to be numeric for later
>transformations.  Since the actual number of cases is smaller than the
>actual ID factor, I figured I could aggregate by the new alpha ID &
>assign $casenum to get a unique numeric ID.

OK, here's a Ristow standard: Are you sure this is a good idea?

First, why do you need a numeric ID, if the number can be that
arbitrary?

Second, the method of sorting by a key and assigning sequential ID
numbers in the sorted list, gets you into BIG trouble if you have to
add even one new case to your set.

Third, I notice below that the combination of 'h_idnum1' and 'h_idnum2'
(or your string variable 'hh_id_a') is apparently not a unique key in
your file; there may be several cases with the same pair of
identifiers. That automatically raises questions. Is there some third
variable that can be included in the key, to make a unique identifier?

>I was wondering if there is a way to avoid having to open & save
>files, perhaps the lag function?

Try this (not tested). It does leave variable 'new_HH' cluttering up
your file:

get file='d:\data\orig.sav'.
sort cases by h_idnum1 h_idnum2.
ADD FILES
   /FILE  = *
   /BY      h_idnum1 h_idnum2
   /FIRST = new_HH.
* new_HH is 1 on the first case of each key group, 0 otherwise.

NUMERIC hh_id (F5.0).
LEAVE   hh_id.
* LEAVE carries hh_id across cases (initialized to 0), so the
* running sum of new_HH numbers the groups 1, 2, 3, ...
COMPUTE hh_id = hh_id + new_HH.
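
The LAG approach you asked about would be an equivalent one-pass
version (also not tested). LAG is system-missing on the first case,
so that case is handled by the $casenum test:

get file='d:\data\orig.sav'.
sort cases by h_idnum1 h_idnum2.

NUMERIC hh_id (F5.0).
LEAVE   hh_id.
* Start a new ID whenever either key differs from the previous case.
DO IF   ($casenum = 1)
     OR (h_idnum1 <> lag(h_idnum1))
     OR (h_idnum2 <> lag(h_idnum2)).
.  COMPUTE hh_id = hh_id + 1.
END IF.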

......................

>Listed below is my current syntax attempt

With comments interspersed:

>*create string ID from numeric factors, .
>* aggregate, assign unique numeric id & save.
>
>get file='d:\data\orig.sav'
>   /keep=h_idnum1 h_idnum2.
>string hh_id_a (A20).
>compute hh_id_a=concat(string(h_idnum1,F15),string(h_idnum2,F5)).
>sort cases hh_id_a.
>exe.

1.) To make an oft-repeated point: The 'exe.' does nothing for you. And
since it forces the whole data file to be read, it can slow processing
considerably if the file is large.

2.) There's no need for 'hh_id_a'. The following would do just as well
(the indents are for readability, and are not needed):

.  get file='d:\data\orig.sav'
      /keep=h_idnum1 h_idnum2.
.  sort cases h_idnum1 h_idnum2.

Going on, you have,

>agg
>   /outfile=*
>   /break=hh_id_a
>   /cnt=n.
>exe.

3.) Notice that I have changed the "break" from 'hh_id_a' to
'h_idnum1 h_idnum2'; see point 2.

4.) Here's a BIG one: is it true that your 'hh_id_a' (or, equivalently,
the two variables 'h_idnum1' and 'h_idnum2') is *NOT* a unique
identifier in your file? That's the only thing this AGGREGATE can mean.
That's a *much* bigger problem. Can you add some other variable to make
a unique identifier? Or is there simply none - which automatically
raises questions about the quality of your data?

Efficiency:
5.) The above 'exe.' also serves no purpose, and can slow processing.

6.) If you're doing this AGGREGATE, the preceding SORT CASES has no
effect, so *it* is taking processor time without any useful result.
(It took me a while to learn this about AGGREGATE.)

>compute hh_id=$casenum.
>for hh_id (F5.0).
>sort cases by hh_id_a.
>save outfile='d:\data\hh_id.sav'.

7.) Here, you're assigning one ID to each value of the long keys.
However, the SORT CASES is neither necessary nor helpful; the
aggregated file is already sorted by the break variables.

>* open original data file, create
>* string ID & merge with numeric ID.
>
>get file='d:\data\orig.sav'.
>string hh_id_a (A20).
>compute hh_id_a=concat(string(h_idnum1,F15),string(h_idnum2,F5)).
>sort cases hh_id_a.
>exe.
>match files
>   /file=*
>   /table='d:\data\hh_id.sav'
>   /by hh_id_a.
>exe.