Finding Duplicate Records

In the article "Automating TABLES" it was shown how an SML program may be used to examine a table one record at a time. The following routine, which runs in TABLES, tags multiple occurrences of an item or set of items in the currently selected table. It adds an item called "DUP" which is set to "Y" for all duplicate records (otherwise it's blank).
&r dup [begin_item {end_item} / #ALL]

checks a table for duplicate records based on an item or group of items.


arguments

begin_item - an item name will be used to perform the check. If {end_item} is specified, [begin_item] is the first item of a group of items that will be used to perform the check.

end_item - the last item forming a group of items that will be used to perform the check. If {end_item} is not specified, only [begin_item] is used to perform the check.

#ALL - all items will be used to perform the check.


DUP works by sorting the table on the specified items and comparing each record to the previous. While the SORT limit is 9 items, up to 100 items may be examined for duplicates. The table's original order is stored in a temporary item (XX_RECNO) and restored after the check.


&routine dup

&define i -8 &var
&define j -9 &var
&define item1 -12 &var
&define item2 -13 &var
&define numit -14 &var
&define numrec -15 &var
&define dup -16 &var
&define wksp -18 &var
&define temp -19 &var

&goto usage &if &eq "x%-1" "x"
&goto usage &if &eq "x%-1" "x/?"

&rem **** load items into variables 101+

&extract [item1] -1 1
&extract [item2] -2 1
&value [wksp] WKSP
&openw [wksp]t$items.lis
ITEMS
&closew
&sv [i] .FALSE.
&sv [numit] 0
&open [wksp]t$items.lis error
&while &do
   &read [temp] [break]
   &extract -1 [temp] 1
   &if &ne "x%-1" "x" &and &nm %-1 &do
      &extract -1 [temp] 2
      &extract -2 [temp] 4
      &if &eq "x[item1]" "x#ALL" &do
         &inc [numit]
         &sv %<[numit] + 100> "%-1 %-2"
      &elseif &eq [item1] %-1 &do
         &inc [numit]
         &sv %<[numit] + 100> "%-1 %-2"
         &if &ne "x[item2]" "x" &do
            &sv [i] .TRUE.
         &end
      &elseif &eq [i] .TRUE. &do
         &inc [numit]
         &sv %<[numit] + 100> "%-1 %-2"
         &if &eq %-1 [item2] &do
            &sv [i] .FALSE.
         &end
      &end
   &end
&end
&close
& DEL [wksp]t$items.lis
&if &eq [numit] 0 &do
   &type "ERROR:  No items matched"
   &return
&end

&rem **** add duplicate tag and temp recno items

ASEL
&sv [temp]
ADDITEM
DUP
1
C
[temp]
XX_RECNO
11
N
0
[temp]
[temp]
CALC XX_RECNO = $recno

&rem **** get number of records

STATISTICS XX_RECNO # [numrec] 0 0 0 0

&rem **** sort table

&sv [i] 1
&sv [j]
&while &rn [i] 1 9 &and &rn [i] 1 [numit] &do
   &extract [temp] %<[i] + 100> 1
   &sv [j] "[j] [temp]"
   &inc [i]
&end
SORT [j]

&rem **** load first record into variables 201+

RES $RECNO = 1
&sv [i] 1
&while &rn [i] 1 [numit] &do
   &extract -1 %<[i] + 100> 1
   &extract -2 %<[i] + 100> 2
   &if &eq %-2 C &do
      MOVE %-1 to %<[i] + 200>
   &else
      CALC %<[i] + 200> = %-1
   &end
   &inc [i]
&end
ASEL

&rem **** scroll through table and assign DUP values

MOVE ' ' to DUP
&sv [i] 2
&while &rn [i] 2 [numrec] &do
   RES $RECNO = [i]
   &type "Analyzing record [i]..."
   &sv [dup] .TRUE.
   &sv [j] 1
   &while &rn [j] 1 [numit] &do
      &extract -1 %<[j] + 100> 1
      &extract -2 %<[j] + 100> 2
      &value -3 %<[j] + 200>
      &if &eq %-2 C &do
         MOVE %-1 to -4
      &else
         CALC -4 = %-1
      &end
      &if &ne %-3 %-4 &do
         &sv [dup] .FALSE.
      &end
      &value %<[j] + 200> -4
      &inc [j]
   &end
   &if &eq [dup] .TRUE. &do
      MOVE 'Y' to DUP
   &end
   ASEL
   &inc [i]
&end

&rem **** restore table

SORT XX_RECNO
DROPITEM XX_RECNO
Y
&type "Done."
&return

&label error
&type "I/O ERROR"
&return

&label usage
&delim < >
&type "Usage: &r dup [begin_item {end_item} / #ALL]"
&delim [ ]
&return

For example, VEG.SPECIES contains the following items:

COLUMN   ITEM NAME     WIDTH   TYPE   N.DEC
     1   MUNTNUMB         19      N       5
    20   MUNTCOMP         19      N       5
    39   SPP               5      C       0
    44   PCT              19      N       5
    63   LIFEFORM          5      C       0
The following command will examine MUNTNUMB, MUNTCOMP, and SPP for duplicates:

&r dup muntnumb spp
In this case, no duplicates will be found. If, however, item MUNTCOMP is dropped and the following command is issued:

&r dup #all
The following duplicates will be found:
$RECNO            MUNTNUMB SPP                   PCT LIFEFORM DUP
000033            279.0000 AGCR               5.0000 Grass    Y
000034            279.0000 AGSM              15.0000 Grass    Y
000044            279.0000 ARTR2             20.0000 Shrub    Y
000046            279.0000 CEMO2              0.0100 Shrub    Y
000061            279.0000 SEMU3              0.1000 Forb     Y
000066            279.0000 BOGR2             20.0000 Grass    Y
000075            279.0000 GUSA2              0.0100 Shrub    Y
000085            290.0000 LUAR3              4.0000 Forb     Y
000090            290.0000 MUMO               2.0000 Grass    Y
000092            290.0000 SIHY               0.5000 Grass    Y
000096            290.0000 QUGA               5.0000 Shrub    Y
000099            290.0000 PIPOS             65.0000 Tree     Y
000101            290.0000 ANRO2              0.1000 Forb     Y
000110            290.0000 POFE               3.0000 Grass    Y
000120            672.0000 CALOC              0.0100 Forb     Y
000132            672.0000 BOGR2             25.0000 Grass    Y
000137            672.0000 ARTR2             15.0000 Shrub    Y
000142            672.0000 PUTR2              2.0000 Shrub    Y
000144            672.0000 SPPA2              0.3000 Shrub    Y
000150            672.0000 HYRI               0.0100 Forb     Y
000157            672.0000 AGSM              15.0000 Grass    Y
000164            672.0000 SIHY               8.0000 Grass    Y
000169            672.0000 GUSA2              2.0000 Shrub    Y
000171            672.0000 QUGA              10.0000 Shrub    Y

Return to ArcTips page