Friday 15 February 2019

Removing duplicates from a VB file


Suppose you have a VB file with LRECL=321 (a 4-byte RDW plus up to 317 bytes of data) and want to remove duplicate records from it.

Here is the sample JCL: 

//REMODUPL   EXEC PGM=SORT                                     
//SORTIN   DD DSN=INPFILE1,DISP=SHR                 
//SYSIN    DD *                                                
  OPTION VLSHRT                                                
  SORT FIELDS=(5,317,CH,A)                                     
  SUM FIELDS=NONE                                              
/*                                                             
//SORTOUT  DD DSN=OUTFILE1,DISP=(NEW,CATLG,DELETE),           
//         RECFM=VB,LRECL=321,AVGREC=K
//SYSOUT   DD SYSOUT=* 

VLSHRT tells DFSORT to temporarily replace any missing control field bytes with binary zeros.

For example:

As you know, a VB file can hold records of different lengths, and the LRECL of the file is the maximum record length (including the 4-byte RDW).

Say the file has two records, one with 300 bytes of data and another with 317. During the sort, the control field of the 300-byte record is temporarily padded with 17 binary zeros (300 + 17 = 317) so that the comparison can run smoothly. These binary zeros are temporary and will NOT be copied to the output dataset.
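The padding described above can be modelled with a small Python sketch. This is only an illustration of the idea, not DFSORT itself: the function names padded_key and sort_and_dedupe are made up for this example, records are represented by their data portion only (without the RDW), and keeping the first record of each duplicate group is an assumption of the model.

```python
KEY_LENGTH = 317  # control field length from SORT FIELDS=(5,317,CH,A)

def padded_key(data: bytes) -> bytes:
    """Build the comparison key for one record's data portion (RDW excluded)."""
    # Pad on the right with binary zeros up to KEY_LENGTH.
    # Only the key is padded; the record itself is left untouched.
    return data[:KEY_LENGTH].ljust(KEY_LENGTH, b"\x00")

def sort_and_dedupe(records):
    """Sort by padded key and keep one record per distinct key
    (a simplified model of SORT FIELDS plus SUM FIELDS=NONE)."""
    seen = set()
    output = []
    for rec in sorted(records, key=padded_key):
        key = padded_key(rec)
        if key not in seen:
            seen.add(key)
            output.append(rec)  # original bytes, no padding written out
    return output

short = b"A" * 300   # record with 300 bytes of data
long_ = b"A" * 317   # record with 317 bytes of data

result = sort_and_dedupe([long_, short, short])
print([len(r) for r in result])  # [300, 317]
```

The duplicate 300-byte record is dropped, the shorter record sorts first because binary zero is lower than any other byte value, and both surviving records keep their original lengths.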

The default option is NOVLSHRT (the reverse of VLSHRT), which terminates the sort if it finds a record too short to contain the full control field specified in SORT FIELDS.
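The difference between the two options can be sketched in Python as a simple length check. Again, this is only a model under assumptions: check_records and ShortRecordError are hypothetical names, and record lengths here include the 4-byte RDW.

```python
KEY_END = 4 + 317  # last byte covered by SORT FIELDS=(5,317,CH,A), counting the RDW

class ShortRecordError(Exception):
    """Raised by this model when a record cannot hold the full control field."""

def check_records(record_lengths, vlshrt=False):
    """Return True if the sort could proceed in this model.

    With vlshrt=False (modelling NOVLSHRT), a record shorter than KEY_END
    stops the run. With vlshrt=True, short records are accepted because
    their missing key bytes would be padded for the comparison.
    """
    for number, length in enumerate(record_lengths, start=1):
        if length < KEY_END and not vlshrt:
            raise ShortRecordError(
                f"record {number} has length {length}, control field needs {KEY_END}"
            )
    return True

# 304 = 4-byte RDW + 300 bytes of data, 321 = 4-byte RDW + 317 bytes of data
print(check_records([304, 321], vlshrt=True))  # True
```

Calling check_records([304, 321]) without vlshrt=True raises ShortRecordError in this model, which mirrors the sort terminating under NOVLSHRT.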