Tuesday, May 21, 2013

AS400 SQL to figure out what duplicate records exist in a file, but a field value contains special value

My dilemma today has been that I have to create an SQL to tell me out of a 700+ million record file, what duplicate records there are in the table, and to only select them if a certain field is a certain value.  I should be able to put some brains together tomorrow and figure this out, but if anyone out there knows the magic for this I'd be happy to hear it ahead of time so I can impress management with my technical prowess.  :)

So here's the problem:

1.  I have an AS400 file that contains more than 700 million records (growing daily).  In this file, there are 100 fields all together, and there are 5 fields that when combined, identify the record as unique (to me anyway).  There are 3 other fields besides the unique key of importance in this file.

2.  In the file, I am trying to come up with the SQL statement to retrieve all records from the file where there is not a corresponding record with the same unique key (field1 + field2 + field3 + field4 + field5), and where field6 = 'AAA' and field 7 = 'XX' and field8 = 'XX'. 

3.  The final output should give me only records that are in the table once, based on the 5 key fields, and only when field6, field7, and field8 equals the input values.

I think I am very close with the following SQL statement.  I was running late today, so didn't have time to put the finishing touches on it.


select * from file1 where                 
(field1,field2,field3,field4,field5) in      
(                                                   
select field1,field2,field3,field4,field5    
from file1                                       
group by                                            
field1,field2,field3,field4,field5           
having count(*)=1                                   
)                                                   
and field6 = 'AAA'                                   
and field7 = 'XX'                                   
and field8 = 'XX'                                                      

I think this is giving the correct information, although I need to validate it some more.  I am telling the SQL query to select all fields from file1, where the 5 unique key fields are found in the inner table one time only with the having count(*)=1 statement, grouping by the 5 key fields.  On the outer query, I am limiting the results to only those where the other 3 fields are equal to the specific values.

I will put the finishing touches on this query tomorrow (if needed), and will update my entry if anything changed.


1 comment:

  1. Just in case you were wondering.....The SQL above worked like a charm! :)

    ReplyDelete