Friday, March 30, 2012

How to remove rows where only part of the row is duplicated

Hi,

I've got a db table containing 5 columns(excluding id) consisting of
1.) First Half of a UK postcode
2.) Town name to which postcode belongs
3.) Latitude of Postcode
4.) Longitude of Postcode
5.) Second Part of the Postcode

I want to select columns 1,2,3 and 4, but once only. There are often
several entries where 1 and 2 are the same but 3 and 4 are different
i.e.
WA1Bewsey and Whitecross53.386492-2.596847
WA1Bewsey and Whitecross53.388203-2.590961
WA1Bewsey and Whitecross53.388875-2.598504
WA1Fairfield and Howley53.388455-2.581701
WA1Fairfield and Howley53.396117-2.571789

My current query is
SELECT DISTINCT Postcode, Town, latitude, longitude
FROM Postcode
WHERE Postcode.Postcode = 'wa1'
ORDER BY Postcode, Town

However as latitude and longitude differ on each line DISTINCT does
not do what I'm looking for.
Can anybody suggest a way changing the query to just give the first
instance of each Postcode/Town combo?
I.E.
WA1Bewsey and Whitecross53.386492-2.596847
WA1Fairfield and Howley53.388455-2.581701

Many thanks!
DrewThere isn't really any 'first' instance unless you
define your own ordering. However, assuming you have a unique ID
column
this should work

SELECT a.Postcode, a.Town, a.latitude, a.longitude
FROM Postcode a
WHERE NOT EXISTS (SELECT * FROM Postcode b
WHERE b.Postcode=a.Postcode
AND b.Town=a.Town
AND b.ID>a.ID)
ORDER BY a.Postcode, a.Town|||(andylole@.gmail.com) writes:

Quote:

Originally Posted by

I've got a db table containing 5 columns(excluding id) consisting of
1.) First Half of a UK postcode
2.) Town name to which postcode belongs
3.) Latitude of Postcode
4.) Longitude of Postcode
5.) Second Part of the Postcode
>
I want to select columns 1,2,3 and 4, but once only. There are often
several entries where 1 and 2 are the same but 3 and 4 are different
i.e.
WA1 Bewsey and Whitecross 53.386492 -2.596847
WA1 Bewsey and Whitecross 53.388203 -2.590961
WA1 Bewsey and Whitecross 53.388875 -2.598504
WA1 Fairfield and Howley 53.388455 -2.581701
WA1 Fairfield and Howley 53.396117 -2.571789
>
My current query is
SELECT DISTINCT Postcode, Town, latitude, longitude
FROM Postcode
WHERE Postcode.Postcode = 'wa1'
ORDER BY Postcode, Town
>
However as latitude and longitude differ on each line DISTINCT does
not do what I'm looking for.
Can anybody suggest a way changing the query to just give the first
instance of each Postcode/Town combo?


A simple way out would be:

SELECT Postcode, Town, AVG(latitude), AVG(longitude)
FROM Postcode
GROUP BY Postcode, Town

Of course, this yield data that is not in the table at all, but it's a
reasonable assumption that the different lat/long values are in the same
proximity. And if they are not, you have a much bigger problem anyway.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Erland,

Your solution is perfect!
Many thanks to all for helping.
Cheers,
Andy.

No comments:

Post a Comment