JodoHost Forums  

Go Back   JodoHost Forums > The Code Dump > ASP.NET Code Snips

Reply
 
Thread Tools Display Modes
  #1  
Old 05-26-09, 06:15 PM
SubSpace SubSpace is offline
Hammerhead Shark
 
Join Date: Jan 2004
Location: The Netherlands
Posts: 1,050
Rep Power: 10
SubSpace is at Grade 1
Send a message via ICQ to SubSpace Send a message via MSN to SubSpace
Removing duplicates from a List

Just a random .NET code snippet that is probably of marginal value to you, but I happened to need it for my project.

The problem:
I have a list of arbitrary items (integers, objects, whatever) that might contain duplicates that I have to remove before doing further processing.
The catch is that I cannot change the order of the items in the list, and the whole thing should be high performance (as far as .NET goes).
If it wasn't for the order requirement, sorting the List and walking through the array, comparing element n with element n-1 would do the trick. You would need to allocate a new List to avoid lots of moving elements around when you do RemoveAts though.

I did some searching on the internet (modern day programming) and found some references to the Pollard's Rho algorithm, but it didn't seem usable in this case.

This is what I came up with:
Code:
using System.Collections.Generic;

public static void Dedupe<T>(List<T> list)
{
	HashSet<T> items = new HashSet<T>(list);

	int writePos = 0;
	for (int readPos = 0; readPos < list.Count; readPos++)
	{
		T item = list[readPos];
		if (items.Contains(item))
		{
			// Item is still in the hash set, so we're seeing it for the first time
			list[writePos] = item;
			items.Remove(item);
			writePos++;
		}
	}

	list.RemoveRange(writePos, list.Count - writePos);
}
The HashSet Contains() and Remove() operations are O(1) operations. Building the initial HashSet should be O(n), as well as walking through the List once. I left out List.TrimExcess() as I'm deallocating the List shortly after anyway.

Good thing I found out about HashSet, I had never actually used it before... Now let's see if it performs as well as I hope
__________________
Reply With Quote
  #2  
Old 06-10-09, 06:51 PM
largerabbit's Avatar
largerabbit largerabbit is offline
Spiny Dogfish Shark
 
Join Date: Jan 2005
Posts: 114
Rep Power: 6
largerabbit is at Grade 1
Re: Removing duplicates from a List

Cool,

could you use linq? Assuming .net 3.5

var dedupeList= fulllist.Distinct().ToList();

Filtering might be quicker as well.

Cheers,

paul.

Last edited by largerabbit; 06-10-09 at 06:51 PM. Reason: can't spell!
Reply With Quote
  #3  
Old 06-12-09, 05:46 PM
SubSpace SubSpace is offline
Hammerhead Shark
 
Join Date: Jan 2004
Location: The Netherlands
Posts: 1,050
Rep Power: 10
SubSpace is at Grade 1
Send a message via ICQ to SubSpace Send a message via MSN to SubSpace
Re: Removing duplicates from a List

Hmm, that would have worked. For some reason I wasn't sure that it would maintain item order, but since it is an extension method of IEnumerable, that makes sense. Looking at the Distinct implementation, it appears to use an internal Set class very similar to HashSet. In fact, apparently I'm not the only one reinventing the wheel..

The implementation adds to the Set instead of removing like it did in my code, which would be slightly faster I suppose. On the other hand, my implementation does the job in-place, which makes for less garbage collection for one thing.
I wouldn't have cared enough about it had I thought of Distinct(), but now that I have it, I might as well keep it around
__________________
Reply With Quote
Reply

Bookmarks


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
removing web services snooper Reseller Hosting Support 2 06-07-06 08:48 AM
Preventing Duplicates Harrison Database Support 2 09-23-04 06:28 PM
removing password protection LegalAlien Shared Hosting Support 2 07-02-04 10:59 AM
Setting values of 2 combo boxes w/ same row source and eliminating duplicates etalent Database Support 3 04-13-04 03:02 PM


All times are GMT -4. The time now is 06:18 AM.


Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
(c) 2002 - 2009 JodoHost