SubSpace
Bass
Just a random .NET code snippet that is probably of marginal value to you, but I happened to need it for my project.
The problem:
I have a list of arbitrary items (integers, objects, whatever) that might contain duplicates that I have to remove before doing further processing.
The catch is that I cannot change the order of the items in the list, and the whole thing should be high performance (as far as .NET goes).
If it weren't for the order requirement, sorting the List and then walking it once, comparing element n with element n-1, would do the trick. You'd still want to copy the survivors into a new List though, to avoid lots of element shifting from repeated RemoveAt calls.
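For illustration, that rejected sort-based version would look roughly like this (just a sketch; the DedupeSorted name and the IComparable<T> constraint are my own):
Code:
using System;
using System.Collections.Generic;

public static class SortBased
{
    public static List<T> DedupeSorted<T>(List<T> list) where T : IComparable<T>
    {
        list.Sort(); // this is the step that destroys the original order
        List<T> result = new List<T>();
        for (int i = 0; i < list.Count; i++)
        {
            // Keep an element only if it differs from its predecessor
            if (i == 0 || list[i].CompareTo(list[i - 1]) != 0)
                result.Add(list[i]);
        }
        return result;
    }
}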
I did some searching on the internet (modern day programming) and found some references to Pollard's rho algorithm, but that's a cycle-detection technique (used for things like finding a single duplicated integer in a constrained array), so it didn't seem usable in this case.
This is what I came up with:
Code:
using System.Collections.Generic;

public static class ListUtils
{
    public static void Dedupe<T>(List<T> list)
    {
        // Seed the set with one entry per distinct item in the list
        HashSet<T> items = new HashSet<T>(list);
        int writePos = 0;
        for (int readPos = 0; readPos < list.Count; readPos++)
        {
            T item = list[readPos];
            if (items.Contains(item))
            {
                // Item is still in the hash set, so we're seeing it for
                // the first time: keep it, compacting the list in place
                list[writePos] = item;
                items.Remove(item);
                writePos++;
            }
        }
        // Chop off the now-unused tail
        list.RemoveRange(writePos, list.Count - writePos);
    }
}
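A quick toy driver to see it in action (assuming the ListUtils wrapper class above):
Code:
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        List<int> list = new List<int> { 3, 1, 3, 2, 1, 4 };
        ListUtils.Dedupe(list);
        // First occurrences survive in their original order: 3 1 2 4
        foreach (int item in list)
            Console.Write(item + " ");
        Console.WriteLine();
    }
}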
HashSet's Contains() and Remove() are expected O(1) operations. Building the initial HashSet is O(n), and so is the single pass over the List, so the whole dedupe is O(n). I left out List.TrimExcess() as I'm deallocating the List shortly after anyway.
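As an aside, the same thing should work in a single pass without prebuilding the set, since HashSet<T>.Add() returns false when the item is already present. Just a variation I sketched, not tested; the method name is made up:
Code:
using System.Collections.Generic;

public static class ListUtilsAlt
{
    public static void DedupeOnePass<T>(List<T> list)
    {
        HashSet<T> seen = new HashSet<T>();
        int writePos = 0;
        for (int readPos = 0; readPos < list.Count; readPos++)
        {
            T item = list[readPos];
            if (seen.Add(item)) // Add() returns true only on first sighting
            {
                list[writePos] = item;
                writePos++;
            }
        }
        // Remove the leftover tail after compacting in place
        list.RemoveRange(writePos, list.Count - writePos);
    }
}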
Good thing I found out about HashSet; I had never actually used it before... Now let's see if it performs as well as I hope.