Filter Unique Email Addresses And Verify Validity

By Jimmy Bonney | June 24, 2012

Find unique

When gathering contact lists, it is common to end up with many duplicate email addresses. To clean up those lists, I wrote a simple Ruby script that takes a CSV file as input and extracts only the unique entries.

I wrote the script in the first place because some entries contained multiple email addresses for the same contact. When comparing lists, it was therefore hard to tell whether an email address was present in one list but not in the other. The first step was to create one line per email address, which then made it easier to remove duplicate entries.
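To illustrate the idea, here is a minimal sketch of the "one line per email address, then remove duplicates" step. It is not the script itself: the helper name `unique_emails`, the `email_column` parameter, the separator characters and the choice to lowercase addresses before comparing them are all assumptions made for this example.

```ruby
# Hypothetical helper, not taken from the original script.
# Assumption: several addresses in one cell are separated by commas,
# semicolons or whitespace.
EMAIL_SEPARATORS = /[\s,;]+/

# rows is an array of rows, each row being an array of cells
# (the shape returned by CSV.read for a file without headers).
def unique_emails(rows, email_column: 0)
  rows
    .map { |row| row[email_column].to_s }             # pick the email cell
    .flat_map { |cell| cell.split(EMAIL_SEPARATORS) } # one entry per address
    .map { |email| email.strip.downcase }             # assumption: compare case-insensitively
    .reject(&:empty?)                                 # drop blanks left over from splitting
    .uniq                                             # keep the first occurrence only
end

sample_rows = [
  ["Alice@Example.com; alice@work.example"],
  ["bob@example.org"],
  ["alice@example.com"]
]

puts unique_emails(sample_rows)
```

With a real file, the rows could come from `CSV.read("contacts.csv")` after `require "csv"`; the file name is only a placeholder.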

Since some of the email addresses in my list were quite old, I also added a simple check of whether the domain of each address has at least one valid MX server.
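The MX check could look roughly like the sketch below, which relies on `Resolv::DNS` from Ruby's standard library. The helper names (`domain_of`, `mx_records?`, `emails_with_mx`) and the per-domain cache are assumptions for illustration, not necessarily how the script does it. Keep in mind that a domain without MX records can still accept mail through its address record, so treat a negative result as a hint rather than proof.

```ruby
require "resolv"

# Hypothetical helpers, not taken from the original script.

# Returns the lowercased domain part of an address, or nil if there is none.
def domain_of(email)
  _local, at, domain = email.to_s.strip.rpartition("@")
  return nil if at.empty? || domain.empty?

  domain.downcase
end

# True when the DNS lookup returns at least one MX record for the domain.
def mx_records?(domain, dns)
  !dns.getresources(domain, Resolv::DNS::Resource::IN::MX).empty?
end

# Keeps only the addresses whose domain has an MX record.
# Each domain is looked up once and the answer is cached.
def emails_with_mx(emails, dns: Resolv::DNS.new)
  cache = {}
  emails.select do |email|
    domain = domain_of(email)
    next false if domain.nil?

    cache.fetch(domain) { cache[domain] = mx_records?(domain, dns) }
  end
end
```

Calling `emails_with_mx(["someone@example.com"])` performs real DNS queries, so it needs network access and can be slow on long lists. Passing the resolver as a parameter makes it possible to substitute a stub when testing.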

More detailed information is available on the project page.

There is of course lots of room for improvement. A few interesting articles and source code are available below if you would like to take this script to the next step:


Image Credits

County of Berks
