Extract Email Addresses from a Mailbox Hosted on CPanel

By Jimmy Bonney | December 28, 2012


If you use CPanel's default mailboxes, you may have noticed that the people who send you emails are not automatically added to your CPanel address book. This is not a big problem in itself, but if you use a mail application such as Outlook, Thunderbird or Mail and manage several email accounts in it, there is no easy way to extract the contacts that belong to only one of those accounts.

To illustrate the situation, let’s assume that you use a mail application on your computer or tablet that handles both your professional address (@your-domain.com) and your personal addresses (for instance @gmail.com or @hotmail.com). Such applications usually create and maintain an address book for you but, most of the time, it is a single shared one that contains both your professional and personal contacts. If you haven’t edited and sorted your contacts on a regular basis, you most probably won’t be able to tell the professional contacts from the personal ones. If you wish to send some information to your professional network only, sorting them out by hand can be a tedious task.

However, all is not lost if your professional emails are hosted on a server running CPanel. Provided that all your emails are still on the server, it is possible to extract the email addresses of the people who have written to you. The following paragraphs show how to do that.

To make it easy, here are the few steps that I followed. To start with, connect as root through SSH:

ssh root@yourhost.com -o PubkeyAuthentication=no

Then, export the mailbox of the user:

tar -cvzf mailbox.tar.gz /home/[user]/mail/[domain]/[mailbox]

where [user] is the CPanel account, [domain] is the domain name on the CPanel account and [mailbox] is the name of the mail user (the part of the email address before the @). For instance, the complete path for me can be /home/jimmybonney/mail/jimmybonney.com/jimmy.
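As a variation on the tar command above, here is a minimal sketch that builds the path from its three parts, checks that it looks like a Maildir, and archives it with relative paths. The function name archive_mailbox and its optional fourth argument (the home root, assumed to be /home) are made up for this illustration; only the tar options -c, -z, -f and -C are standard.

```shell
#!/bin/bash
# Sketch: archive one mailbox with relative paths.
# archive_mailbox and the optional home_root argument are hypothetical helpers,
# not CPanel tools; home_root defaults to /home as in the article.
archive_mailbox() {
  local user=$1 domain=$2 mailbox=$3 home_root=${4:-/home}
  local parent="$home_root/$user/mail/$domain"

  # A Maildir mailbox is expected to contain cur and new directories.
  if [ ! -d "$parent/$mailbox/cur" ] || [ ! -d "$parent/$mailbox/new" ]; then
    echo "No Maildir found at $parent/$mailbox" >&2
    return 1
  fi

  # -C changes into the parent directory first, so the archive stores
  # "$mailbox/..." rather than the full path from the filesystem root.
  tar -czf mailbox.tar.gz -C "$parent" "$mailbox"
}

# Example call, using the path from the article:
# archive_mailbox jimmybonney jimmybonney.com jimmy
```

With this variant, extracting the archive creates a folder named after the mailbox directly in the current directory, instead of a nested home/[user]/mail/[domain]/[mailbox] tree.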

Once this is done, we can download the archive and extract it into a local folder:

scp root@yourhost.com:mailbox.tar.gz .
tar -xvzf mailbox.tar.gz

At this stage, we will use some existing tools to convert the mailbox to another format. Start by installing mutt if it is not already installed and then convert the mailbox from Maildir to Mbox format:

sudo apt-get install mutt
mutt -f [mailbox]/ -e 'set confirmcreate=no; set delete=no; push "T.*<enter>;smailbox_mbox<enter><quit>"'

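where [mailbox] is the folder into which mailbox.tar.gz was extracted (it should contain cur and new directories) and mailbox_mbox is the resulting mbox file. With the tar command used above, tar strips the leading / when archiving, so this folder typically ends up as home/[user]/mail/[domain]/[mailbox] under the directory where you extracted the archive.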
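To check that the conversion picked up every message, one option is to compare the number of message files in the Maildir with the number of messages in the mbox file. This is only a rough sketch: count_maildir and count_mbox are hypothetical helper names, and the mbox count assumes that every line starting with "From " is a message separator, which holds only if such lines inside message bodies were escaped during the conversion.

```shell
#!/bin/bash
# Sketch: compare message counts before and after the Maildir -> mbox conversion.
# count_maildir and count_mbox are hypothetical helper names.

# In a Maildir, each message is one file under cur/ or new/.
count_maildir() {
  find "$1/cur" "$1/new" -type f | wc -l | tr -d ' '
}

# In an mbox, each message starts with a line beginning with "From ".
count_mbox() {
  grep -c '^From ' "$1"
}

# Example, with a mailbox folder named jimmy in the current directory:
# echo "Maildir: $(count_maildir jimmy)  mbox: $(count_mbox mailbox_mbox)"
```

If the two numbers differ, it is worth opening the mbox file in mutt to see which messages are missing before going any further.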

The final step consists of extracting the email addresses. To start with, save the following script as extract_emails.sh and make it executable with chmod +x extract_emails.sh:

#!/bin/bash

# This script parses an mbox file and displays all of the From: email addresses,
# removing the ones that come from postmaster, mail admins, etc.

FILE=$1

# Without an argument there is nothing to parse
if [ -z "$FILE" ]; then
 echo "Usage: $0 mbox_file" >&2
 exit 1
fi

# Fall back to the system mail spool if the argument is not a readable file
if [ ! -r "$FILE" ]; then
 if [ -r "/var/spool/mail/$FILE" ]; then
  FILE="/var/spool/mail/$FILE"
 else
  echo "Sorry! Neither $FILE nor /var/spool/mail/$FILE exists, or I can't read them" >&2
  exit 1
 fi
fi

# Keep the From: headers, drop automated senders, extract the addresses,
# lowercase them and remove duplicates
grep "^From:" "$FILE" | grep -Evi \
"(postoffice|\
postman|\
administrator|\
bounce|\
MAILER-DAEMON|\
postmaster|\
Mail Administrator|\
Auto-reply|\
out of office|\
Mail Delivery System|\
Email Engine|\
Mail Delivery Subsystem|\
Mail.Administrator|\
non.deliverable)" |\
grep -Eio "[A-Z0-9._%+-]+@[A-Z0-9.-]+\.([A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)" |\
tr 'A-Z' 'a-z' |\
sort -u

Run the script…

./extract_emails.sh mailbox_mbox > email_addresses.txt

… and all your email addresses are in the file email_addresses.txt (one email address per line).
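To see what the filtering does without touching a real mailbox, the same idea can be tried on a tiny hand-made mbox. The sketch below is a simplified stand-in, not the script itself: extract_from_addresses is a hypothetical name, the exclusion list is shortened to three keywords, and the address pattern accepts any top-level domain of two or more letters instead of the explicit list used in the script.

```shell
#!/bin/bash
# Sketch: the same filtering idea, reduced to a function that reads stdin.
# extract_from_addresses is a hypothetical name; the exclusion list and the
# address pattern are deliberately simplified for the example.
extract_from_addresses() {
  grep '^From:' \
    | grep -Eiv '(mailer-daemon|postmaster|bounce)' \
    | grep -Eio '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}' \
    | tr 'A-Z' 'a-z' \
    | sort -u
}

# A tiny hand-made mbox: two real senders (one of them twice) and one bounce.
extract_from_addresses <<'EOF'
From alice@example.com Fri Dec 28 10:00:00 2012
From: Alice Example <Alice@Example.com>
Subject: Hello

From MAILER-DAEMON Fri Dec 28 10:05:00 2012
From: Mail Delivery System <MAILER-DAEMON@example.com>
Subject: Undelivered

From bob@example.org Fri Dec 28 10:10:00 2012
From: bob@example.org
Subject: Re: Hello

From alice@example.com Fri Dec 28 10:20:00 2012
From: "Alice" <alice@example.com>
Subject: Re: Hello
EOF
```

This prints alice@example.com and bob@example.org, one per line: the bounce message is dropped, the two spellings of Alice's address are merged by the lowercasing step, and the "From " separator lines (without a colon) are ignored.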

