Archiving Gmail

On a recent Mac Power Users episode, Katie and David discussed a few email backup systems. They look really useful, but they are more sophisticated than I need. I just want to download local copies of my email to my Mac on a regular basis, so I thought I’d cook up a script for that.

I use Gmail, but I use Mail.app as a back end for a bunch of sorting and, now, backing up. I decided to go with an AppleScript that Keyboard Maestro runs every few hours. The script looks at Gmail’s “All Mail” folder and copies all the recent emails to a local mailbox on my Mac.

First, I want to keep track of the last time the backup ran, so the script stores the date in a .txt file on every run and only copies emails received since then. I also like to file the emails by year on my Mac, so I put in a piece of code that checks for the year’s mailbox and creates it if it doesn’t exist. There are some emails in my “All Mail” that I never read, so to keep them from showing up as unread in the archive, I mark everything as read at the end.

John Gruber posted a nifty inbox sweeper script that does something similar, but it doesn’t fit my workflow, since I like to sweep my inbox manually when I process emails to zero. I did get a neat trick from his script though: it copies all selected emails at once without looping, which speeds up the code. I also used MacScripter’s great tutorial on accessing files.

To adopt this, you need to create the LastDate_EmailBackup.txt file (the script expects it in Library/ScriptSupport inside your home folder) and put in the date from which you want to start the backup, e.g. in m/d/y format. Then adjust the account and archive mailbox names to your needs.
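If you’d rather not create that file by hand, a one-off script along these lines should do it. Treat it as a sketch, not part of the backup itself: the start date "1/1/2012" is only a placeholder, the `mkdir -p` step assumes the ScriptSupport folder may not exist yet, and the final line assumes your Mac’s region settings parse m/d/y dates.

```applescript
-- Placeholder start date (an assumption); replace with the date your first backup should start from.
set startDateText to "1/1/2012"

-- Create the folder the backup script reads from, if it is missing.
set supportFolder to (POSIX path of (path to home folder)) & "Library/ScriptSupport"
do shell script "mkdir -p " & quoted form of supportFolder

-- Same file path the backup script uses.
set theFilePath to ((path to home folder) as string) & "Library:ScriptSupport:LastDate_EmailBackup.txt"

-- Open (or create) the file, empty it, and write the start date.
set fileRef to open for access file theFilePath with write permission
try
	set eof of fileRef to 0
	write startDateText to fileRef
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try

-- Sanity check: this errors if the stored text cannot be read back as a date.
set parsedDate to date (read file theFilePath)
```

Run it once in Script Editor. After that, the backup script overwrites the file with the current date on every run.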

set todaysDate to current date 
set theText to todaysDate as string 
set theHomePath to (path to home folder) as string 
set thefilePath to theHomePath & "Library:ScriptSupport:LastDate_EmailBackup.txt" as string 
set oldArchiveDate to date (read file thefilePath from 1) 
set newArchiveDate to oldArchiveDate - 1 * hours --allow for some time cushion
set theOldYear to (year of newArchiveDate) as string --year of the cutoff date
set thenewYear to (year of todaysDate) as string --year of this run

tell application "Mail" 
set theMailbox to mailbox "All Mail" of mailbox "[Gmail]" of account "Gmail" 

if theOldYear is thenewYear then 
   set selectedMessages to (a reference to (every message of theMailbox whose date received is greater than newArchiveDate)) 
   --create the year mailbox first if it is missing, then copy in one go
   if not (mailbox thenewYear of mailbox "Archive" exists) then
      make new mailbox with properties {name:"Archive/" & thenewYear}
   end if
   copy selectedMessages to mailbox thenewYear of mailbox "Archive"
   set read status of every message in mailbox thenewYear of mailbox "Archive" to true
else 
   set selectedMessages to every message of theMailbox whose date received is greater than newArchiveDate 
   repeat with eachMessage in selectedMessages 
      set theDate to date received of eachMessage 
      set theYear to year of theDate as string 
      --create the year mailbox first if it is missing, then copy the message
      if not (mailbox theYear of mailbox "Archive" exists) then
         make new mailbox with properties {name:"Archive/" & theYear}
      end if
      copy eachMessage to mailbox theYear of mailbox "Archive"
   end repeat 
   set read status of every message in mailbox thenewYear of mailbox "Archive" to true 
   set read status of every message in mailbox theOldYear of mailbox "Archive" to true 
end if 
end tell 
set eof of file thefilePath to 0 
write theText to file thefilePath starting at eof

UPDATE: An earlier version archived mail from up to 2 days before the last archive date. In Mail.app it doesn’t look like this duplicates any messages. However, in the actual Library/Mail folder those messages did get duplicated, which took up a huge amount of disk space. I now archive only email from at most 1 hour before the last archive date, to allow for late imports into Gmail from other email addresses. This should keep the number of duplicates to a minimum.