How to Clean Malware and Viruses Off a Windows PC with Free Software

Background

I recently needed to move from an old PC that had developed the click of death to a newer PC someone gave me for free—except that it was riddled with viruses and malware (spyware, adware, trojans, etc.).

I’d have liked to wipe the new PC and put a fresh Windows XP copy on it, but I didn’t have the XP installation discs for the new PC and my old PC only had Windows 2000. So in order to keep XP I had to clean off the malware.

I spent about a week researching free software options for cleaning malware and viruses from a Windows PC, and running/babysitting the PC during the process, so I thought I’d document it for anyone else who has to do this.

Clean Up Your Computer First

Before you start running malware/virus cleaner programs, you should do a different kind of cleaning first: Cleaning off unnecessary files, especially big ones.

You should look for directories of stuff you don’t need any more, or that you can burn to a CD/DVD and file away (such as old photos and music you no longer listen to). You should also clean out temporary files and any big files you don’t need; very big files (250MB+) can cause some scanners to slow to a crawl, while others just skip them entirely.

In Windows XP, you can find big files by going to My Computer, then Local Disk (C:), then right-click, Search…, What size is it? >, Large (more than 1MB), Search. Then just select the ones you don’t think you’ll need any more and right-click -> shift-delete (that way you don’t have to remember to empty the Recycle Bin). Obviously, if you don’t know what a file is, it’s safer to just leave it alone.

Programs like CCleaner and EasyCleaner can also help, especially for removing temporary files.

The main reason for cleaning up unnecessary files is that they take up scanning time, and when you run a lot of scans all that time really adds up.

Determine If You Have the Latest Windows Service Pack

At the time of this writing (Sept 2010), the most recent “service packs” (batches of Windows security and other updates) are Service Pack 3 for Windows XP and Service Pack 2 for Vista, with Service Pack 1 for Windows 7 due in the first half of 2011; see http://support.microsoft.com/sp for the latest Service Pack information about your version of Windows.

You should know which Service Pack your computer has installed and whether it’s the latest Service Pack that’s available. You can determine this by doing: Start -> Run -> msinfo32 -> OK -> System Summary -> Version.

If your PC doesn’t have the latest Service Pack for your kind of Windows, make a note of your current Service Pack (if any) and the most recent Service Pack available from Microsoft. We’ll return to updating to the latest service pack later, after cleaning your system with rescue CDs.

Download and Burn Rescue CDs

Malware/viruses that are actively running on a PC have too many ways to avoid detection and resist removal. Instead, you have to get them when they’re inert and can’t hide or fight back, and the best way is by starting up in another operating system entirely.

To do that you’ll need to download and burn at least one rescue CD, preferably all of them because each product catches different things. If you’re really careful/paranoid, use a different computer to download and burn the CDs too.

Most of the rescue CD downloads are .iso files; after downloading them, double-click each to open up a Windows program allowing you to burn the image to a new CD. The CDs I used are the ones discussed under “Comments on Running the Rescue CDs” below.

(If you know of other free rescue CDs I should add to this list please let me know; I ruled out Norton Bootable Recovery Tool, Panda SafeCD, and Avast Bart CD because they are not free. If you already have a paid product like Norton or McAfee, it may have a rescue disc version too; search the company’s website for more info.)

Note: Several of these rescue CD programs require at least 512MB of memory to run. If your computer has less memory, the programs will often start up but mysteriously fail with an “unknown error” or no error at all.

Provide Internet for Rescue CDs via Ethernet (If Possible)

To work best, the rescue CDs will need to download the latest malware/virus definitions from the Internet, though if they can’t connect they all (except Trinity Rescue Kit) do have older virus definitions on the CD itself. Those can be too old if your PC was infected recently.

So it’s best if your computer is connected to the Internet so each CD can download the most recent definitions. Unfortunately all the CDs seem to assume an Ethernet connection and don’t really understand connecting wirelessly.

If you’re already online via an Ethernet cable, you’re all set—just leave it plugged in when you boot from each rescue CD.

If you connect through wireless, your PC might still have an Ethernet card; if so, just use an Ethernet cable to connect it to the back of your wireless router. If you need an Ethernet cable you can get one at a computer store like Radio Shack or Best Buy, or online via Amazon, eBay, etc.

If your PC only has a wireless card, you probably won’t be able to get updated malware/virus definition files since none of the CDs seems to work with wireless access. Fortunately most computers still seem to have Ethernet cards so it’s just a question of buying/borrowing a cable to plug into your router.

Start Up from Rescue CDs and Scan/Clean PC

After downloading all the CD image files and burning them to CDs, you’ll need to pick one, put it in the drive, and restart your computer—though you may wish to read the comments on each CD in the next section before/as you start up from it.

Your computer should then start from the rescue CD. If it doesn’t, and instead goes right into Windows like normal, you’ll need to go into your computer’s BIOS and change it so that it first checks the CD drive for an operating system before checking the hard disk where Windows resides.

There are many web pages describing how to change the BIOS boot order in general terms, but if those don’t help, you’ll have to search for something like “boot from CD bios Dell Dimension 4500”.

In rare cases you might have to open up your computer and do something like fiddle with jumpers or reset the CMOS. I had to do that with one Compaq system and it wasn’t fun. Hopefully your computer will already be set to boot from its CD drive, or your computer manufacturer won’t make you jump through hoops to set that option.

Note: If you successfully start up from some rescue CDs and then do nothing, they’ll just start the rescue CD program, but for others if you do nothing they’ll go back to the normal Windows boot sequence. So after restarting your PC you should stay until the initial boot selection menu appears because if you walk away right after restarting (to get coffee, snack, etc.), you could come back to find Windows running.

Comments on Running the Rescue CDs

(Note: If you have a USB stick you should plug it in so that the rescue programs can scan it too; malware/viruses can infect USB drives and use the autorun feature to spread to new machines or reinfect yours later; this tactic even worked against U.S. Department of Defense computers in 2008.)

AVG: One of the slower programs (3 hours for 22GB). Don’t bother enabling the macro scanning option because it only tells if any macros are present, not if the macros are actually “bad”. Also if you enable the cookies option, know that “bad” cookies are just a privacy concern and not actually viruses or malware.

Avira: One of the fastest scanners (1 hour for 22GB). Starts up in German, just click the British flag in the corner to change it to English.

Kaspersky: One of the slower programs (5 hours for 22GB), but also caught a lot of things. When it first starts up and gives you the license screen, you’ll have to click somewhere in the text to activate that window, before you can press “A” to accept. Also, the first few times I ran this, I thought it only detected problems and wouldn’t fix them. Finally though I noticed a small pair of “Disinfect all” and “Quarantine” links on the lower left corner of the Detected Threats tab.

Dr. Web: The slowest program (10 hours for 22GB), but it also scanned inside big zip/archive files that other programs just skipped. Note that when shutting down with the “Eject and Shut Down” option, you have to hit F2 to view details in order to see the prompt asking you to press Enter to finish shutting down.

Trinity Rescue Kit: Has ClamAV, F-Prot, BitDefender, Vexira, and Avast; to access these, do nothing at the boot prompt to “Run Trinity Rescue Kit” in default mode, then use the down arrow to select Virus Scanning, hit Enter, then use the down arrow to choose Scan with ClamAV, Scan with F-Prot, etc. Do not choose “Set the Scan Destination” since the default is already correctly set to scan all drives. Also while there is an option to run all the virus scanners at once, when I tried this I got an error, so I tried running them one at a time and it worked fine.

ClamAV (only via Trinity Rescue Kit; requires Internet access via Ethernet): Definitely one of the slower programs (5 hours for 22GB). It’s open source so my guess is that it’s one of the more thorough scanners because many contributors constantly improve its scanning abilities.

Vexira (only via Trinity Rescue Kit; requires Internet access via Ethernet): Reasonably fast (1.5 hours for 22GB), otherwise no comments on this one.

Avast (only via Trinity Rescue Kit; requires Internet access via Ethernet): Reasonably fast (1.75 hours for 22GB). I wasn’t going to include this since it requires a license key, but I decided to because the key is freely available on request. Plus, it’s already on the Trinity Rescue Kit anyway.

F-Prot (standalone or via Trinity Rescue Kit): Reasonably fast (1.25 hours for 22GB). If running from the F-Secure Rescue CD, be careful that when the main program first starts up, it looks like pressing Enter will select the Next option, but in fact the default cursor is on Restart Computer, so you need to use the left arrow once before hitting Enter. Also its shutdown sequence ejected the disk and then just hung, so I had to turn the computer’s power off and on again in order to restart.

BitDefender (standalone or via Trinity Rescue Kit): One of the faster scanners (1 hour for 22GB). At the time of writing, the rescue disk link had two .iso files; bitdefender-rescue-cd.iso gave me an error on booting and rescue_new.iso started up fine. Also, to restart after scanning, click the little dog/bug at the bottom, which is on the same strip as the time, then Log Out.

I’ll also note that when going from one rescue CD to another, the timing is a bit tricky. You have to use the “restart” option on one CD, wait for the system to be pretty much actually shut down, and then just as it powers up again, quickly hit the eject button on your CD drive, get the old CD out, put the next rescue CD in, and then push it in (or hit the button again) and hope you did it fast enough that it gets recognized by the time the BIOS is checking the CD drive for a bootable disc. If you miss the timing, you’ll have to boot into Windows, then do a restart from there before you can boot into the next rescue CD.

If a rescue CD offers an “eject and reboot” option, use it, and as soon as the old CD ejects, put the new one in (the old one should continue with the shutdown/reboot sequence running in the computer’s memory).

Use Caution Before Downloading Other Rescue CDs

The rescue CDs listed here are what was available at the time of writing. You may be reading this long afterwards, in which case you’re welcome to search for new/better rescue discs online.

Just be sure to research any companies you’re about to download software from to make sure that they are in fact legitimate companies/software, and not malware masquerading as cleaner tools because that does happen (i.e., you can download a package to disinfect your PC and end up making it worse because you’re actually downloading more malware!).

Do not only judge legitimacy based on how “professional” a company’s website looks; scammers can make pretty sites too, including just copying other companies’ sites. Instead, Google the company name and see what comes up, especially in Wikipedia but also on other sites (and make sure the Wikipedia article doesn’t seem like it was written by the company as an advertisement!).

If you find a page where the software was “reviewed”, was the review done by a well-known independent company (e.g., C|Net, ZDNet, PCMag), or someone you’ve never heard of, so that maybe it was set up by the company itself to look like an independent review?

Also, be sure to only download from domain names associated with the company you just researched. For example, if you researched Foobar Antivirus and articles around the web seemed to say it’s a good company, you’d probably want to download from something at foobar.com or if not, something directly linked from the foobar.com site.

Update Your Windows Service Pack (If Necessary)

If you are running the latest Service Pack for your version of Windows, you can skip this section, but if you determined before that you’re not running the latest Service Pack for your version of Windows, now is the time to update it.

At this point you’ll have cleaned most/all of the malware/viruses from your system with the rescue CDs, but your Windows system still isn’t as secure as it could be against future intrusions because it doesn’t have the security fixes in the latest Service Pack.

The problem is that the Service Pack updates are big downloads, which can take an hour or more depending on connection speed—and during that time, you’ll be vulnerable to infections that exploit security holes fixed in the Service Pack you’re trying to download! This has been compared to running across a battlefield to get a bulletproof vest.

The solution is to download the latest Service Pack on another computer, copy the .exe file to a USB flash drive or burn the .iso file to a CD, disable network access on your PC (by unplugging its Ethernet cable, unplugging your wireless router, or pulling the wireless card out of your PC), then start up the PC and run the Service Pack installation program from the CD/USB drive. This should let you install the latest Service Pack files before reconnecting to the network.

You can get the latest Service Pack installation file from Microsoft, such as by Googling the version of Windows and the Service Pack number (e.g., “Windows XP Service Pack 3”) and then looking for microsoft.com pages in the results.

The only tricky thing is that Microsoft steers you to the Windows Update page so they can automatically update the computer… except that you’ll be doing this on a different computer, not the infected one, and you don’t want to update this different computer, you want to download an installation file that you can run on your infected PC.

For example, when I searched for Windows XP Service Pack 3, the top result was to download a “white paper” about the service pack, which was not what I needed. This page also said, “In order to download Windows XP SP3 for one computer, you must visit Windows Update at http://update.microsoft.com. For more information, visit http://www.microsoft.com/windows/products/windowsxp/sp3/default.mspx”.

As I said, you don’t want to go to Windows Update since you’re downloading the file on a different machine in order to copy it to your previously-infected PC via CD or USB drive.

So instead I went to the second link, http://www.microsoft.com/windows/products/windowsxp/sp3/default.mspx, and at the top of that page it again told me to go to Windows Update, but further down the page was a “Manually installing SP3 using the Microsoft Download Center or a CD” section that said, “If you have problems obtaining the service pack from Windows Update, you can download SP3 as a standalone installation package from the Microsoft Download Center website, and then install SP3 manually.”

However, that brought me to a “network installation package” which was “intended for IT professionals and developers downloading and installing on multiple computers on a network”—which I didn’t really think was what I wanted. While I did try to download and run the WindowsXP-KB936929-SP3-x86-ENU.exe file on that page on my PC, I had some kind of error that confirmed my suspicions that the network installation package wasn’t right either.

Ultimately I went with the “Windows XP Service Pack 3 – ISO-9660 CD Image File” download and then used it to burn a Service Pack 3 CD, which worked great. Hopefully, if you’re using Windows Vista or Windows 7 there will be an ISO file you can download and burn also.

Install Other Scanners & Definitions Via Another PC

To be as thorough as possible, there are a handful of other programs and definition files you can download on another computer, copy over to your still-unnetworked PC with a USB drive or CD, install the programs and definitions, and then run to scan your target PC:

With Malwarebytes and Spybot, after installing the main program you just download the definitions update program and run it. After installing Avira, download the vdf update file (vdf_fusebundle.zip), then start Avira, choose Update -> Manual Update, and locate the update file.

(Lavasoft’s Ad-Aware supposedly had an update file you could download and install manually, but I tried multiple times and each time Ad-Aware just hung.)

Install Windows Update and More Scanners

After running the rescue discs, updating your Windows Service Pack (if necessary), and possibly installing additional programs and definitions in the last section, you can finally restore network access to your PC by turning your wireless router back on, plugging your wireless card back in, or reconnecting your Ethernet cable.

The first thing you need to do once you reconnect your PC and start it up is run Windows Update. Usually you can find this somewhere in your Start menu, such as Start -> All Programs -> Windows Update. But if you can’t find it, start Internet Explorer and go to http://update.microsoft.com.

You need to run Windows Update as soon as your PC is online again because even more security issues will have been identified since the most recent Service Pack, and running Windows Update will apply the fixes for those vulnerabilities. This may require restarting your PC, so each time it downloads updates and suggests that you restart, do it and then run Windows Update again until it says there are no more High Priority / Security updates to download.

Once that’s done you can install more programs to scan your computer for viruses/malware. Remember, not every program can catch everything out there, so if you’re serious about cleaning your computer off as much as possible you should run as many legitimate free programs as you can if your time allows.

So I’d recommend downloading and running the following programs too:

And also the following programs, if you hadn’t installed them and their definitions via another computer as described in the previous section:

Install Long-Term Protection

Additionally, for long-term protection it’s good to have one or more anti-virus programs running at all times (not just for scanning your hard drive every once in a while). This will set up a barrier to new infections by scanning web pages as you visit them, USB drives as you plug them in, files as you download them, programs as you run them, your PC’s memory as it starts up, etc.

Two decent long-term protection programs that are free for personal use (though they would love to get you to upgrade to their premium versions of course!) are:

You should install these for long-term protection and also run their scanners on your computer periodically just like the other programs.

Relax (and Pray :) )

If you’ve done everything in this guide, you’ve scanned/cleaned your PC with 10 rescue disc programs, updated your PC to the latest Windows Service Pack, installed the latest Windows security updates, and run 8 additional scanner/cleaning programs from within Windows.

While there are no guarantees where viruses and malware are concerned, this is probably about as thorough as you can get with freely available tools, and chances are pretty good that you’ve removed all of the viruses and malware that had infected your PC.

If you found this guide useful please let me know or feel the joy of giving back by sending a donation. And of course if you have any corrections to the links or procedures please let me know too!


Listserv to Mailman Part 3.3: Tips, Tricks, and Notes

During our Listserv to Mailman migration, I started keeping a tips file to remind myself of the syntax for helpful commands, so I wanted to share those tidbits here as an appendix along with some notes/caveats and useful pages on other sites.

Commands

  • To see all the settings for a Mailman list: /usr/lib/mailman/bin/config_list -o - 'listname' (outputs to screen; use -o filename to output to a file)
  • To see a list’s settings minus comments and whitespace: config_list -o - 'listname' | perl -ne 'print unless /(^#|^\s*$)/'
  • To change the password for a list: /usr/lib/mailman/bin/change_pw -l 'listname' -p 'newpass' -q (the -q prevents a notice going to the list owners)
  • To show the subscribers of a list: /usr/lib/mailman/bin/list_members -f 'listname' (the -f shows full names for subscribers)
  • To show the owners of a list: /usr/lib/mailman/bin/list_owners 'listname' (if you use list_admins instead, it seems to do the same thing)
  • To rebuild all list archives (in bash): for list in `/usr/lib/mailman/bin/list_lists -b | egrep -v '^mailman$'`; do /usr/lib/mailman/bin/arch --wipe $list; done
  • To restart Mailman (as root or mailman user): /etc/init.d/mailman restart
  • To change one or more list settings for a list: config_list -o - 'listname' > tmp.txt; then edit tmp.txt to leave only the settings you want to change; config_list -i tmp.txt 'listname'
  • To apply a setting change to multiple lists, use config_list -o - 'listname' > tmp.txt and edit tmp.txt as above, but then use shell looping: for list in adf-foo adf-bar etc; do config_list -i tmp.txt $list ; done

Notes

I’d intended to list here each code file I’m making available along with a description, but instead I put the code descriptions in the Apache directory pages so just look there.

The Mailman logs on our server are in /var/log/mailman and /var/log/mailman/error contains the most recent errors.

Important: Our list host informed me that the default “mailman” list must be kept around; you can’t just delete it or important Mailman things will (apparently) break. So you’ve been warned :)

To increase the max # of recipients for a message to be able to get posted (in case you have lists where legitimate posts get rejected due to having too many recipients), see the max_num_recipients setting, which is in the Privacy Options -> Recipient Filters part of the web interface.

I created the arch_wipe_all script because I’d tried running the commands in it from the shell but it took so many hours (70+ lists and 10+ years of archives) that my shell would get disconnected and I’d have to start over completely. By putting the commands in a script I could preface it with nohup so it would keep running even if I got disconnected.
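
The actual arch_wipe_all script isn't reproduced in this post, so here is a minimal sketch of what such a script could look like, based on the "rebuild all list archives" loop from the Commands section. The function name, the LIST_LISTS and ARCH variables, and the log messages are my own illustrative choices, not part of Mailman.

```shell
# Hypothetical sketch of an arch_wipe_all-style script (not the original).
# LIST_LISTS and ARCH default to the Mailman paths used elsewhere in this guide.
LIST_LISTS=${LIST_LISTS:-/usr/lib/mailman/bin/list_lists}
ARCH=${ARCH:-/usr/lib/mailman/bin/arch}

arch_wipe_all() {
  # list_lists -b prints one bare list name per line; skip the "mailman" list.
  "$LIST_LISTS" -b | grep -v '^mailman$' | while read -r list; do
    echo "Rebuilding archive for $list"
    # Redirect stdin so arch cannot swallow the remaining list names.
    "$ARCH" --wipe "$list" </dev/null || echo "arch failed for $list" >&2
  done
}
```

To use it, you would put a call to arch_wipe_all at the bottom of the script file and start it with something like nohup ./arch_wipe_all > arch_wipe_all.log 2>&1 & so that it survives a dropped connection and leaves a log you can check later.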

I noticed in /etc/mailman/postfix-to-mailman.py that if DEB_LISTMASTER isn’t set in /etc/mailman/mm_cfg.py or /usr/lib/mailman/Mailman/Defaults.py, error mail will go to postmaster@localhost, but I wanted it sent to another address in our organization, so I added DEB_LISTMASTER = 'addr@ourhost.org' to the bottom of /etc/mailman/mm_cfg.py.

Pre-migration messages in people’s mailboxes had links to the old Listserv archives in the footer, so even though we changed the footer for Mailman, we needed to provide redirects in case someone used an old link. We did this with an Apache URL Rewrite rule to redirect http://lists.ourhost.org/archives/listname.html to http://lists.ourhost.org/cgi-bin/mailman/private/listname/:

RewriteEngine on
RewriteRule ^archives/(.+)\.html$ http://lists.ourhost.org/cgi-bin/mailman/private/$1/ [R]

(the second line, the RewriteRule, needs to be all on one line)

During one part of our disk issues converting our archives, our server admin said we’d maxed out our number of inodes (files and directories), probably because of the large number of archive related files (70+ lists for 10+ years). If you see errors like “unable to write file”, you can check the inodes used on the disk itself with df -i, and check inodes for your user with quota -sv username.
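
If df -i shows you're running low on inodes, the next question is which directories are eating them. The following is a rough sketch, assuming a POSIX shell and find; the function names count_inodes and inodes_by_subdir are made up for this example. It counts files and directories under each immediate subdirectory and lists the biggest first. File names containing newlines would throw the count off slightly.

```shell
# Count files and directories under a path (including the path itself),
# staying on the same filesystem.
count_inodes() {
  find "$1" -xdev | wc -l | tr -d ' '
}

# Print "count directory" for each immediate subdirectory, largest first.
inodes_by_subdir() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    printf '%s %s\n' "$(count_inodes "$dir")" "$dir"
  done | sort -rn
}
```

For example, inodes_by_subdir /var/lib/mailman/archives/private would show which list archives account for the most files.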

Bookmarks

The following pages weren’t listed in other parts of the guide and may be useful:


Listserv to Mailman Part 3.1: Converting Listserv Archives to Mailman

Note: You could end up using ten times as much disk space as your uncompressed Listserv archives, after converting them to Mailman’s mbox format and using Swish to index them for searching. If that amount of disk space could be an issue for you, see “A Word About Disk Space” at the end of this page.

Introduction

If you’re converting Listserv lists to Mailman, you probably want to keep your list archives too. The good news is that the Listserv archive format is plain text and is actually pretty well documented. The bad news is that it’s a custom proprietary format, not something common like the Unix “mbox” format, so conversion will be necessary.

Mailman actually wanted the archives to be in mbox format, so that’s what we converted the Listserv archives to. In fact, one of our lists had a lot of spam in its archives so after we did the conversion, but before we imported into Mailman, we opened the mbox file in Thunderbird, deleted the spam messages, and then saved the file back to mbox for importing into Mailman.

After converting the Listserv archives to mbox format, we had to move the mbox files to the right locations for Mailman to generate HTML archive pages, and then use Swish to index the HTML pages and do a little Mailman fiddling to keep private archives private (including search results). But first, more about the archive conversion process.

Converting the Archive Files

My archive conversion journey began with a Perl script called ls2mail.pl that was posted in 1999 and claimed to convert from Listserv’s archive format to mbox for Mailman.

The admin of our old Listserv modified it a bit to fix an unspecified “potentially nasty bug”, and I modified it further to 1) skip messages with dates earlier than the earliest legitimate post or later than the current year (to weed out spam messages with invalid dates), and 2) better match the mboxrd format (by quoting body lines beginning with “From”).
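
To illustrate the mboxrd quoting rule on its own (this is not the code from ls2mail.pl, just a sketch): in mboxrd, any body line that starts with zero or more ">" characters followed by "From " gets one extra ">" put in front, so the line can't be mistaken for the start of a new message. The function name mboxrd_quote_body is made up for this example.

```shell
# Quote message body lines for mboxrd.
# "From here" becomes ">From here", ">From x" becomes ">>From x", and so on.
# NOTE: feed this only message bodies, never the real "From " separator
# lines that begin each message in the mbox file.
mboxrd_quote_body() {
  sed 's/^>*From />&/'
}
```

Because already-quoted lines gain another ">", the quoting can be reversed later by removing exactly one ">" from the same kind of line.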

First, our Listserv archives were broken up into weekly archive files named listname.logYYMMw where YY is the year of the archive file, MM is the month, and w is the week (a = first week, b = second, up to e for a month with five weeks). For example, listname.log0901a was the Listserv archive file for the first week of January, 2009.

I decided it would be better to rename the files to use four digit years instead of two (i.e., listname.logYYYYMMw), so I used a small Perl utility called perlren to rename the files according to a Perl regular expression. So for each list I did:

perlren 's#log1#log201#' *.log*
perlren 's#log9#log199#' *.log*
perlren 's#log0#log200#' *.log*
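
If you don't have perlren handy, the same rename can be sketched with plain sh and sed. This is an alternative I'm offering as an assumption-laden example, not what I actually ran: the function names four_digit_name and rename_logs are made up, and it assumes the file names end in exactly YYMMw with w between a and e. Unlike the three perlren commands, it leaves already-renamed files alone, so it is safe to run twice.

```shell
# Map listname.logYYMMw to listname.logYYYYMMw (weeks a-e).
# 9x years become 199x; 0x and 1x years become 200x and 201x.
# Names already in the four-digit form are returned unchanged.
four_digit_name() {
  printf '%s\n' "$1" | sed \
    -e 's/\.log\(9[0-9][0-9][0-9][a-e]\)$/.log19\1/' \
    -e 's/\.log\([01][0-9][0-9][0-9][a-e]\)$/.log20\1/'
}

# Rename every matching archive file in the current directory.
rename_logs() {
  for f in *.log*; do
    [ -e "$f" ] || continue
    new=$(four_digit_name "$f")
    [ "$f" = "$new" ] || mv -- "$f" "$new"
  done
}
```

Run rename_logs from inside each list's directory of .log files, the same place you would have run perlren.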

Then I wanted a master file for each list which would have a sorted listing of all archive files (and because of the rename above, an alpha sorting was also a date sorting):

ls -1 *.log* > archive-files.txt

(I used a general name like archive-files.txt since all the log files for each list were grouped into a separate directory for each list.)

Then I used that file as a basis of making a file containing all the Listserv log files combined into one big file, in order:

perl -ne 'print "Processing $_"; chomp; print `cat $_ >> abc-listname.ls`;' \
  archive-files.txt

Finally, I ran the ls2mail.pl conversion script on the master Listserv log file to convert it to mbox format:

perl ~/mailman/ls2mail.pl < abc-listname.ls > abc-listname.mbox

Generating Mailman’s HTML Archive Pages

To generate the viewable HTML pages for Mailman’s web archive, I just had to move this new mbox file to the appropriate Mailman list archive directory (e.g., /var/lib/mailman/archives/private/abc-listname.mbox/) and run the Mailman bin “arch” command to generate the HTML archive pages from the mbox file:

mv abc-listname.mbox /var/lib/mailman/archives/private/abc-listname.mbox/
/usr/lib/mailman/bin/arch --wipe abc-listname \
  /var/lib/mailman/archives/private/abc-listname.mbox/abc-listname.mbox

(the --wipe option tells arch to overwrite any existing HTML pages with newly-generated ones from the mbox file, but since this was a new list that wasn’t a problem)

Once I’d tested this with one list, I repeated the process automatically with the other lists by doing something like the following (using command looping in the bash shell again):

cd /var/lib/mailman/archives/private
for list in abc-list1 abc-list2 abc-list3 etc; do
  cd "$list"
  perlren 's#log1#log201#' *.log*
  perlren 's#log9#log199#' *.log*
  perlren 's#log0#log200#' *.log*
  ls -1 *.log* > archive-files.txt
  perl -ne 'print "Processing $_"; chomp; print `cat $_ >> archive.ls`;' \
    archive-files.txt
  perl ~/mailman/ls2mail.pl < archive.ls > "$list.mbox"
  mv "$list.mbox" "../$list.mbox/"
  /usr/lib/mailman/bin/arch --wipe "$list" "../$list.mbox/$list.mbox"
  cd ..
done

By doing this, all the existing list archives were converted from Listserv archives (combined into .ls files) to Mailman .mbox archive files, and HTML web-viewable archives pages were generated by Mailman’s arch command.

A Word About Disk Space

I ran into disk space limits several times during this conversion process.

First, our old list host gave us the Listserv archives as one large compressed .tar.gz file—which expanded to triple the size, requiring 4x the size to accommodate both the gzip file and its uncompressed files.

Furthermore, I found that after concatenating all the individual Listserv .log files into one giant Listserv notebook archive .ls file (so, disk space times two for that step), the conversion to mbox format caused the resulting mbox file to take up about 60% more space than the .ls files.

Then running the Mailman arch command on the mbox files generated HTML pages that took up about twice as much space as the corresponding mbox files.

And then the Swish search index files took up about 55% of the size of the HTML archive pages, so that was another bunch of disk space.

The following table shows, step by step, how a 350MB compressed Listserv archive file ballooned up to almost 10GB:

| Item/Action | Item Size | Running Total |
|---|---|---|
| Original gzipped (compressed) Listserv archive files, as one big .tar.gz file | 0.35GB | 0.35GB |
| Uncompressed Listserv archives, individual .log files (gzip file x 3 due to 1/3 compression ratio) | 1.05GB | 1.40GB |
| Concatenate .log files for each list into .ls Listserv archives (same size as .log files) | 1.05GB | 2.45GB |
| Convert .ls Listserv archive files to mbox files (about 60% bigger than .ls files) | 1.68GB | 4.13GB |
| Run arch on mbox files to generate HTML pages (about 2x size of mbox files) | 3.36GB | 7.49GB |
| Create Swish indexes from HTML pages (about 55% of HTML pages) | 1.85GB | 9.34GB |
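
If you want to plug in your own starting size, the arithmetic behind those numbers can be sketched with a little awk. The multipliers are just the ratios I observed (3x, 1x, 1.6x, 2x, 0.55x), so treat the result as a rough estimate; the function name estimate_disk_usage is made up for this example.

```shell
# Rough disk-space estimate for a Listserv-to-Mailman archive conversion.
# The multipliers are the ones observed in this guide; yours may differ.
estimate_disk_usage() {
  # $1 = size of the compressed .tar.gz archive, in GB
  awk -v gz="$1" 'BEGIN {
    logs  = gz * 3        # uncompressed .log files (about 3x the gzip file)
    ls    = logs          # concatenated .ls files (same size as the .log files)
    mbox  = ls * 1.6      # mbox files (about 60% bigger than the .ls files)
    html  = mbox * 2      # HTML pages from arch (about 2x the mbox files)
    swish = html * 0.55   # Swish indexes (about 55% of the HTML pages)

    n = split("gzip logs ls mbox html swish", name, " ")
    size[1] = gz;   size[2] = logs; size[3] = ls
    size[4] = mbox; size[5] = html; size[6] = swish

    # Print: step name, size of that step, running total (all in GB).
    total = 0
    for (i = 1; i <= n; i++) {
      total += size[i]
      printf "%-6s %6.2f %6.2f\n", name[i], size[i], total
    }
  }'
}
```

Running estimate_disk_usage 0.35 prints one line per step, and the last column of the last line is the worst-case total if you keep every interim file.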

You can mitigate some of those increases by deleting interim files (e.g., deleting the individual .log files after creating the concatenated .ls Listserv archive files), but the overall disk usage still ends up a lot more than you might think because the mbox files take more space than the Listserv notebooks, the HTML archive files are twice the size of the mbox files, and the Swish search index files take up even more space.

We filled our disk up several times during the conversion, and one of the times this happened caused all mail delivery for all our just-converted lists to stop completely until some disk space was freed up and Mailman was restarted (actually it required a server restart, not just Mailman, which they’d tried with “/etc/init.d/mailman restart”).

By the way, after filling the disk a few times I created a script called disk_space_check and installed a daily cron job to send an email if disk usage was too high:

/bin/df -h | $HOME/bin/disk_space_check
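The disk_space_check script itself isn't shown here, so the following is only a rough Python sketch of what such a check could look like, not the actual script. It assumes the usual df column layout (a "Use%" column followed by the mount point) and a 90% alert level; both the threshold and the function name are assumptions.

```python
THRESHOLD = 90  # assumed alert level, in percent


def over_threshold(df_output, threshold=THRESHOLD):
    """Return (mount_point, percent) pairs at or above the threshold."""
    hits = []
    for line in df_output.splitlines()[1:]:      # skip the header row
        fields = line.split()
        for i, field in enumerate(fields):
            # The usage column is the first field that looks like "95%".
            if field.endswith("%") and field[:-1].isdigit():
                percent = int(field[:-1])
                if percent >= threshold:
                    hits.append((" ".join(fields[i + 1:]), percent))
                break
    return hits


# Made-up df output, just to show the shape of the result.
sample = (
    "Filesystem Size Used Avail Use% Mounted on\n"
    "/dev/sda1 50G 47G 3.0G 95% /\n"
    "/dev/sdb1 100G 20G 80G 20% /home\n"
)
for mount, percent in over_threshold(sample):
    print("%s is %d%% full" % (mount, percent))
```

A real version would read the df output from standard input (sys.stdin.read()) instead of the sample string. Since cron normally mails whatever a job prints, printing only when something is over the threshold is one simple way to get the alert email.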

In the next section we’ll add searching to Mailman’s archives, and restrict searching (and search results) on private lists to just subscribers.


Listserv to Mailman Part 2.3: Migrating Subscribers And Keeping Them In Sync

Introduction: Options Make Things Complex

To migrate subscribers, ideally we’d just get the addresses on each Listserv list with a REVIEW command and then subscribe those addresses to each new Mailman list of the same name.

The problem is that we need to copy not only the subscribers but their list option settings, such as NOMAIL, DIGEST, etc. This makes it much more complicated.

The first thing I did was identify the few “concealed” addresses on our Listserv lists so I could change them to NOCONCEAL. I did this by sending QUERY LISTNAME WITH CONCEAL FOR *@* to LISTSERV (replacing LISTNAME with one of the Listserv list names), for each list.

Since there weren’t a lot of concealed addresses, I set them “noconceal” by sending QUIET SET LISTNAME NOCONCEAL FOR ADDRESS PW=LISTPASSWORD for each subscriber to LISTSERV (replacing LISTNAME with the list name, ADDRESS with the subscriber address, and LISTPASSWORD with the Listserv list password).

I wanted to eliminate concealed statuses because I needed to make sure all subscribers would show up when I did a REVIEW command to generate one subscriber file for each list (more on this later).

Listserv Subscriber Settings And Mailman Equivalents

You may find it helpful to see a list of possible Listserv subscriber options and what settings/actions are available in Mailman to match them:

Listserv Subscriber Options | Mailman Equivalent
MAIL/NOMAIL (whether to get posts; MAIL is assumed unless NOMAIL is set) | each subscriber can have the delivery flag set “on” or “off”
DIGEST/NODIGEST (whether posts are batched up into periodic digests; NODIGEST is assumed unless DIGEST is set) | add_members bin command has a -d option to add people with the digest setting; afterwards, a subscriber can have the digest flag set “plain”, “mime”, or “off”; see also the digest_is_default and other digest_ list config settings
MIME/NOMIME (whether digests are sent to a user in MIME format; NOMIME is assumed unless MIME is set) | each subscriber can have the digest flag set “plain”, “mime”, or “off”; plain is equivalent to NOMIME and mime is equivalent to MIME; see also the mime_is_default_digest list config setting
INDEX/NOINDEX (a variant of digest mode; instead of digest batches of messages, just the subject lines of posts are sent as a summary) | no equivalent that I am aware of, and that’s fine because I don’t really see how useful a summary is with just subjects
ACK/NOACK/MSGACK (sends a “your post was received” confirmation message, or not; NOACK is assumed unless ACK is set; MSGACK is obsolete) | each subscriber can have the ack flag set “on” or “off”, though I don’t see the point; it seems to make more sense to just use getting your own posts back (with the myposts flag) as the acknowledgement message
SUBJECTHDR (whether [list name] is put in the Subject field of every post, for filtering purposes) | I believe this is only settable on a list-wide (not subscriber-specific) basis via the subject_prefix list config setting
CONCEAL/NOCONCEAL (whether someone shows up on a normal REVIEW of the list) | each subscriber can have the hide flag set “on” or “off”; also the private_roster list config setting lets you restrict member addresses to other members or list admins; my mailman_admin_command_handler.py doesn’t support the hide option though since I think it’s a PITA for admins
REPRO/NOREPRO (whether someone gets copies of their own posts; REPRO is assumed unless NOREPRO is set) | each subscriber can have the myposts flag set “on” or “off”
TOPICS (a way to let people only see certain categories of posts; I never understood or used this though, seemed easier just to create more lists) | Mailman definitely seems to support this but we did not have to deal with it; there are a number of topics_ list config settings and supposedly users can choose to just get certain topics; see Mailing list topics in the docs
POST/NOPOST (whether someone is allowed to post to the list; POST is assumed unless NOPOST is set) | no equivalent that I am aware of, though you can use the mod subscriber flag to cause someone’s posts to require approval, and then just never approve them :)
EDITOR/NOEDITOR (whether someone can bypass normal approval mechanisms required for posting to certain lists) | I don’t think there is a per-user subscriber flag for this, but the moderator list configuration setting allows you to specify all moderator/editors at once; the only downside is that then they also get notices of any posts held for review and have the ability to approve/deny them too; can also use the accept_these_nonmembers list config setting to allow people to bypass approval, but I think it only works for non-subscribers
REVIEW/NOREVIEW (whether someone’s posts are specially held for approval on an otherwise unmoderated list, usually for bad behavior) | each subscriber can have the mod flag set “on” or “off” (on = they are “moderated” and need their posts approved)
RENEW/NORENEW (whether someone should get periodic stay-subscribed confirmation emails, overriding the list setting) | no equivalent that I am aware of, since I don’t believe Mailman sends out stay-subscribed (i.e., renew your subscription) confirmation messages

Note that some of these settings (like the mod flag for users) were difficult to discover, and sometimes required delving into the source code (especially the Mailman/Commands/cmd_set.py module).

More Preparation Work

At this point we’d changed anyone who was set CONCEALed to NOCONCEAL so we wouldn’t miss them in the conversion process. Then we needed to decide which of the subscriber settings were important to copy in the conversion and which were not. (The table above should help you think about the various subscriber settings to consider.)

For example, I decided that we’d support (i.e., carry over the subscriber settings on our old Listserv lists to equivalent Mailman settings on the new lists) the NOMAIL, DIGEST, MIME, NOPOST, NOREPRO, and REVIEW options.

I’d already decided to include list names in subject lines for all subscribers on all our lists, regardless of what their old Listserv SUBJECTHDR settings were, due to the fact that Mailman only has this setting at the list level.

(For people who had relied on list names in subject lines, to not include this setting on the Mailman lists would destroy their mail filters, while for those who didn’t rely on it, at worst it would be a mild annoyance to suddenly see list names in subject lines—though before our cutover I warned subscribers that list names in subject lines would soon be the default.)

The Listserv EDITOR flag was also something Mailman only supported on a per-list basis, but that wasn’t a problem for us because we’d always used the Editor= line in the old Listserv list headers, rather than the EDITOR flag on individual subscribers, so we dealt with it at the list setting level instead of here at the subscriber setting level.

I’d decided against supporting the CONCEAL flag, even though there is an equivalent setting on Mailman lists (“hide”), because I don’t think it really protects subscriber addresses (this is really best done with the “private_roster” Mailman setting), and over the years CONCEAL caused me a lot of headaches (e.g., I’d waste a lot of time trying to figure out a subscriber problem only to finally realize that they were concealed).

Finally, I decided we wouldn’t support the INDEX, ACK, or RENEW/NORENEW options, as Mailman doesn’t really have equivalents for those and I doubt anyone had set them anyway. I also didn’t bother with TOPICS since we’d never used it in Listserv.

So, for each of the “supported” options above, I had to send a message to LISTSERV for each list asking who had that option set, and save the response to each message in a separate text file. The message to LISTSERV was QUERY LISTNAME WITH OPTION FOR *@* (LISTNAME was the list name and OPTION was something like NOMAIL or MIME).

The reason we did it this way was that based on the way Mailman worked, it seemed best to do this in two steps: 1) Subscribe all subscribers of each Listserv list to the corresponding Mailman list, regardless of their settings, and then 2) go back and apply the special settings for certain people (such as REVIEW) by hand or by script.

Copying Subscribers And Settings

With our old Listserv setup we used cron on our web server to do an hourly REVIEW of each list and then save the results (which included all subscribers) to files, one file per list with the listname as the filename.

I’d written a quick and dirty script called queryopt.pl to email a query to Listserv to find out all subscribers with a given option on a given list, such as DIGEST, NOMAIL, EDITOR, etc.

Since all our list names began with abc- (where abc is our organization’s acronym), this allowed me to go into the directory containing all those review output files and do something like perl queryopt.pl nomail abc-* and have it send a bunch of Listserv queries, one for each list (because abc-* is expanded by the shell to all the abc- files, or all our list names).
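queryopt.pl isn't reproduced here, but the command-building part of the idea is simple enough to sketch. The Python below is an illustration only, not the Perl script: the function name is made up, the upper-casing is cosmetic (it just mirrors how the commands are written in this article), and actually emailing each command to LISTSERV is left out.

```python
import glob


def build_queries(option, listnames):
    """Return one LISTSERV QUERY command per list for the given option."""
    return [
        "QUERY %s WITH %s FOR *@*" % (name.upper(), option.upper())
        for name in listnames
    ]


# In the directory of hourly REVIEW files, the list names are the filenames,
# so a glob does the same job as letting the shell expand abc-*.
listnames = sorted(glob.glob("abc-*"))
for command in build_queries("nomail", listnames):
    print(command)
```

Each command would then go into the body of its own email to LISTSERV, so that each reply can be saved to its own file.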

I used the following procmail filter to save the results of these query emails to files named option-listname.txt (e.g., nomail-abc-foo.txt would have everyone with the NOMAIL option on the abc-foo list):

PMDIR=$HOME/.procmail
# Save results of certain queries; pipe to prog for massage and save
:0 Hc
* ^From:.*LISTSERV@
* ^Subject:.*Re:.* query
| cat - | $PMDIR/save_opt_query.py

This matched the sender and subject line and then sent the email with LISTSERV’s query response into a program called save_opt_query.py.

One minor issue I had, with no easy fix, was that save_opt_query.py expected the Listserv response message to be like:

Subscription options for Mike Smily <mikednahelix@HOTMAIL.COM>, list
ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings
SUBJECTHDR     Full (normal) mail headers with list name in message
               subject
REPRO          You receive a copy of your own postings
MSGACK         Short "TELL" acknowledgement of successfully processed
               postings

Subscription date: 30 Jul 2004

Subscription options for "Alexandria ." <alextheconquerer@YAHOO.COM>,
list ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings
SUBJECTHDR     Full (normal) mail headers with list name in message
               subject
REPRO          You receive a copy of your own postings
MSGACK         Short "TELL" acknowledgement of successfully processed
               postings

That worked most of the time, but sometimes the name/email was too long so the name was on one line and the email was on the next. So after importing our subscribers I had to go back and manually fix those few cases (by adding those addresses to lists and setting their options), but it was only about 30 people for 70 lists so, while annoying, it was an acceptable trade-off to keep save_opt_query.py as simple as possible.

(As to how I identified the problem records, I looked for lines in the saved option-listname.txt files that did not have the <foo@bar.com> brackets, for example with: egrep -v '<' *.txt)
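For anyone writing their own parser for these responses, here is a hedged Python sketch (it is not save_opt_query.py, which stays deliberately simpler). It assumes the header always has the shape shown in the sample above, and it lets the pattern match across line breaks so a wrapped name or address doesn't matter. The third record in the sample text is invented to show the wrapped case.

```python
import re

# Whitespace in the pattern may be spaces or newlines, so a header that
# wraps between the name, the address, or the list name still matches.
HEADER = re.compile(
    r"Subscription options for\s+(?P<name>.*?)\s*"
    r"<(?P<email>[^<>\s]+)>,\s+list\s+(?P<list>\S+):",
    re.DOTALL,
)


def subscribers(response_text):
    """Return (name, email, listname) tuples found in a QUERY response."""
    return [
        (m.group("name").strip('" '), m.group("email"), m.group("list"))
        for m in HEADER.finditer(response_text)
    ]


sample = """Subscription options for Mike Smily <mikednahelix@HOTMAIL.COM>, list
ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings

Subscription options for "Alexandria ." <alextheconquerer@YAHOO.COM>,
list ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings

Subscription options for A Hypothetical Subscriber With A Long Name
<long.address@EXAMPLE.COM>, list ABC-ANNOUNCE:
"""

for record in subscribers(sample):
    print(record)
```

One caveat: if a header had no angle brackets at all, this pattern would run on into the next record, so a spot check of the output is still worthwhile.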

As mentioned before, the Listserv subscriber options I decided to support (i.e., copy over to the new Mailman lists) were: DIGEST, MIME (a special case of digest), NOMAIL, NOPOST, NOREPRO, and REVIEW. I also wanted to run queryopt.pl for NODIGEST to get the “regular” (non-digest) subscribers, because the Mailman add_members command has flags to specify a file with non-digest subscribers and a file with digest subscribers to add.

So after testing the procmail filter to make sure it would catch and save the option query output into files correctly, I did the following to run it for all the options:

for option in digest nodigest mime nomail nopost norepro review; \
do perl queryopt.pl $option abc-*; done

(Note that this used the looping syntax of Bourne-style shells such as bash; other shells’ loop and variable substitution syntax might vary, such as requiring parentheses in csh. If you don’t know what I’m talking about read your shell’s man page, search the web, or ask another developer/admin.)

To check if all responses had been saved properly I just did a file count based on filenames:

for option in digest nodigest mime nomail nopost norepro review; \
do echo $option; ls -1 ${option}-* | wc -l; done

(checking that the number of each option file equaled the number of lists we have, for example with 71 lists we should have had 71 nodigest-* files)
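A count tells you that something is missing but not which file. As a small optional extra, a Python sketch like the following could name the missing files. It assumes the option-listname.txt naming described above; the function name is made up.

```python
import os

OPTIONS = ["digest", "nodigest", "mime", "nomail", "nopost", "norepro", "review"]


def missing_files(listnames, directory=".", options=OPTIONS):
    """Return the option-listname.txt filenames that do not exist yet."""
    missing = []
    for option in options:
        for name in listnames:
            filename = "%s-%s.txt" % (option, name)
            if not os.path.exists(os.path.join(directory, filename)):
                missing.append(filename)
    return missing


# Example with two hypothetical list names; prints whatever is not on disk.
for filename in missing_files(["abc-foo", "abc-bar"]):
    print("missing: " + filename)
```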

Since I’d been doing all this on our web server where the hourly Listserv REVIEW files resided (which allowed me to do abc-* in the shell to expand to all our listnames), I had to copy these files (which were the results of queries to LISTSERV to find out the options people had on our soon-to-be-old Listserv lists) over to our new list host for automated subscription and option setting on the new Mailman lists.

Importing Subscribers and Setting Options

Now we had a bunch of option-related text files (option-listname.txt) on our new Mailman list server. I used shell looping again to do a batch import of subscribers followed by option setting for subscribers with special options like NOMAIL.

Since I’d already created new Mailman lists on our new list host equivalent to the old Listserv lists I could do the following:

for lst in `/usr/lib/mailman/bin/list_lists -b | egrep -v '^mailman$'`; do
echo add_members -r nodigest-${lst}.txt -d digest-${lst}.txt -w n -a n $lst
echo set_option.py mod on ${lst} 'tree#bard' review-${lst}.txt
echo set_option.py mod on ${lst} 'tree#bard' nopost-${lst}.txt
echo set_option.py myposts off ${lst} 'tree#bard' norepro-${lst}.txt
echo set_option.py delivery off ${lst} 'tree#bard' nomail-${lst}.txt
echo set_option.py digest mime ${lst} 'tree#bard' mime-${lst}.txt
done

As before, when using shell looping I like to use “echo” statements to check the commands before running them. If all looks good, I remove the echos to really run the commands.

(In fact, I first ran the add_members and set_option.py commands manually on one of our lists and verified their success by sending “who” and “show” commands to the admin command handler, before using the shell loop to run them on ALL lists.)

The commands above first subscribe people in bulk using the Mailman add_members bin command with options to specify files with non-digest and digest subscribers, and to not send any notifications of the subscriptions. Then they use another script I wrote (set_option.py) to apply the Mailman option settings to subscribers and thus copy the old Listserv settings to the new subscribers.

Keeping Subscribers and Options In Sync

At this point I had the Mailman lists set up as a snapshot in time of the Listserv lists, but I needed to do additional work to make sure they’d stay in sync going forward (i.e., to run them in parallel with the Listserv lists, in terms of subscribers and subscriber options).

Fortunately, all of our subscribers seemed to use our CGI-based web subscription management pages rather than emailing LISTSERV directly or using the Listserv web interface and, as described before, those CGI scripts just generated admin email commands to LISTSERV.

This worked out well for our migration because I was able to add Mailman code right after Listserv related code, to do the same things for the Mailman lists (such as subscribe, change options, unsubscribe, etc.).

To get a little more detailed, our web pages called CGI scripts which in turn called a MailingLists.py module to do the actual work for our Listserv lists. For the migration I copied MailingLists.py to MailmanLists.py and kept the function names the same but changed the actual commands to use the custom Mailman admin command handler. Then I could just make sure that every time a CGI script used a function in MailingLists.py it called the same function in MailmanLists.py to keep the old Listserv and new Mailman lists in sync.
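To make the "same function name in both modules" idea concrete, here is a toy Python sketch. The two classes are stand-ins invented for illustration (they only record what they were asked to do) and are not the real MailingLists.py or MailmanLists.py; the helper is also hypothetical, since our CGI scripts simply made the two calls one after the other.

```python
class FakeListservLists(object):
    """Stand-in for the old module: records the calls it receives."""

    def __init__(self):
        self.calls = []

    def subscribe(self, address, listname):
        self.calls.append(("subscribe", address, listname))


class FakeMailmanLists(FakeListservLists):
    """Stand-in for the new module: same function names, different server."""


def call_on_all(backends, function_name, *args):
    """Call the same-named function on every backend, in order."""
    return [getattr(backend, function_name)(*args) for backend in backends]


old_server = FakeListservLists()
new_server = FakeMailmanLists()
call_on_all([old_server, new_server], "subscribe", "someone@example.com", "abc-foo")
print(old_server.calls == new_server.calls)
```

Keeping the function names identical is what makes this kind of mirroring cheap, and it also makes the old calls easy to find and remove after the cutover.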

This was still a little time-consuming, searching all our CGI scripts to see where MailingLists.py was called and then adding calls to MailmanLists.py, but it was worth it.

But it wasn’t just CGI scripts—I also had to check for cron jobs or shell scripts I’d written that used MailingLists.py functions too, and add equivalent calls to MailmanLists.py like in the CGI scripts (and keep track of all these changes so that I could go back and remove the MailingLists.py calls after the migration was done).

Fortunately, there were relatively few cron jobs or shell scripts which used MailingLists.py (really just one weekly cron job that checked whether the email addresses on “members only” lists actually corresponded to current members).

But this is definitely a case where utility scripts like tgrep came in handy, such as running a command like the following from the home directory on our web server, which also contained the web tree as a subdirectory:

find . -type f -size -512k -exec tgrep MailingLists {} \; > /var/tmp/mls.txt

This started in the current directory (.) and looked through all files (-type f) less than 512KB (-size -512k) and ran tgrep (which only searches text files) for “MailingLists” and printed out the filename and matching lines. The output was redirected someplace outside the current directory tree because otherwise find/tgrep would have kept matching lines in the output file itself, resulting in an ever-growing output file (ask me how I know this!).
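If you don't have a tgrep-style utility handy, roughly the same search can be sketched in Python. This is an approximation rather than a port: the "contains a NUL byte" test for binary files is an assumption (tgrep's real test isn't shown here), and the size cutoff only approximates how find rounds -size values.

```python
import os


def text_files_matching(root, needle, max_bytes=512 * 1024):
    """Yield (path, line_number, line) for matches in small text files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                if os.path.getsize(path) >= max_bytes:
                    continue                      # skip big files
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue                          # unreadable or vanished file
            if b"\0" in data:
                continue                          # crude "is it binary?" test
            text = data.decode("utf-8", "replace")
            for number, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    yield path, number, line


for path, number, line in text_files_matching(".", "MailingLists"):
    print("%s:%d: %s" % (path, number, line))
```

Because this prints to the screen instead of writing inside the tree being searched, it avoids the ever-growing output file problem described above.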

(The unix find command is very powerful and the above only scratches the surface of the many awesome things you can do with it; for more information on its many useful options, Google something like using find linux.)

Keeping Subscriber Files Updated With Mailman

This may not apply to you at all, but I want to mention that we still use the loosely coupled web subscription pages whose input goes to CGI scripts which use objects to generate admin email commands to the list server; the only real difference after our migration is that the list server has changed from Listserv to Mailman. (See the lib directory, particularly MailmanLists.py—the other files in there are just supporting objects for it.)

You may remember that I said our old Listserv setup had an hourly cron job which sent REVIEW commands to LISTSERV and saved the replies into text files, one per list. That provided subscriber files our MailmanLists.py object could use, allowing the CGI scripts to have (relatively) recent information about which subscribers were on which lists.

We could have used the same setup with Mailman, sending email messages to the admin command handler every hour to get an updated subscriber roster for each list, but since we were allowed to submit cron jobs on our new list host, it seemed better to just run a job hourly on the list host that would directly extract the subscriber rosters, save them to files, and then copy them over to our web host using secure copy (scp).

By exchanging SSH key files ahead of time we were able to allow the scp copy to happen automatically without having to enter passwords for the file transfer. If you don’t know what I’m talking about, Google something like ssh passwordless login (since scp uses ssh underneath).

If you want to do something like this automated saving of list subscriber rosters to files, and then copying them somewhere else, you might find my short review_and_copy utility script useful.
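For a sense of what such a job involves, here is a minimal Python sketch. It is not the review_and_copy script itself. It assumes Mailman 2.1's list_lists and list_members bin commands live in /usr/lib/mailman/bin (as in the earlier examples), that passwordless scp is already set up, and that the remote destination shown is a placeholder you would replace.

```python
import subprocess

MAILMAN_BIN = "/usr/lib/mailman/bin"           # same location as earlier examples
REMOTE = "webuser@www.example.org:rosters/"    # hypothetical scp destination


def roster_commands(listname, outdir="/var/tmp"):
    """Return the (dump, copy) commands for one list's subscriber roster."""
    outfile = "%s/%s" % (outdir, listname)
    dump = [MAILMAN_BIN + "/list_members", "-o", outfile, listname]
    copy = ["scp", "-q", outfile, REMOTE]
    return dump, copy


def review_and_copy():
    """Dump and copy the roster of every list except the site list."""
    output = subprocess.check_output([MAILMAN_BIN + "/list_lists", "-b"])
    for name in output.decode().split():
        if name == "mailman":
            continue
        for command in roster_commands(name):
            subprocess.check_call(command)


# Show the commands for one hypothetical list without running anything.
for command in roster_commands("abc-foo"):
    print(" ".join(command))
```

Calling review_and_copy() from an hourly cron job on the list host would give the web host files named after each list, which is the layout our CGI scripts expected.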

Conversely, if you want to do something like we used to do with Listserv, where you regularly email Mailman’s admin command handler a “who listname” command and then save the results to a file, see the revmmlists script.

Similarly, if you’re paranoid about backing up your Mailman list configurations to dump files on a regular basis (with content equal to running config_list -o dumpfile listname at the command line), you might want to look at my backup_list_configs script. That’s not necessary by any means, but I spent so much time configuring our lists that I want to know I can always recreate those configurations instantly if needed (with config_list -i configfile listname).

And if you do a lot from the command line you might be interested in the sub, unsub, group_sub, and group_unsub scripts in the utils area.

A Note About Passwords

I need to mention that since our website also had a private, login-based “members only” area, we wanted list subscribers who were also members of our organization to have their Mailman list passwords be the same as their website login passwords.

On the other hand, if someone was a subscriber but not a member, we wanted all their passwords set as the same default password.

This didn’t matter so much for subscription management because we had our own custom pages for subscribing/unsubscribing/etc. In fact, we didn’t actually want people going to the Mailman list pages for those things, we wanted them to go to our pages so we could also do things like keeping mailing list and website passwords in sync.

However, subscriber passwords did matter for accessing list archives since all our list archives are password-protected to subscribers only, so we wanted the password to be a no-brainer, just-use-the-same-password-as-your-website-login thing.

Shortly after copying the subscribers from Listserv to Mailman and setting their options, then adding code to keep Listserv and Mailman subscribers/options in sync, I realized that I needed to also keep subscriber passwords in sync, and not just between Listserv and Mailman, but in the case of members, in sync with each member’s corresponding website login password too.

I dealt with this in a two-stage manner, just as I had for the options. The first stage was doing a massive sync to get everything correct as of a snapshot point in time. Then the second stage was to make sure it would be correct going forward.

For the first item, the one-time-fix, I coded a utility script called sync-list-passwords that used objects representing our membership database and website login system to check whether someone was a member and if so, whether they’d set up a login on our site. The script checked these and then set the Mailman list passwords appropriately depending on which case each subscriber fit into.
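The decision the script had to make for each subscriber boils down to a few cases, sketched below in Python. This is an illustration, not the sync-list-passwords script: the function name and the placeholder default are made up, and treating "member without a website login" the same as "non-member" is an assumption.

```python
DEFAULT_PASSWORD = "example-default"   # placeholder, not a real default


def list_password(is_member, website_password, default=DEFAULT_PASSWORD):
    """Pick the Mailman list password for one subscriber.

    website_password is None when the person has no website login.
    """
    if is_member and website_password:
        return website_password    # member with a login: reuse that password
    return default                 # everyone else gets the shared default


print(list_password(True, "their-site-password"))
print(list_password(True, None))
print(list_password(False, None))
```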

For the second item, I simply made sure in the MailmanLists.py code that whenever someone got added to a list it would see if they were a member and had a website login, and then send a password setting command to the admin command handler. If someone were removed from a list there was nothing that needed to be done, and if someone changed their address the clone_member Mailman code took care of copying over their password.

I also had to add code to our website login system so that if someone changed their login password there, it sent a message to the Mailman admin command handler to change their password on all their list subscriptions as well.

I’d be very surprised if your setup matched ours or was even as complicated, but the point is that Mailman does assign passwords to subscribers for each list (randomly I think, if you don’t specify), and you will need to think about this if you intend to have private list archives or intend to allow subscribers direct access to the Mailman web interface pages (since subscribers need to log in to change anything).

Further, if you have a lot of lists you’ll need to think about how you’re going to handle the fact that if you don’t do something to actively keep subscriber passwords the same across all those lists, they could end up with different random passwords on different lists, which could be a big problem.

And it’s a lot better to come up with a password-handling strategy up front, before you start your migration, than in a panic after the migration has already started (ask me how I know this!).


Listserv to Mailman Part 2.2: Creating Mailman Lists Based On Listserv Lists

Introduction

This section describes how to create new Mailman lists equivalent to the existing Listserv lists.

When we did this, there were about two weeks when we made subscriber changes in parallel, to both the old Listserv lists and the new Mailman lists, while we converted the archives and got ready for the final cutover.

We were able to do this because all our subscriber changes (subs, unsubs, and address changes) were done through CGI scripts on our website which generated email commands to LISTSERV—so we just modified the scripts to also send equivalent commands to the Mailman administrative command handler we described in the last section.

In other words, if someone subscribed to ListX on our website, the CGI script sent commands to subscribe them to both the Listserv ListX (soon-to-be-old server) and the Mailman ListX (soon-to-be-new server), even though the latter wasn’t yet getting any posts because ListX@lists.ourhost.org still pointed to the Listserv server.

When we were ready for the actual cutover we just changed lists.ourhost.org to point from the Listserv host to the Mailman host, and Mailman started handling the new posts right away. Then once everything was verified as working properly, we went back and updated the CGI scripts to stop sending subscriber changes to the old Listserv host.

Identifying And Grouping Current Lists

This may be easy for you, but we had 70+ lists to migrate, so not only did we have to identify them, we had to group them in terms of their settings so we’d be able to use a script to create multiple lists with similar settings in a batch instead of having to create them one at a time.

To do this you obviously need to understand all of your old Listserv list settings AND the equivalent new Mailman settings.

So if you have a large number of lists you’ll need to examine all of their settings and see if they need to be categorized into settings groups. For example, after analyzing our 70+ lists, I identified the following settings groups:

  • announcement lists – moderated lists allowing only a few people to post directly (no approval needed), but forcing all other posters’ submissions to go into an approval queue, with replies going to the poster by default, not the list, and with anyone able to subscribe to these lists
  • member lists – discussion lists that any of our organization’s members (not the general public) could subscribe to and then post to, with replies going to the list by default
  • public lists – discussion lists anyone in the general public could subscribe to and then post to, with replies going to the list by default
  • public review lists – discussion lists anyone in the general public could subscribe to, but with messages requiring moderator approval, and replies going to the list by default
  • open post lists – special purpose lists that anyone in the general public could post to, but which required special list owner approval for subscriptions (the subscriber lists for these were very small)
  • restricted lists – discussion lists whose subscribers were limited to special organizational subgroups, requiring each subscriber to be added or removed by a list owner, with replies going to the list by default

To do this for your lists, ask basic questions like the following:

  1. Who can subscribe to the lists? (anyone in the public, only people in your organization, only a few special people, etc.)
  2. Who can post to the list? (you might have an announcement list that anyone can subscribe to, but only a few people can post to)
  3. Do posts require moderator approval? (and if so, is the moderator just the list owner, or do you need to have additional non-owner moderators?)
  4. Where do replies to posts go by default? (to the list or to the poster?)

Examining your current list settings should answer these questions and maybe prompt additional ones you’ll need to answer for all your lists. Then you can determine if all your lists have the same settings or they need to be categorized into settings groups as ours did.
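If you record the answers to those four questions for each list, finding the settings groups is just a matter of grouping lists with identical answers. Here is a small Python sketch of that idea; the field names, answer values, and list names are all invented for illustration.

```python
from collections import defaultdict, namedtuple

# One field per question in the numbered list above.
Settings = namedtuple(
    "Settings", "who_can_subscribe who_can_post moderated replies_go_to"
)


def group_by_settings(lists):
    """Map each distinct Settings value to the list names that share it."""
    groups = defaultdict(list)
    for name, settings in sorted(lists.items()):
        groups[settings].append(name)
    return dict(groups)


lists = {
    "abc-announce": Settings("anyone", "a few people", True, "poster"),
    "abc-news": Settings("anyone", "a few people", True, "poster"),
    "abc-chat": Settings("anyone", "subscribers", False, "list"),
}
for settings, names in group_by_settings(lists).items():
    print(names, settings)
```

Each distinct combination that comes out is a candidate settings group, and therefore a candidate for its own template config file.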

Exploring Mailman List Config Options

To see all the possible Mailman settings for a list, I used the following command to view the settings of the previously-created test list:

/usr/lib/mailman/bin/config_list -o orig.listconfig test-list

(your config_list location might differ, but it will be with all the other “mailman bin” commands)

The goal of this was, for each possible Mailman list setting, to determine if we wanted the same setting to apply to all our lists or if the setting’s value would differ depending on which settings group the list was in.

If you do this and look at orig.listconfig you’ll see that it’s about 80% comments, about 10% blank lines and 10% Python code; list config files like this are (and must be) valid Python code since they are read back in by the config_list script with the -i option to make list changes.

(Note that if you’re just changing one or two things, you don’t need to feed the entire config file back into config_list; it’s smart enough that if you feed it a file with a small number of settings, it will just change those settings on the given list and keep the other settings at their previous values.)

It’s also important to note that these “config files” have no direct ties to Mailman. Once you dump a list’s configuration settings into a text file using config_list -o (o=output) as above, you can do whatever you want with that text file and Mailman won’t care.

The only time the text file matters again is if you feed it back into Mailman with config_list -i (i=input), at which point Mailman will take each setting in that file and use it to update the real list settings it keeps internally. Once you’ve done that, you’re free to delete the file if you want—you can always recreate it if necessary with config_list -o.

I wanted to keep orig.listconfig as a set of Mailman’s default values, so I copied it to announcement.listconfig so that the latter could represent the configuration of our announcement lists (see our settings groups in the bulleted list above).

Then I edited announcement.listconfig, first replacing every reference to “test-list” with “abc-listname” (no quotes; all our mailing lists begin with abc-). This allowed us to later batch-create multiple lists with almost identical configurations by replacing abc-listname.

Second, I added a comment of “ALWAYS” or “DIFFERS” next to every setting in announcement.listconfig, for “always use this setting value for all our lists” or “this will differ from one list settings group to another”. This allowed me to copy announcement.listconfig to other settings group listconfig files, search for each DIFFERS comment, and make changes to just those options.
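Both conventions lend themselves to simple scripting. The Python sketch below shows the general idea under a couple of assumptions: the template uses the abc-listname placeholder described above, and the DIFFERS markers sit on comment lines. The function names and the tiny sample template are made up.

```python
PLACEHOLDER = "abc-listname"


def config_for(template_text, listname):
    """Return the template with the placeholder replaced by a real list name."""
    return template_text.replace(PLACEHOLDER, listname)


def differs_lines(template_text):
    """Return (line_number, line) for every comment marked DIFFERS."""
    return [
        (number, line)
        for number, line in enumerate(template_text.splitlines(), 1)
        if line.startswith("#") and "DIFFERS" in line
    ]


template = (
    "# ALWAYS - list name in subject lines for everyone\n"
    "subject_prefix = '[abc-listname] '\n"
    "# DIFFERS - set to no for almost all lists; yes for moderated lists\n"
    "emergency = False\n"
)
print(config_for(template, "abc-announce"))
print(differs_lines(template))
```

The text returned by config_for would be written to a file and fed to config_list -i for the matching list.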

I also made notes next to each ALWAYS or DIFFERS comment to explain the choice of setting value and make notes about the configuration option itself if necessary. For example:

# Should any existing Reply-To: header found in the original message be
# stripped?  If so, this will be done regardless of whether an explict
# Reply-To: header is added by Mailman or not.
#
# legal values are:
#    0 = "No"
#    1 = "Yes"
# ALWAYS - force replies to list or, if possible, list AND poster
#   either way, need to strip original reply to, so always true
first_strip_reply_to = True

The last two comment lines (the ALWAYS note) are what I added: a note that I wanted this option to be true for all our lists, applying the “reply to list” or “reply to poster” policy equally for all posts and never honoring an original Reply-To header included by a poster.

Here’s an example of a DIFFERS comment explaining the choice and which list settings groups should have which values for this setting:

# When this option is enabled, all list traffic is emergency moderated,
# i.e. held for moderation.  Turn this option on when your list is
# experiencing a flamewar and you want a cooling off period.
#
# legal values are:
#    0 = "No"
#    1 = "Yes"
# DIFFERS - set to no for almost all lists; yes for moderated lists
emergency = False
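Because the ALWAYS and DIFFERS notes are ordinary comments, a small script can list the settings that need attention after a file is copied to a new settings group. Here is a minimal sketch in Python, assuming each note sits in the comment block directly above its setting, as in the two examples above. The function name find_flagged_settings is hypothetical (it is not part of Mailman), and the sketch does not try to handle multi-line string values.

```python
import re

def find_flagged_settings(config_text, flag="DIFFERS"):
    """Return (setting_name, note) pairs for settings whose preceding
    comment block contains a comment line starting with the given flag."""
    results = []
    note = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            body = stripped.lstrip("#").strip()
            if body.startswith(flag):
                note = body          # remember the note until the next setting
            continue
        match = re.match(r"([A-Za-z_]\w*)\s*=", stripped)
        if match:
            if note is not None:
                results.append((match.group(1), note))
            note = None              # a note applies only to the next setting
    return results

sample = (
    "# DIFFERS - set to no for almost all lists; yes for moderated lists\n"
    "emergency = False\n"
    "\n"
    "# ALWAYS - force replies to list\n"
    "first_strip_reply_to = True\n"
)
for name, note in find_flagged_settings(sample):
    print(name, "->", note)
```

Passing flag="ALWAYS" instead lists the settings that should be identical in every settings group.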

Finally, since a file that’s intended for feeding back into config_list -i must be valid Python code, I wanted to make sure I hadn’t made any typos. So I ran python announcement.listconfig to make sure the script didn’t bomb. A successful run just gives you back the command line with no output, since config files just set a lot of variables.
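A similar check can be done without executing the file, by only compiling it. This is a minimal sketch, assuming the config file uses syntax accepted by whichever Python interpreter you run the check with (Mailman 2.1 writes these files for Python 2, but they are mostly plain assignments). The helper name check_listconfig is hypothetical. Note that compiling only catches syntax errors, not things like misspelled names.

```python
def check_listconfig(source, filename="<listconfig>"):
    """Return None if the text compiles as Python, else a short error string."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s line %s: %s" % (filename, err.lineno, err.msg)
    return None

# A valid setting compiles cleanly; a typo is reported with its line number.
print(check_listconfig("emergency = False\n"))
print(check_listconfig("emergency = = False\n", "announcement.listconfig"))
```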

I’d love to explain all of our option choices, but you’ll need to examine them all for yourself anyway and the in-file comment documentation is usually pretty good. One option I would like to explain though is the “new_member_options” setting, an opaque numerical value with a nebulous comment:

# When a new member is subscribed to this list, their initial set of
# options is taken from the this variable's setting.
new_member_options = 256

The new_member_options name implied it was pretty important and determined the default options for all new subscribers, and I needed to know if it should be the same across all our lists or if it needed to differ per settings group.

However, as you can see there was very little comment documentation for this option, and I couldn’t find anything online either. Ultimately I had to delve into the source code, where I found the following in the source tree’s Mailman/Defaults.py.in file:

# Digests             = 0 # handled by other mechanism, doesn't need a flag.
# DisableDelivery     = 1 # Obsolete; use set/getDeliveryStatus()
# DontReceiveOwnPosts = 2 # Non-digesters only
# AcknowledgePosts    = 4
# DisableMime         = 8 # Digesters only
# ConcealSubscription = 16
# SuppressPasswordReminder = 32
# ReceiveNonmatchingTopics = 64
# Moderate = 128
# DontReceiveDuplicates = 256

I believe the way this works is that you decide which options (“flags”) you want to be “on” for new subscribers and then add up all the values for those options; the total is what new_member_options should be.

In other words, since the default (in a new Mailman list) new_member_options value is 256, that means the only option turned on for new subscribers is DontReceiveDuplicates.

I examined the above options and determined that the only setting we wanted was SuppressPasswordReminder (if someone subscribed to 20 of our lists it would be awful to get 20 monthly password reminders!), so I made our new_member_options value 32 instead of the default 256, and made an ALWAYS note that this should apply to all our lists.
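The arithmetic can be sketched in a few lines of Python. This assumes my reading above is right, that the flags combine by simple addition (which, since each value is a distinct power of two, is the same as a bitwise OR). The flag values are copied from the Defaults.py.in comments quoted above; the function names encode_options and decode_options are hypothetical.

```python
# Flag values as listed in the Mailman/Defaults.py.in comments above.
FLAGS = {
    "DisableDelivery": 1,
    "DontReceiveOwnPosts": 2,
    "AcknowledgePosts": 4,
    "DisableMime": 8,
    "ConcealSubscription": 16,
    "SuppressPasswordReminder": 32,
    "ReceiveNonmatchingTopics": 64,
    "Moderate": 128,
    "DontReceiveDuplicates": 256,
}

def encode_options(names):
    """Combine the named flags into one new_member_options value."""
    total = 0
    for name in names:
        total |= FLAGS[name]
    return total

def decode_options(value):
    """List (alphabetically) the flags that are switched on in a value."""
    return sorted(name for name, bit in FLAGS.items() if value & bit)

print(decode_options(256))                           # ['DontReceiveDuplicates']
print(encode_options(["SuppressPasswordReminder"]))  # 32
```

If you wanted both SuppressPasswordReminder and DontReceiveDuplicates, the value would be 32 + 256 = 288.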

(Side note: I got the sense that new_member_options is older/obsolete since some of its flags seemed covered by other configuration options and this style of determining default subscriber options is a lot more difficult for most people to use.)

Anyway, after replacing test-list with abc-listname in the announcement config file, and then adding ALWAYS or DIFFERS comments for each option, I copied announcement.listconfig to members.listconfig, then went through all the DIFFERS comments and made sure that the settings values were correct for “members” type lists.

When done with members.listconfig I copied it to openpost.listconfig, edited all the DIFFERS settings, and then repeated the copy/edit process for the remaining list settings groups.

I ended up with six .listconfig files, matching our list settings groups in the bulleted list above: announcement.listconfig, members.listconfig, openpost.listconfig, public.listconfig, pubreview.listconfig, and restricted.listconfig. These files were 1) identical to each other except where lists of that settings group should differ, and 2) had a generic list name of “abc-listname” instead of the real list name.

Listserv And Mailman List Configuration Option Equivalents

In order to make good choices about the Mailman configuration settings for our list settings groups, I needed to understand which old Listserv configuration settings translated to which new Mailman configuration settings.

I wasn’t able to find much online comparing Listserv and Mailman list settings; in fact the only page I found was the MIT Mailman User Guide: ListServ Keywords in Current Use and its accompanying “cheat sheet” PDF.

I can only discuss the Listserv options we used and give you their Mailman equivalents, but hopefully between the options here, the MIT page, the Listserv list header documentation, and the Mailman config file comments, you can figure everything out for your site.

Listserv Header Keyword | Mailman Config File Option
Review= (who can get subscriber lists) | private_roster (found in Privacy options… → Membership exposure in the Mailman web admin interface)
Subscription= (who can subscribe to the list) | subscribe_policy (Privacy options… → Subscribing)
Send= (who can post to the list) | emergency (causes all posts to be moderated; General Options → Additional settings), accept_these_nonmembers (set to ‘^.*@*.^‘ to allow anyone to post; Privacy options… → Sender filters → Non-member filters), include_list_post_header (General Options → Additional settings)
Notify= (if list owner is notified of new subscriptions, deletions, etc.) | admin_immed_notify, admin_notify_mchanges (both are in General Options → Notifications)
Reply-to= (if replies go to list/sender, and if reply-to header is honored) | reply_goes_to_list, first_strip_reply_to, reply_to_address (these three are in General Options → Reply-To header munging), anonymous_list (General Options → General list personality)
Default-Options= (initial settings for new subscribers) | new_member_options (General Options → Additional settings), digest_is_default (Digest options), default_member_moderation (Privacy options… → Sender filters → Member filters)
Files= (obsolete option, see Listserv doc) | no equivalent
Validate= (how to validate commands as authentic; with a password, confirmation and reply, etc.) | no equivalent, all Mailman web changes require password login
Filter= (whether to auto-identify mailing loops and suspicious/spammy From addresses) | header_filter_rules (loosely; Privacy options… → Header filters), mostly handled automatically
Confidential= (whether list’s existence is public knowledge) | advertised (Privacy options… → Subscription rules → Subscribing)
X-Tags= (whether X-To and X-Cc headers in posts are passed to subscribers) | no equivalent, presumably they are passed on but I have not confirmed
Stats= (obsolete option, see Listserv doc) | no equivalent
Ack= (whether to respond to postings to let sender know post got through) | autorespond_postings, autoresponse_postings_text, other autoresponse_ options (all in Auto-responder section of web admin)
Notebook= (whether to keep archives and how often to rotate) | archive, archive_private, archive_volume_frequency (all in Archiving Options section of web admin)
Auto-Delete= (whether and how aggressively to auto-remove bad subscriber addresses) | bounce_processing and other bounce_ options (all in Bounce processing section of web admin)
SizeLim= (in lines unless otherwise specified) | max_message_size (Kb; in General Options → Additional settings)
Daily-threshold= (max msgs per day before list is automatically “held”) | no equivalent, though you can manually set “emergency” if there is a mail loop or flame war (in General Options → Additional settings)
List-Address= (domain for the list, so it is listname@list-address) | host_name (in General Options → Additional settings)
Attachments= (whether to allow attachment, reject with an error, or silently strip them) | all filter_ options, pass_mime_types, pass_filename_extensions, collapse_alternatives (these four options are in Content filtering), scrub_nondigest (in Non-digest options)
Digest= (whether to support digests for this list and if so, how many messages/lines in each) | nondigestable (in Non-digest options), digestable, mime_is_default_digest, digest_is_default and other digest_ options (all in Digest options)
Language= (whether to strip/convert HTML to text) | convert_html_to_plaintext (in Content filtering section of web admin)
Editor= (who can review/approve other people’s posts, and bypass similar moderation) | moderator (General Options → General list personality)
Owner= (the list owner or owners) | owner (General Options → General list personality)
Errors-To= (address that errors should go to) | no equivalent, errors go to list owners

Creating Mailman Lists In Bulk

In the “Identifying And Grouping Current Lists” section we created several different settingsgroup.listconfig files, where settingsgroup was a name for a group of lists with all the same configuration settings except for the listname itself. We’d put abc-listname as the list name in the group settings file so we could search and replace it in the batch creation process, which is described in this section.

While we came up with .listconfig files for each settings group, we still had to have a way to say which settings group each list belongs to. So I created settingsgroup.lists files such as announcement.lists, member.lists, public.lists, pubreview.lists, etc. Each file simply had the lists belonging to that settings group, one list per line.

For example, we only had one announcement type list to create so our announcement.lists file had just one line:

ABC-Announce

If we’d had more announcement type lists, they would have each been on a separate line in the announcement.lists file.

So the idea was that announcement.lists had a roster of lists (one per line) to create according to the announcement.listconfig file settings, with abc-listname in that settings file to be replaced with each list name from announcement.lists. And the same thing with public.lists, pubreview.lists, etc., with each .lists file containing the lists to be created with the corresponding .listconfig file settings.

There were probably many ways to do the batch creation, taking each line in the whatever.lists file and replacing abc-listname in the whatever.listconfig, then using the “newlist” Mailman bin command to create the new list with that name.

I chose to just use a series of single-line commands, one for each settings group. The command I used was the following, pasted into the shell and run as a single line:

perl -ne 'chomp; $mclist = $_; $lclist = lc($_); print `echo cp -f announcement.listconfig /tmp/$lclist.listconfig`; print `echo perl -pi -e s#ABC-Listname#$mclist#g /tmp/$lclist.listconfig`; print `echo perl -pi -e s#abc-listname#$lclist#g /tmp/$lclist.listconfig`; print `echo newlist -q $lclist abc-listmaster\@abc.org listpassword >> aliases-to-make.txt`; print `echo config_list -i /tmp/$lclist.listconfig $lclist`;' announcement.lists

(This assumes that you’re already in the Mailman bin directory, such as /usr/lib/mailman/bin, or that you’ve put that directory in your $PATH. If not, you will want to write out the full paths to newlist and config_list above, and to change_pw if you use it as described below.)

Note: The command as written above won’t actually do anything; it will just print what it would do if you removed all the “echo” commands. Whenever I do anything complex like this, repeating commands over a lot of items, I first use echo statements to print out what’s going to happen and then look it over to check for errors before running the real commands.

The overall idea is that Perl is used to iterate over each line of the announcement.lists file, where each line is the mixed-case name of a list we want to create. The names are mixed case because in some settings (like the list description) Mailman lets you use mixed case, while in others it just takes lower case.

The command above sets $mclist for the mixed-case list name, and $lclist for lowercase, then it copies announcement.listconfig to a temp location, replaces ABC-Listname in the temp file with the mixed case value using the $mclist variable, and does the same with abc-listname and $lclist.

Then it uses the Mailman newlist bin command to create the list using abc-listmaster@abc.org as the initial list owner and “listpassword” as the initial list password, though both can be overridden in the group settings file which is applied next.

Important Note: Since we had 70 different email lists, we did want them all to have the same password. Your situation may be different and if you need each list password to be different you might use the above commands to set them all the same and then use the Mailman change_pw command afterward to customize the password on a per-list basis.

The version of the newlist command above also appends the output of the command to a file called aliases-to-make.txt in case you need to create mail aliases for your new lists. We didn’t need to do that though because our new list host had set up postfix-to-mailman.py so that postfix automatically recognized the Mailman aliases for new lists. (There’s a qmail-to-mailman.py file too if you’re running qmail, very handy.)

(Note: The “>> aliases-to-make.txt” redirection in the “echo newlist” portion sends that echoed command into aliases-to-make.txt rather than printing it on screen. To test with echo you’ll have to remove “>> aliases-to-make.txt” first; just be sure to add it back in before running the commands for real!)

After the new list is created, config_list is used with the temp config file to configure it.
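For anyone who would rather not squeeze this into a Perl one-liner, the same dry-run idea can be sketched in Python. This is only a sketch under assumptions, not the command I actually ran: the function name plan_list_creation is hypothetical, and it only builds the per-list config text and the newlist and config_list command lines (using the same arguments as above) without running anything.

```python
def plan_list_creation(list_names, template_text,
                       owner="abc-listmaster@abc.org",
                       password="listpassword"):
    """For each list name, return (config_path, config_text, commands).

    Nothing is written or executed here; the caller can inspect the
    plan first, which plays the same role as the echo statements.
    """
    plan = []
    for raw in list_names:
        mixed = raw.strip()
        if not mixed:
            continue                      # skip blank lines in the .lists file
        lower = mixed.lower()
        config_text = (template_text
                       .replace("ABC-Listname", mixed)
                       .replace("abc-listname", lower))
        config_path = "/tmp/%s.listconfig" % lower
        commands = [
            ["newlist", "-q", lower, owner, password],
            ["config_list", "-i", config_path, lower],
        ]
        plan.append((config_path, config_text, commands))
    return plan

template = "real_name = 'ABC-Listname'\nsubject_prefix = '[abc-listname] '\n"
for path, text, commands in plan_list_creation(["ABC-Announce\n"], template):
    print(path)
    for command in commands:
        print(" ".join(command))
```

Once the printed plan looks right, each config_text could be written to its config_path and each command passed to subprocess, but I would still try that on a single test list first.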

If you’re comfortable with all of this, try it out and examine the output from the echo commands.

You can even try the commands for real, one at a time in the shell with one list, to make sure they’ll work as you expect them to (e.g., that there are no path or permission errors). In that case you can check the contents of aliases-to-make.txt to make sure the output is what you expected, and even run config_list -o - newlistname to double-check the configuration.

When you’re comfortable with what will happen from the looping commands above, remove the echo statements and run the commands for real with a small number of lists. If that works, you can then run the commands with each successive settings group (which for us meant changing “announcement” to “member”, then “public”, “pubreview”, etc.), ultimately creating all new Mailman lists in batches according to their old Listserv settings groups.


Listserv to Mailman Part 2.1: Installing an Administrative Command Handler

Introduction

This section describes how to install an “administrative command handler”, something you can email list management commands to, just like emailing commands to LISTSERV. If you never used the Listserv email interface then this entire section may be unnecessary for you.

Background

One of the first things I did when investigating Mailman was try to find out how to send mailing list management commands by email.

That was important because in administering our 70 email lists over the years, we’d developed a set of helper scripts and modules to make list management actions like subscribing/unsubscribing easier.

If someone wanted to subscribe to our lists they’d put their name and email into a form, check off the lists they wanted, and submit. Then behind the scenes our scripts would send an email to LISTSERV with the appropriate commands to make it happen.

This was all aimed at 1) allowing subscribers to manage their own subscriptions on the web, and 2) allowing list admins to easily manage subscribers over a lot of lists.

Unfortunately, it soon became apparent that Mailman has no administrative email interface. Instead, Mailman provides a simple but clean web-based interface for list admins to manage users on each mailing list.

Going to each list’s web interface to manage a given subscriber is fine for one or a few mailing lists, but it’s completely unmanageable at 10, 20, or 70 lists—you need a simple one-stop place to set options, change addresses, etc. for a subscriber on all lists at once.

This meant I either had to abandon Mailman as the mailing list software in favor of something which did have an administrative email interface, or stick with Mailman and program it myself. Since Mailman seemed like the only viable replacement for Listserv, I decided to scratch my own itch.

Fortunately, I already knew Python, which most of Mailman is written in, so programming an administrative email interface which could call built-in Mailman code to do its work was actually possible.

Issues Setting Up Admin Command Handler

I’d noticed when I created the “test-list” on my development box with the newlist command that it told me I still needed to create some email aliases for various list-related addresses to pipe messages into mailman for handling:

To finish creating your mailing list, you must edit your
/etc/aliases (or equivalent) file by adding the following
lines, and possibly running the `newaliases' program:
## test-list mailing list
test-list:              "|/path/to/mailman post test-list"
test-list-admin:        "|/path/to/mailman admin test-list"
test-list-bounces:      "|/path/to/mailman bounces test-list"
...

This got me to thinking that I’d need an email address to receive the Mailman admin commands. My first thought, based on the above, was something like test-list-admin-cmd, but I didn’t want to have to make a new admin alias for every new list.

I really wanted something like our old listserv@lists.ourhost.com, which all our web and shell scripts were sending admin command messages to.

So I created a command handler called mailman_admin_cmd_handler.py and in the control panel for the development box set it so that mail sent to mailman-admin-cmd@lists.ourhost.com would get piped into mailman_admin_cmd_handler.py.

(Our host’s control panel was cPanel, and I used the Forwarders option in the Mail section to “Add a Forwarder”, then “Address to Forward” of mailman-admin-cmd, then “Advanced Options”, then “Pipe to a Program”, then /path/to/mailman_admin_cmd_handler.py)

If you use /etc/aliases for Sendmail or Exim, as the newlist output suggests, then you’d probably do something like:

## admin interface for all mailing lists
mailman-admin-cmd:   "|/path/to/mailman_admin_cmd_handler.py"

Actually that’s a simplification of what I really did. I really used procmail because that’s what handled all our other mail forwarding needs. So in “Pipe to a Program” I piped it to |/usr/bin/procmail -m /home/ouruser/.procmail/rc.mailman-admin-cmd and then in that procmail recipe file I put:

:0 H
* ^TOmailman-admin-cmd@lists.ourhost.com
| $HOME/.procmail/mailman_admin_cmd_handler.py

(Actually I had a few other things in that file, but a procmail tutorial is beyond the scope of this guide. Type “man procmailrc” or “man procmailex” at a shell for more info.)

This was all on our dev box, and part of why I stuck mailman_admin_cmd_handler.py in ~/.procmail was that my goal was not to make any modifications to the base Mailman installation in case our production list host didn’t let us muck with those files.

This placement had a few minor consequences. One is that I had to create symbolic links for two mailman “bin” programs, list_members and clone_member, into the same directory as the handler script, adding .py to the symbolic links so the handler script could import them. For example:

cd /home/ouruser/.procmail
ln -s /path/to/mailman/bin/clone_member ./clone_member.py
ln -s /path/to/mailman/bin/list_members ./list_members.py

I also had to install a helper module (ListStream.py) and configure mailman_admin_cmd_handler.py with custom settings like the Mailman installation directory, the SMTP server, an optional string/password for the subject line of admin command messages (for added security), etc.

Even if you can put mailman_admin_cmd_handler.py in the mailman bin directory itself, you’ll still need to symlink clone_member to clone_member.py (and the same for list_members) within that directory so mailman_admin_cmd_handler.py can import and use those scripts. You’ll also still need to save ListStream.py there, and so on.

(But I’d still recommend keeping these custom scripts outside the installed Mailman tree, even if you have permissions to modify those files, simply because if you ever upgrade your Mailman installation, you don’t want to inadvertently delete/move the custom files and break a lot of stuff. Plus it’s just nice to have standard-install Mailman stuff in one area, and custom add-on stuff in another.)

I should also mention that when we finally chose a list host, they didn’t use alias forwarding/piping or procmail at all. Instead they used a specialized postfix-to-mailman.py script so that no aliases needed to be made when new lists were created.

(If you’re using Postfix with Mailman see the GNU Mailman Installation Manual, particularly the Integrating Postfix and Mailman section; see also a related discussion on FreeBSDDiary.)

Downloading the Files

The basic files for setting up the administrative email interface are the ones named in this section: mailman_admin_cmd_handler.py, ListStream.py, and test_mailman_admin_cmd_handler.py.

test_mailman_admin_cmd_handler.py is completely optional, but I’d recommend downloading and running it just to check that you made all the necessary customization changes in mailman_admin_cmd_handler.py. Just be sure to look through it thoroughly before running it, because you’ll need to choose your own test values.

IMPORTANT NOTE: I’ve tried to make sure that any code I’m providing has the word INSTALL in comments to mark any places you’ll need to modify the code for your own installation (i.e., places where your installation will probably differ from mine). So you need to go through every file you download and search/grep for INSTALL and update the corresponding settings.

Installing

The following are the steps I used to install the administrative email command handler on our production host. In some cases your environment may differ, such as the difference between using Sendmail/procmail filtering (on our dev box) and using Postfix (on our production box) as described above.

First, I created a directory to hold all custom Mailman related files, ~/mailman.

I also noted the locations of the system (installed) Mailman files, which seemed to be /etc/mailman, /var/lib/mailman, and /usr/lib/mailman. In particular, I needed to know the path to the directory tree holding the Mailman bin scripts and helper objects, which the admin command handler uses. (For us, those were /usr/lib/mailman/bin/ and /usr/lib/mailman/Mailman/ respectively.)

After downloading the files in the “Downloading the Files” section above into the ~/mailman directory I needed to modify the /etc/mailman/postfix-to-mailman.py file in order to allow commands sent to mailman-admin-cmd@lists.ourhost.org to get sent to the handler script:

                  ("/usr/sbin/sendmail", MailmanOwner))
         sys.exit(0)

+    # Local customization
+    elif local == 'mailman-admin-cmd':
+        mm_admin_pgm = '/home/adf/mailman/mailman_admin_cmd_handler.py'
+        os.execv(mm_admin_pgm, [mm_admin_pgm])  # argv[0] is conventionally the program path
+
     # Assume normal posting to a mailing list
     mlist, func = local, 'post'

(If you’re not familiar with diff output, that means I added the four lines with the pluses to the area between the sys.exit(0) line and the # Assume lines.)

At this point I really wanted to send a command to see what would happen, but I thought it would be better to run the handler’s unit tests first to see if I’d missed something basic.

To do this, I needed to search the downloaded files for server-specific things to edit that had been flagged with INSTALL comments (such as the location of our Mailman files in /usr/lib/mailman).

I also had to, as described above, create the list_members.py and clone_member.py symlink files in ~/mailman pointing to list_members and clone_member in /usr/lib/mailman/bin.

The unit tests assume a test list called test-list, so I had to use newlist test-list in /usr/lib/mailman/bin/ to make it (you can delete it later with rmlist if you wish).

The unit tests in test_mailman_admin_cmd_handler.py also assume there is a test user already on the test list, so I did this with:

echo "test@ourhost.org" | /usr/lib/mailman/bin/add_members -r - test-list

Once I’d done that, and made sure all the other settings in test_mailman_admin_cmd_handler.py were correct (especially in the setUp function), I ran test_mailman_admin_cmd_handler.py at the command line and verified that all the unit tests passed.

(If you do this and the tests don’t pass, you’ll need to look at the tests that failed in test_mailman_admin_cmd_handler.py and figure out what went wrong and why.)

Testing

Now, having verified that the code was set up correctly via the unit tests in test_mailman_admin_cmd_handler.py, I wanted to send a test command by email.

I should mention that I hit a snag at this point that you might not.

I had decided, as part of the overall migration strategy, to have our new Mailman list host be lists2.ourhost.org while continuing to run our lists on the soon-to-be-old Listserv lists.ourhost.org.

When Mailman was set up properly, including creating all lists and copying all users and their options, so that the two hosts were in sync, the plan for the cutover was to just change where lists.ourhost.org pointed to. This worked out pretty smoothly in the end, by the way, but I wanted to explain the lists2.ourhost.org thing now.

So I sent a command to mailman-admin-cmd@lists2.ourhost.org, putting help in the subject line and body.

That should have generated an error because the default setup requires any emailed admin commands to have a password in the subject line for security, and of course most list-related commands require their own passwords too (the list password, the site admin password, or the list owner’s password if the list owner is on the list and the email is sent from that account).

Unfortunately that simple “help” test produced the following error:

550-Verification failed for <mailman-admin-cmd@lists2.ourhost.org>
550-The mail server could not deliver mail to
mailman-admin-cmd@lists2.ourhost.org. The account or domain may not
exist, they may be blacklisted, or missing the proper dns entries.
550 Sender verify failed (in reply to RCPT TO command))

Also, the error only happened when I sent from myaddress@ourhost.org; when I sent the help test message from anotheracct@gmail.com (or any domain which was not ourhost.org) it worked fine and produced the expected “Required password not present in subject line.”

The admin for our new list provider said, “Because this test domain, lists2.ourhost.org, does not have Sender Verify set up on it, it’s failing the check and your ourhost.org mailserver is rejecting it.” I researched this and found that what he called “sender verify” refers to the Sender Policy Framework, aka SPF record, an anti-spam related DNS record.

Eventually our old host (which was still running the DNS for ourhost.org) solved the problem and when I asked them how, they wrote:

I’m not sure I remember all the steps along the way. I originally
created an A record for lists2.ourhost.org, and an MX record that
referred to the same IP address. When all was said and done I had
an A record, an MX record that referred to lists2.ourhost.org (instead
of to its IP address) and a TXT field with the following contents:

v=spf1 a mx a:208.85.173.148 -all

Here are the tinydns source records that I used:
+lists2.ourhost.org:208.85.173.148:3600
@lists2.ourhost.org::lists2.ourhost.org:20
'lists2.ourhost.org:v=spf1 a mx a\072208.85.173.148 -all:3600

(I’ve altered the IP addresses for the privacy of the org I did this for, btw.)
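One detail in the tinydns records above that is easy to trip over: the TXT line uses colons as field separators, so the colon inside the SPF text has to be written as the octal escape \072. A minimal Python sketch of building such a line follows. The function name tinydns_txt_line is hypothetical, and the sketch only escapes backslashes and colons, which is all this record needs.

```python
def tinydns_txt_line(fqdn, text, ttl=3600):
    """Build a tinydns-data TXT line ('fqdn:text:ttl).

    Colons separate fields in tinydns-data, so a colon inside the text
    is written as the octal escape \\072 (and a backslash as \\134).
    """
    escaped = text.replace("\\", "\\134").replace(":", "\\072")
    return "'%s:%s:%d" % (fqdn, escaped, ttl)

print(tinydns_txt_line("lists2.ourhost.org",
                       "v=spf1 a mx a:208.85.173.148 -all"))
```

That prints the same TXT line as the last of the three records quoted above.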

I wish I could give more information about this Sender Verify problem and what really solved it, but at least if you encounter the same thing this info might help.

Anyway, having solved the Sender Verify problem, I re-sent the test message to mailman-admin-cmd@lists2.ourhost.org with help in the subject and body and got the expected “Required password not present in subject line” response.

Then I sent another test message, this time with help PASSWORD in the subject (no quotes, where PASSWORD was the value of SUBJECT_PASSWD in mailman_admin_cmd_handler.py) and help in the body, and got the expected response, “Valid commands:” followed by a list of valid commands for the admin command handler.

I didn’t do an exhaustive test of every command, since the unit tests in test_mailman_admin_cmd_handler.py sort of do that, but I did try one more command besides “help” just to make sure the basic mechanism was working.

I tried sending the “who” command to see who was subscribed to the test list I’d set up. Specifically, I sent a message with a subject of who test-list PASSWORD (where PASSWORD was the SUBJECT_PASSWD value in mailman_admin_cmd_handler.py) and a body of who test-list LISTPASSWORD (where LISTPASSWORD was the password I set for the list when creating it).

(Note: You can actually use anything in the subject line as long as it has the admin command handler password, but I like to keep it as descriptive as possible, and for a single command that usually means using the command itself minus the list password.)
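To make the message format concrete, here is a minimal Python sketch of the two checks described above: the handler password must appear in the subject line, and each body line is a command. This is not the code in mailman_admin_cmd_handler.py, just an illustration under assumptions. The function name parse_admin_message is hypothetical, the password is matched as a whole word (the real handler may match differently), and only the first plain-text part of the message is read.

```python
import email

SUBJECT_PASSWD = "PASSWORD"   # placeholder; use your own value

def parse_admin_message(raw_message, subject_passwd=SUBJECT_PASSWD):
    """Return (error, commands) for a raw RFC 822 message string.

    error is None when the subject carries the handler password;
    commands is a list of word lists, one per non-blank body line.
    """
    msg = email.message_from_string(raw_message)
    subject = msg.get("Subject", "")
    if subject_passwd not in subject.split():
        return ("Required password not present in subject line.", [])
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload()
            break
    commands = [line.split() for line in body.splitlines() if line.strip()]
    return (None, commands)

raw = ("From: owner@example.org\n"
       "Subject: who test-list PASSWORD\n"
       "\n"
       "who test-list LISTPASSWORD\n")
print(parse_admin_message(raw))
```

A real handler would then check the list password and ownership for each command before calling into Mailman.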

After a few issues involving list ownership (I’d mistakenly sent the command from an account that wasn’t one of the list owners for the test list), I finally got a roster of subscribers for test-list so I felt a little more confident that the basic system was working correctly.


Listserv to Mailman Part 1.2: Installing Swish for Archive Searching

Background

Mailman, unlike Listserv, doesn’t come with built-in list archive searching. (Which is funny since I’d always thought Listserv’s archive search was clunky and dated—but at least it had archive search! :) )

I think Mailman may not have archive searching because the developers wanted to keep their focus on the mailing list software itself rather than trying to write and maintain search software too. They probably figured it would be better to leave search to search software developers, and also give people the freedom to install whatever search package they want.

Nonetheless, it would have been a great relief if Mailman had just come with a default search package for list archives that could be uninstalled/overridden if necessary, rather than making everyone who wants to provide list archive searching (which must be a pretty common requirement) re-invent the search wheel by hunting down a package and figuring out how to integrate it into Mailman.

The Swish-E Package

After a good deal of searching (ha) I eventually found that many others who also wanted searchable list archives seemed to lean toward using Swish-E (swish-e.org, Wikipedia).

(Note: Since writing this guide, I’ve found that others have used htdig successfully for archive searching too; see the _README file at msapiro.net/mm/ and this documentation.)

Fortunately, I already had experience with Swish because our organization already used Swish for the search feature on its main website.

I wish I could provide step-by-step instructions for setting up Swish, but I forget the steps I used to install it from source on our development box (web host) back in 2003, and the server admins on our production list host installed it for me, probably using a binary package manager like RPM or Apt.

Even though I don’t have step-by-step instructions for installing Swish (though if you are root, downloading/unpacking the source, running “configure” and “make install” will probably do the trick), I wanted to include a section about it for a few reasons.

Aside from pointing you to the main swish-e.org site, in particular the Download and Documentation sections (the latter including a nice INSTALL page), I also wanted to point you to the extremely helpful Integrating Mailman with a Swish-e Search Engine page.

I ended up only following some of that page’s advice, but it was invaluable for getting started on the Listserv to Mailman archive conversion as far as what issues to consider. So that page is definitely worth checking out just to see what’s involved. (Though note that I ended up writing code to automate some of the things on that page, which I’ll cover later.)

First though, just focus on installing Swish-e itself, which indexes and searches 1) the Mailman HTML archive pages (one page per message) and, optionally, 2) any message attachment files such as PDFs or Word docs by using extra Swish add-on/filter programs.

Using Swish-E for Searching Message Attachments

While the basic setup of Swish for HTML archive messages is fairly easy, setting it up to search non-HTML/text files such as Word, Excel, and PDF attachments is a little trickier. I struggled with this when I set up Swish to search binary files on our website, so I wanted to give some info on it in case you want to support searching message attachments too.

(Mailman does allow attachments to be included in list archives; it just separates each from its associated message and puts a link to the attachment file at the bottom of the archived message.)

Note though: This entire section (the rest of this page) is optional. To be blunt, if you don’t have to support searching archive message attachments, make your life easier and don’t do it; you can always go back and add it later if you need to. We didn’t even do it for our list archives; I’m just including this info about using Swish to index binary files based on my experience doing that on our main website.

For a general article about indexing HTML pages and other file types with Swish-E, see How to Index Anything from the Linux Journal in 2003. At first glance that page is very good, but I haven’t examined it in detail and it’s possible that some details have changed since 2003.

Basically, for non-text/HTML files Swish relies on external helper programs to extract text from each file. For example, it uses the pdftotext program in the xpdf package for extracting text from PDF files, the catdoc program to get text from Word .doc files, etc. See the “Optional But Recommended Packages” section of the Swish-E INSTALL doc for more info on what’s available.
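Since the indexer depends on these helper programs, it can save time to confirm they are on the PATH before running it. The following is a minimal sketch, not part of Swish itself; the function name check_helpers is made up for illustration, and pdftotext and catdoc are simply the two helpers named above.

```sh
# Report whether each named helper program can be found on the PATH.
# check_helpers is a hypothetical name, not something Swish provides.
check_helpers() {
    for helper in "$@"; do
        if command -v "$helper" >/dev/null 2>&1; then
            echo "found: $helper"
        else
            echo "missing: $helper"
        fi
    done
}

check_helpers pdftotext catdoc
```

If a helper lives outside the PATH, spelling out its full path in the Swish config avoids the lookup entirely.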

(I haven’t yet tackled the issue of extracting text from Word’s newer .docx file format for Swish; if I do, I’ll update this page, but if you’ve already done so please let me know; some possibilities are 1) docx2txt, 2) unoconv, though that might be unusable by Swish because it requires a running OpenOffice instance, or 3) maybe even a quick-and-dirty unzip/sed/grep combination.)
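As a rough illustration of the third, quick-and-dirty option: a .docx file is a zip archive whose main text lives in word/document.xml, so unzip can pipe that part to a tag-stripping filter. This is an untested-with-Swish sketch under several assumptions: the function names are hypothetical, awk stands in for sed so that newlines are handled portably, and XML entities such as &amp;amp; are left undecoded.

```sh
# Turn the XML of word/document.xml (read on stdin) into plain text:
# each closing paragraph tag becomes a newline, then all tags are dropped.
# Both function names here are hypothetical.
docx_xml_to_text() {
    awk '{ gsub(/<\/w:p>/, "\n"); gsub(/<[^>]*>/, ""); print }'
}

# Pull the main document part out of a .docx and convert it to text.
docx_to_text() {
    unzip -p "$1" word/document.xml | docx_xml_to_text
}

# Example call with a hypothetical file name:
# docx_to_text report.docx
```

Headers, footers, and footnotes are stored in other parts of the archive and would be skipped by this approach.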

For more information about supporting searching of PDF, Word, etc. files, see “How do I index my PDF, Word, and compressed documents?” and the sections after it on the Swish-E FAQ page as well as the example filters in the “Document Filter Directives” section of the SWISH-CONFIG man page.

Note: I had to tweak the example given on that last page for PDF and DOC files, which were the only two binary file types I included in our website search. Specifically, the SWISH-CONFIG page gave the example of:

FileFilter .pdf       pdftotext   "%p -"

and that produced an error during indexing; every time the Swish indexer encountered a PDF file and tried to run pdftotext, it printed pdftotext’s usage info:

pdftotext version 3.02
Copyright 1996-2007 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  ... etc.

To fix this, I had to tweak it to:

FileFilter .pdf pdftotext "'%p' -"

Actually I chose to spell out the full /path/to/pdftotext instead of just pdftotext there, but you get the idea—the main difference is in the quoting at the end, to put %p within its own single quotes.

I had to do the same thing with the catdoc example; the SWISH-CONFIG page suggested:

FileFilter .doc     /usr/local/bin/catdoc "-s8859-1 -d8859-1 %p"

… but that failed too so I enclosed the %p in single quotes there as well:

FileFilter .doc /path/to/our/catdoc "-s8859-1 -d8859-1 '%p'"
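One plausible explanation for why the extra quotes helped (an assumption; the cause was not diagnosed at the time) is that the filter command goes through a shell, so a file name containing a space gets split into several arguments, and pdftotext prints its usage text when it receives too many. The sketch below only simulates that splitting with a made-up file name; it does not run Swish or pdftotext.

```sh
# Print how many arguments a command line would hand to its program.
count_args() { echo "$#"; }

# Hypothetical file name containing a space.
path='My Report.pdf'

# Simulate the shell parsing each filter command after %p is substituted.
unquoted=$(eval "count_args $path -")    # like: pdftotext %p -
quoted=$(eval "count_args '$path' -")    # like: pdftotext '%p' -

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
```

With the quotes the program sees two arguments (the file and the "-"), while without them it sees three. Note that single quotes would still break on a file name that itself contains a single quote.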

Character Set Problems While Indexing

Another issue was that at some point I started getting character set error messages when running the Swish indexer. I wish I’d documented the problem and my solution when it happened, but I’ve done my best to reconstruct the issue in case it’s useful to someone.

I believe the error may have been something about catdoc not being able to find the ascii.replchars and/or ascii.specchars character set files. I think I hunted around to find these files, but since catdoc seems to be a fairly old (and seemingly unmaintained) package, my search was fruitless.

Ultimately I think my solution was that I noticed I did have ascii.rpl and ascii.spc files, so I copied those to ascii.replchars and ascii.specchars and that fixed the problem.
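In shell terms, that workaround amounts to something like the sketch below. The function name is hypothetical, the directory is whatever location your catdoc install uses for its charset files, and the copy is skipped if the long-named file already exists.

```sh
# Copy catdoc's short-named charset files to the long names it asked for.
# $1 is the directory holding the charset files (varies by install).
# alias_catdoc_charsets is a hypothetical name.
alias_catdoc_charsets() {
    dir=$1
    for pair in "ascii.rpl:ascii.replchars" "ascii.spc:ascii.specchars"; do
        src=${pair%%:*}
        dst=${pair##*:}
        if [ -f "$dir/$src" ] && [ ! -e "$dir/$dst" ]; then
            cp "$dir/$src" "$dir/$dst"
            echo "copied $src -> $dst"
        fi
    done
}

# Example call; the path is a placeholder, not a real default:
# alias_catdoc_charsets /path/to/catdoc/charsets
```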

I’m also still getting the following error (from pdftotext I believe) when running the Swish indexer on our web site:

Error: Unknown character collection 'Adobe-Korea1'

I searched online and couldn’t find anything for that specific character set, but when I searched for “swish unknown character collection” I found this post which recommended upgrading the xpdf package as a possible solution. I haven’t tried it yet because it’s only a few errors and I hate to upgrade things unless I absolutely have to, but I wanted to mention it here in case someone else gets similar errors using Swish to index PDFs.


Listserv to Mailman Part 1.1: Installing Mailman

Installing from Source

This page describes how I installed Mailman from source on our development server (a shared web hosting environment). However, it’s also possible to install through a binary package manager such as RPM or Apt, which is actually how our list host admins installed it onto our production server. So things might end up in different locations depending on how Mailman is installed.

Choosing a Version and Downloading

My first question for installing Mailman on our dev box was which version. This mattered because we had not yet chosen a list host, but I assumed that whatever host we ultimately chose would probably have the latest stable and mature version.

Figuring out the latest stable and mature version of Mailman turned out to be a bit challenging. By examining the “timeline” graph at https://launchpad.net/mailman/+series (specifically, by hovering over the graph and hitting the little magnify-minus sign several times so I could eventually see all the way to the right), I found that at the time of checking (early Dec 2009), version 2.1.12 was the latest stable version since it was the most recent on the 2.1 “latest stable release” line.

I tried to download mailman-2.1.12.tgz from launchpad.net several times, but it never worked, so I finally downloaded it from http://ftp.gnu.org/gnu/mailman/?C=M;O=D

(Both of the above download sites were found on the main Mailman site at http://www.gnu.org/software/mailman/download.html)

Installing Mailman to a Custom Directory

I’ll skip past the usual technical details of downloading and unpacking files (I did it from the shell command line with: wget http://ftp.gnu.org/gnu/mailman/mailman-2.1.12.tgz and then: tar xvfz mailman-2.1.12.tgz).

In fact I wouldn’t even mention the installation process except that my first attempt failed because the web host I was using (which is not the one I am happily using now, Dreamhost) had pre-mapped the /mailman URL for me, and it resisted any attempts to override/fix it.

This was a problem because when I first installed Mailman on our dev box, created a test list (with the newlist command), and then tried to go to something like http://ourdomain.org/mailman/listinfo/test-list it told me there was no such list—even though I had just created it. I banged my head against the wall for a while until I figured out that my web host had pre-mapped /mailman and that there was no way around it.

(If you want to see if your web host has done this, I recommend going to http://www.yourhost.com/mailman/ before installing Mailman, and seeing if you get the same error page as when going to http://www.yourhost.com/badpagelink/ or if you get some unexpectedly-different page instead.)
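The same check can be roughed out from the command line with curl by comparing HTTP status codes. This is only a sketch: the function names are hypothetical, www.yourhost.com is the placeholder host from above, and a status code is a cruder signal than actually looking at the two pages.

```sh
# Fetch only the HTTP status code for a URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Compare the status for /mailman/ with that of a page that should not exist.
# Both function names are hypothetical.
interpret_statuses() {
    mailman_status=$1
    bogus_status=$2
    if [ "$mailman_status" = "$bogus_status" ]; then
        echo "same response ($mailman_status): /mailman is probably not pre-mapped"
    else
        echo "different responses ($mailman_status vs $bogus_status): /mailman may be pre-mapped"
    fi
}

# Example (placeholder host):
# interpret_statuses "$(http_status http://www.yourhost.com/mailman/)" \
#                    "$(http_status http://www.yourhost.com/badpagelink/)"
```

Some hosts return the same status code for every error, so matching codes are a hint rather than proof; viewing both pages in a browser is still worthwhile.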

So eventually I had to delete that first installation and install again into a different directory with the following options:

./configure --prefix=/home/ouruser/public_html/mm \
  --with-username=ouruser --with-groupname=ouruser \
  --with-mail-gid=ouruser --with-cgi-gid=ouruser \
  --with-mailhost=mail.ourhost.org --with-cgi-ext=.cgi

(I had to use the --prefix option, rather than accepting the default installation location, because my dev box was a shared hosting environment where I was just a regular user, rather than having root/admin permissions and being able to install into system directories like /var/mailman.)

After running configure and make install I then created a test list by running the mailman “bin” program newlist test-list (in ~/public_html/mm/bin). Then I tried going to http://ourdomain.org/mailman/listinfo/test-list and things seemed to work fine until I tried to log in, at which point the fact that Mailman insists on setting cookies with a Path of /mailman caused my logins to not “stick”.

I thought this would be easy to fix by editing the cookie set by Mailman from /mailman to /mm, but it turned out that most of the cookie-related add-ons for Firefox do not allow you to edit cookie values.

The Add N Edit Cookies Firefox add-on seemed promising, but when I edited the Path value of the cookies set after logging in to Mailman, to change it from /mailman to /mm, it didn’t actually save the changes. But I figured out a workaround: copying the values of the old login cookie (Name, Content, Host, Path, Expires) to a temp text file, deleting the cookie, then creating a new cookie with the same values except for a different Path (and an Expires value 20 years later).
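The reason the logins didn’t stick is that a browser only sends a cookie back when the request path falls under the cookie’s Path. The sketch below is a simplified version of that path-matching rule (as described in RFC 6265), written as a hypothetical shell function purely for illustration; it is not Mailman or browser code.

```sh
# Return success if a cookie with the given Path would be sent
# for the given request path (simplified path-match rule).
cookie_path_matches() {
    cookie_path=$1
    request_path=$2
    case $request_path in
        "$cookie_path") return 0 ;;        # exact match
        "${cookie_path%/}"/*) return 0 ;;  # request is below the cookie path
        *) return 1 ;;
    esac
}

for request in /mailman/admin/test-list /mm/cgi-bin/admin.cgi/test-list; do
    if cookie_path_matches /mailman "$request"; then
        echo "sent:     $request"
    else
        echo "not sent: $request"
    fi
done
```

A cookie with a Path of /mailman is sent for the first URL but not for the second, which is why recreating the cookie with a Path of /mm made the login stick.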

All these machinations were just to deal with having to install Mailman to /mm instead of /mailman on our dev box because the web host had pre-mapped /mailman for us and we couldn’t override it. Hopefully you won’t have that problem at all; I highly recommend just installing using the defaults if you can!

For more information about installing Mailman see the official GNU Mailman Installation Manual. There’s also a helpful page covering more technical aspects of installing on various flavors of Linux (including the easy way of just installing via RPM or YUM) at the YoLinux Tutorial: GNU Mailman Email List Installation and Configuration.

Testing the Installation

If you’ve installed Mailman into the default /mailman and used newlist to create a “test-list”, you should be able to use the following URLs to check if your installation is correct:

  • http://yourhost.com/mailman/listinfo/test-list for info about the test list
  • http://yourhost.com/mailman/admin/test-list for admin pages about the test list
  • http://yourhost.com/mailman/listinfo for info on all lists on the server

If you had to install Mailman to a custom directory like /mm and/or use a custom CGI file extension like .cgi, then you’d want to use something like the following:

  • http://yourhost.com/mm/cgi-bin/listinfo.cgi/test-list for info about the test list
  • http://yourhost.com/mm/cgi-bin/admin.cgi/test-list for admin pages for the test list
  • http://yourhost.com/mm/cgi-bin/listinfo.cgi for info on all lists on the server
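Both sets of URLs above follow the same pattern, so they can be generated and checked from the command line. This is a minimal sketch with hypothetical function names; yourhost.com is the placeholder host used above, and curl is assumed to be installed.

```sh
# Print the three test URLs for a CGI base URL, a list name, and an
# optional CGI extension (empty for a default install).
# Both function names are hypothetical.
mailman_test_urls() {
    base=$1
    list=$2
    ext=${3:-}
    echo "$base/listinfo$ext/$list"
    echo "$base/admin$ext/$list"
    echo "$base/listinfo$ext"
}

# Print each URL preceded by the HTTP status code the server returns.
check_mailman_urls() {
    mailman_test_urls "$@" | while read -r url; do
        echo "$(curl -s -o /dev/null -w '%{http_code}' "$url") $url"
    done
}

# Example calls (placeholder host):
# check_mailman_urls http://yourhost.com/mailman test-list
# check_mailman_urls http://yourhost.com/mm/cgi-bin test-list .cgi
```

A 404 on any of these suggests the base URL or CGI extension doesn’t match how Mailman was installed.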

I haven’t covered setting up the automatic Mailman daemons and cron jobs because I couldn’t do that on our dev box’s shared hosting environment, and they were already set up on our production server. If you’re installing on your own box as root, the default options to configure and make install should take care of all that for you.
