Listserv to Mailman Part 2.3: Migrating Subscribers And Keeping Them In Sync

Introduction: Options Make Things Complex

To migrate subscribers, ideally we’d just get the addresses on each Listserv list with a REVIEW command and then subscribe those addresses to each new Mailman list of the same name.

The problem is that we need to copy not only the subscribers but their list option settings, such as NOMAIL, DIGEST, etc. This makes it much more complicated.

The first thing I did was identify the few “concealed” addresses on our Listserv lists so I could change them to NOCONCEAL. I did this by sending QUERY LISTNAME WITH CONCEAL FOR *@* to LISTSERV (replacing LISTNAME with one of the Listserv list names), for each list.

Since there weren’t a lot of concealed addresses, I set them “noconceal” by sending QUIET SET LISTNAME NOCONCEAL FOR ADDRESS PW=LISTPASSWORD for each subscriber to LISTSERV (replacing LISTNAME with the list name, ADDRESS with the subscriber address, and LISTPASSWORD with the Listserv list password).

I wanted to eliminate concealed statuses because I needed to make sure all subscribers would show up when I did a REVIEW command to generate one subscriber file for each list (more on this later).
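With more than a handful of concealed addresses, the command lines could be generated rather than typed. The following is a minimal sketch, not the procedure actually used here; the function name and the example address are hypothetical, and only the command format comes from the text above.

```python
def noconceal_commands(listname, addresses, list_password):
    """Return one QUIET SET ... NOCONCEAL command line per address."""
    return [
        f"QUIET SET {listname} NOCONCEAL FOR {addr} PW={list_password}"
        for addr in addresses
    ]


if __name__ == "__main__":
    # Hypothetical list name, address, and password, for illustration only.
    for line in noconceal_commands("ABC-ANNOUNCE", ["someone@example.com"], "LISTPASSWORD"):
        print(line)
```

The resulting lines would go in the body of a message to LISTSERV, one command per line.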

Listserv Subscriber Settings And Mailman Equivalents

You may find it helpful to see a list of possible Listserv subscriber options and what settings/actions are available in Mailman to match them:

| Listserv Subscriber Options | Mailman Equivalent |
| --- | --- |
| MAIL/NOMAIL (whether to get posts; MAIL is assumed unless NOMAIL is set) | each subscriber can have the delivery flag set “on” or “off” |
| DIGEST/NODIGEST (whether posts are batched up into periodic digests; NODIGEST is assumed unless DIGEST is set) | add_members bin command has a -d option to add people with the digest setting; afterwards, a subscriber can have the digest flag set “plain”, “mime”, or “off”; see also the digest_is_default and other digest_ list config settings |
| MIME/NOMIME (whether digests are sent to a user in MIME format; NOMIME is assumed unless MIME is set) | each subscriber can have the digest flag set “plain”, “mime”, or “off”; plain is equivalent to NOMIME and mime is equivalent to MIME; see also the mime_is_default_digest list config setting |
| INDEX/NOINDEX (a variant of digest mode; instead of digest batches of messages, just the subject lines of posts are sent as a summary) | no equivalent that I am aware of, and that’s fine because I don’t really see how useful a summary is with just subjects |
| ACK/NOACK/MSGACK (sends a “your post was received” confirmation message, or not; NOACK is assumed unless ACK is set; MSGACK is obsolete) | each subscriber can have the ack flag set “on” or “off”, though I don’t see the point; it seems to make more sense to just use getting your own posts back (with the myposts flag) as the acknowledgement message |
| SUBJECTHDR (whether [list name] is put in the Subject field of every post, for filtering purposes) | I believe this is only settable on a list-wide (not subscriber-specific) basis via the subject_prefix list config setting |
| CONCEAL/NOCONCEAL (whether someone shows up on a normal REVIEW of the list) | each subscriber can have the hide flag set “on” or “off”; also the private_roster list config setting lets you restrict member addresses to other members or list admins; my mailman_admin_command_handler.py doesn’t support the hide option though since I think it’s a PITA for admins |
| REPRO/NOREPRO (whether someone gets copies of their own posts; REPRO is assumed unless NOREPRO is set) | each subscriber can have the myposts flag set “on” or “off” |
| TOPICS (a way to let people only see certain categories of posts; I never understood or used this though, seemed easier just to create more lists) | Mailman definitely seems to support this but we did not have to deal with it; there are a number of topics_ list config settings and supposedly users can choose to just get certain topics; see Mailing list topics in the docs |
| POST/NOPOST (whether someone is allowed to post to the list; POST is assumed unless NOPOST is set) | no equivalent that I am aware of, though you can use the mod subscriber flag to cause someone’s posts to require approval, and then just never approve them 🙂 |
| EDITOR/NOEDITOR (whether someone can bypass normal approval mechanisms required for posting to certain lists) | I don’t think there is a per-user subscriber flag for this, but the moderator list configuration setting allows you to specify all moderator/editors at once; the only downside is that then they also get notices of any posts held for review and have the ability to approve/deny them too; can also use the accept_these_nonmembers list config setting to allow people to bypass approval, but I think it only works for non-subscribers |
| REVIEW/NOREVIEW (whether someone’s posts are specially held for approval on an otherwise unmoderated list, usually for bad behavior) | each subscriber can have the mod flag set “on” or “off” (on = they are “moderated” and need their posts approved) |
| RENEW/NORENEW (whether someone should get periodic stay-subscribed confirmation emails, overriding the list setting) | no equivalent that I am aware of, since I don’t believe Mailman sends out stay-subscribed (i.e., renew your subscription) confirmation messages |

Note that some of these settings (like the mod flag for users) were difficult to discover, and sometimes required delving into the source code (especially the Mailman/Commands/cmd_set.py module).

More Preparation Work

At this point we’d changed anyone who was set CONCEALed to NOCONCEAL so we wouldn’t miss them in the conversion process. Then we needed to decide which of the subscriber settings were important to copy in the conversion and which were not. (The table above should help you think about the various subscriber settings to consider.)

For example, I decided that we’d support (i.e., carry over the subscriber settings on our old Listserv lists to equivalent Mailman settings on the new lists) the NOMAIL, DIGEST, MIME, NOPOST, NOREPRO, and REVIEW options.

I’d already decided to include list names in subject lines for all subscribers on all our lists, regardless of what their old Listserv SUBJECTHDR settings were, due to the fact that Mailman only has this setting at the list level.

(For people who had relied on list names in subject lines, to not include this setting on the Mailman lists would destroy their mail filters, while for those who didn’t rely on it, at worst it would be a mild annoyance to suddenly see list names in subject lines—though before our cutover I warned subscribers that list names in subject lines would soon be the default.)

The Listserv EDITOR flag was also something Mailman only supported on a per-list basis, but that wasn’t a problem for us because we’d always used the Editor= line in the old Listserv list headers, rather than the EDITOR flag on individual subscribers, so we dealt with it at the list setting level instead of here at the subscriber setting level.

I’d decided against supporting the CONCEAL flag, even though there is an equivalent setting on Mailman lists (“hide”), because I don’t think it really protects subscriber addresses (this is really best done with the “private_roster” Mailman setting), and over the years CONCEAL caused me a lot of headaches (e.g., I’d waste a lot of time trying to figure out a subscriber problem only to finally realize that they were concealed).

Finally, I decided we wouldn’t support the INDEX, ACK, or RENEW/NORENEW options, as Mailman doesn’t really have equivalents for those and I doubt anyone had set them anyway. I also didn’t bother with TOPICS since we’d never used it in Listserv.

So, for each of the “supported” options above, I had to send LISTSERV a message asking who had that option set on each list, and save each response in a separate text file. The message to LISTSERV was QUERY LISTNAME WITH OPTION FOR *@* (where LISTNAME was the list name and OPTION was something like NOMAIL or MIME).

The reason we did it this way was that based on the way Mailman worked, it seemed best to do this in two steps: 1) Subscribe all subscribers of each Listserv list to the corresponding Mailman list, regardless of their settings, and then 2) go back and apply the special settings for certain people (such as REVIEW) by hand or by script.

Copying Subscribers And Settings

With our old Listserv setup we used cron on our web server to do an hourly REVIEW of each list and then save the results (which included all subscribers) to files, one file per list with the listname as the filename.

I’d written a quick and dirty script called queryopt.pl to email a query to Listserv to find out all subscribers with a given option on a given list, such as DIGEST, NOMAIL, EDITOR, etc.

Since all our list names began with abc- (where abc is our organization’s acronym), this allowed me to go into the directory containing all those review output files and do something like perl queryopt.pl nomail abc-* and have it send a bunch of Listserv queries, one for each list (because abc-* is expanded by the shell to all the abc- files, or all our list names).
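The queryopt.pl script itself is not reproduced here. As a rough Python sketch of the same idea, the function below builds one query message per list without sending anything; the function name and the sender and LISTSERV addresses are assumptions for illustration, and only the QUERY command format comes from this article.

```python
from email.message import EmailMessage


def build_query_messages(option, listnames, sender, listserv_addr):
    """Build one 'QUERY <list> WITH <option> FOR *@*' message per list."""
    messages = []
    for name in listnames:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = listserv_addr
        # The command goes in the message body, one command per message.
        msg.set_content(f"QUERY {name.upper()} WITH {option.upper()} FOR *@*\n")
        messages.append(msg)
    return messages


if __name__ == "__main__":
    # Hypothetical addresses; actually sending the messages is left out.
    for m in build_query_messages("nomail", ["abc-foo"], "me@example.org", "listserv@example.org"):
        print(m.get_content(), end="")
```

Each message could then be handed to smtplib or a local sendmail binary, depending on what the host provides.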

I used the following procmail filter to save the results of these query emails to files named optionlistname.txt (e.g., nomail-abc-foo.txt would have everyone with the NOMAIL option on the abc-foo list):

PMDIR=$HOME/.procmail
# Save results of certain queries; pipe to prog for massage and save
:0 Hc
* ^From:.*LISTSERV@
* ^Subject:.*Re:.* query
| cat - | $PMDIR/save_opt_query.py

This matched the sender and subject line and then sent the email with LISTSERV’s query response into a program called save_opt_query.py.

One minor issue I had, with no easy fix, was that save_opt_query.py expected the Listserv response message to be like:

Subscription options for Mike Smily <mikednahelix@HOTMAIL.COM>, list
ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings
SUBJECTHDR     Full (normal) mail headers with list name in message
               subject
REPRO          You receive a copy of your own postings
MSGACK         Short "TELL" acknowledgement of successfully processed
               postings

Subscription date: 30 Jul 2004

Subscription options for "Alexandria ." <alextheconquerer@YAHOO.COM>,
list ABC-ANNOUNCE:

DIGEST         You receive list digests, rather than individual postings
SUBJECTHDR     Full (normal) mail headers with list name in message
               subject
REPRO          You receive a copy of your own postings
MSGACK         Short "TELL" acknowledgement of successfully processed
               postings

That worked most of the time, but sometimes the name/email was too long so the name was on one line and the email was on the next. So after importing our subscribers I had to go back and manually fix those few cases (by adding those addresses to lists and setting their options), but it was only about 30 people for 70 lists so, while annoying, it was an acceptable trade-off to keep save_opt_query.py as simple as possible.

(As to how I identified the problem records, I looked for lines in the saved optionlistname.txt files that did not have the <foo@bar.com> brackets, for example with: egrep -v '<' *.txt)
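For anyone who would rather handle the wrapped headers in code than fix them by hand, here is one possible approach. It is a sketch and not how save_opt_query.py works: it assumes every header starts with "Subscription options for" and ends at the first following line that ends with a colon, and it joins those lines before looking for the address. The function name is hypothetical.

```python
import re

HEADER_START = "Subscription options for"
ADDRESS_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def extract_addresses(response_text):
    """Return subscriber addresses from a Listserv QUERY response.

    Header lines are collected until one ends with ":" (the "list NAME:"
    part), so a name and address split across lines are joined first.
    """
    addresses = []
    header_parts = None  # None means we are not inside a header
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(HEADER_START):
            header_parts = [stripped]
        elif header_parts is not None:
            if not stripped:
                # Blank line before the header finished: give up on it.
                header_parts = None
                continue
            header_parts.append(stripped)
        else:
            continue
        if stripped.endswith(":"):
            match = ADDRESS_RE.search(" ".join(header_parts))
            if match:
                addresses.append(match.group(1))
            header_parts = None
    return addresses
```

Addresses are returned exactly as Listserv printed them, including any upper-cased domain part.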

As mentioned before, the Listserv subscriber options I decided to support (i.e., copy over to the new Mailman lists) were: DIGEST, MIME (a special case of digest), NOMAIL, NOPOST, NOREPRO, and REVIEW. I also wanted to run queryopt.pl for NODIGEST to get the “regular” (non-digest) subscribers, because the Mailman add_members command has flags to specify a file with non-digest subscribers and a file with digest subscribers to add.

So after testing the procmail filter to make sure it would catch and save the option query output into files correctly, I did the following to run it for all the options:

for option in digest nodigest mime nomail nopost norepro review; \
do perl queryopt.pl $option abc-*; done

(Note that this uses Bourne-style loop syntax, which works in bash and other sh-compatible shells; other shells such as csh have different loop and variable substitution syntax, such as requiring parentheses. If you don’t know what I’m talking about, read your shell’s man page, search the web, or ask another developer/admin.)

To check if all responses had been saved properly I just did a file count based on filenames:

for option in digest nodigest mime nomail nopost norepro review; \
do echo $option; ls -1 ${option}-* | wc -l; done

(checking that the number of each option file equaled the number of lists we have, for example with 71 lists we should have had 71 nodigest-* files)
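A count only shows that something is missing, not which file. The sketch below is an illustrative alternative to the shell check, assuming the option-listname.txt naming described above, and lists the expected files that are absent; the function name is hypothetical.

```python
import os


def missing_option_files(options, listnames, existing_filenames):
    """Return expected '<option>-<list>.txt' names that are not present."""
    existing = set(existing_filenames)
    missing = []
    for option in options:
        for name in listnames:
            expected = f"{option}-{name}.txt"
            if expected not in existing:
                missing.append(expected)
    return missing


if __name__ == "__main__":
    options = ["digest", "nodigest", "mime", "nomail", "nopost", "norepro", "review"]
    # Hypothetical list names; in practice these would be the real list names.
    print(missing_option_files(options, ["abc-foo", "abc-bar"], os.listdir(".")))
```

An empty result means every option has a saved response for every list.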

Since I’d been doing all this on our web server where the hourly Listserv REVIEW files resided (which allowed me to do abc-* in the shell to expand to all our listnames), I had to copy these files (which were the results of queries to LISTSERV to find out the options people had on our soon-to-be-old Listserv lists) over to our new list host for automated subscription and option setting on the new Mailman lists.

Importing Subscribers and Setting Options

Now we had a bunch of option-related text files (optionlistname.txt) on our new Mailman list server. I used shell looping again to do a batch import of subscribers followed by option setting for subscribers with special options like NOMAIL.

Since I’d already created new Mailman lists on our new list host equivalent to the old Listserv lists I could do the following:

for lst in $(/usr/lib/mailman/bin/list_lists -b | egrep -v '^mailman$'); do
    echo add_members -r nodigest-${lst}.txt -d digest-${lst}.txt -w n -a n $lst
    echo set_option.py mod on ${lst} 'tree#bard' review-${lst}.txt
    echo set_option.py mod on ${lst} 'tree#bard' nopost-${lst}.txt
    echo set_option.py myposts off ${lst} 'tree#bard' norepro-${lst}.txt
    echo set_option.py delivery off ${lst} 'tree#bard' nomail-${lst}.txt
    echo set_option.py digest mime ${lst} 'tree#bard' mime-${lst}.txt
done

As before, when using shell looping I like to use “echo” statements to check the commands before running them. If all looks good, I remove the echos to really run the commands.

(In fact, I first ran the add_members and set_option.py commands manually on one of our lists and verified their success by sending “who” and “show” commands to the admin command handler, before using the shell loop to run them on ALL lists.)

The commands above first subscribe people in bulk using the Mailman add_members bin command with options to specify files with non-digest and digest subscribers, and to not send any notifications of the subscriptions. Then they use another script I wrote (set_option.py) to apply the Mailman option settings to subscribers and thus copy the old Listserv settings to the new subscribers.
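The option mapping used in the loop above can also be written down as data, which makes it easier to review before running anything. This sketch only generates the same set_option.py command lines as the loop; it does not show what set_option.py does internally, and the function name is hypothetical.

```python
import shlex

# Listserv option -> (Mailman flag, value), as used in the shell loop above.
OPTION_MAP = {
    "review": ("mod", "on"),
    "nopost": ("mod", "on"),
    "norepro": ("myposts", "off"),
    "nomail": ("delivery", "off"),
    "mime": ("digest", "mime"),
}


def set_option_commands(listname, list_password):
    """Return the set_option.py command lines for one list, in map order."""
    quoted_pw = shlex.quote(list_password)  # protects characters like '#'
    return [
        f"set_option.py {flag} {value} {listname} {quoted_pw} {option}-{listname}.txt"
        for option, (flag, value) in OPTION_MAP.items()
    ]


if __name__ == "__main__":
    for command in set_option_commands("abc-foo", "tree#bard"):
        print(command)
```

Printing the commands first serves the same purpose as the echo statements in the loop.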

Keeping Subscribers and Options In Sync

At this point I had the Mailman lists set up as a snapshot in time of the Listserv lists, but I needed to do additional work to make sure they’d stay in sync going forward (i.e., to run them in parallel with the Listserv lists, in terms of subscribers and subscriber options).

Fortunately, all of our subscribers seemed to use our CGI-based web subscription management pages rather than emailing LISTSERV directly or using the Listserv web interface and, as described before, those CGI scripts just generated admin email commands to LISTSERV.

This worked out well for our migration because I was able to add Mailman code right after Listserv related code, to do the same things for the Mailman lists (such as subscribe, change options, unsubscribe, etc.).

To get a little more detailed, our web pages called CGI scripts which in turn called a MailingLists.py module to do the actual work for our Listserv lists. For the migration I copied MailingLists.py to MailmanLists.py and kept the function names the same but changed the actual commands to use the custom Mailman admin command handler. Then I could just make sure that every time a CGI script used a function in MailingLists.py it called the same function in MailmanLists.py to keep the old Listserv and new Mailman lists in sync.
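Because the two modules kept the same function names, the paired calls could also be funneled through a single object. This is a different technique from the one described above, where calls were added by hand next to each existing one; the class names here are hypothetical, and RecordingBackend is only a stand-in for the real modules.

```python
class DualListManager:
    """Forward each method call to both backends, in order."""

    def __init__(self, listserv_backend, mailman_backend):
        self._backends = (listserv_backend, mailman_backend)

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. backend methods.
        def call_both(*args, **kwargs):
            return [getattr(b, name)(*args, **kwargs) for b in self._backends]
        return call_both


class RecordingBackend:
    """Stand-in backend that just records what it was asked to do."""

    def __init__(self):
        self.calls = []

    def subscribe(self, listname, address):
        self.calls.append(("subscribe", listname, address))


if __name__ == "__main__":
    old, new = RecordingBackend(), RecordingBackend()
    DualListManager(old, new).subscribe("abc-foo", "someone@example.com")
    print(old.calls == new.calls)
```

The trade-off is that a wrapper like this hides which call sites exist, whereas adding explicit calls leaves a trail that is easy to find and remove after the migration.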

This was still a little time-consuming, searching all our CGI scripts to see where MailingLists.py was called and then adding calls to MailmanLists.py, but it was worth it.

But it wasn’t just CGI scripts—I also had to check for cron jobs or shell scripts I’d written that used MailingLists.py functions too, and add equivalent calls to MailmanLists.py like in the CGI scripts (and keep track of all these changes so that I could go back and remove the MailingLists.py calls after the migration was done).

Fortunately, there were relatively few cron jobs or shell scripts which used MailingLists.py (really just one weekly cron job that checked whether the email addresses on “members only” lists actually corresponded to current members).

But this is definitely a case where utility scripts like tgrep came in handy, such as running a command like the following from the home directory on our web server, which also contained the web tree as a subdirectory:

find . -type f -size -512k -exec tgrep MailingLists {} \; > /var/tmp/mls.txt

This started in the current directory (.), looked through all files (-type f) smaller than 512 KB (-size -512k), ran tgrep (which only searches text files) on each one for “MailingLists”, and printed out the filename and matching lines. The output was redirected someplace outside the current directory tree because otherwise find/tgrep would have kept matching lines in the output file itself, resulting in an ever-growing output file (ask me how I know this!).
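If tgrep is not available, roughly the same search can be done in Python. This is a sketch under assumptions, not a port of tgrep: it treats any file containing a NUL byte as binary, uses a simple byte-size cutoff that only approximates find's -size rounding, and lets you name files to skip (such as the output file). The function name is hypothetical.

```python
import os


def find_text_matches(root, needle, max_bytes=512 * 1024, skip=()):
    """Return (path, line_number, line) for text files under root containing needle."""
    skip_paths = {os.path.abspath(p) for p in skip}
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.abspath(path) in skip_paths or not os.path.isfile(path):
                continue
            if os.path.getsize(path) >= max_bytes:
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            if b"\0" in data:  # crude binary-file check (an assumption)
                continue
            text = data.decode("utf-8", "replace")
            for number, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    hits.append((path, number, line.strip()))
    return hits
```

Passing the output file in skip avoids the ever-growing-output problem even when the results are written inside the tree being searched.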

(The unix find command is very powerful and the above only scratches the surface of the many awesome things you can do with it; for more information on its many useful options, Google something like using find linux.)

Keeping Subscriber Files Updated With Mailman

This may not apply to you at all, but I want to mention that we still use the loosely coupled web subscription pages whose input goes to CGI scripts which use objects to generate admin email commands to the list server; the only real difference after our migration is that the list server has changed from Listserv to Mailman. (See the lib directory, particularly MailmanLists.py—the other files in there are just supporting objects for it.)

You may remember that I said our old Listserv setup had an hourly cron job which sent REVIEW commands to LISTSERV and saved the replies into text files, one per list. That provided subscriber files our MailmanLists.py object could use, allowing the CGI scripts to have (relatively) recent information about which subscribers were on which lists.

We could have used the same setup with Mailman, sending email messages to the admin command handler every hour to get an updated subscriber roster for each list, but since we were allowed to submit cron jobs on our new list host, it seemed better to just run a job hourly on the list host that would directly extract the subscriber rosters, save them to files, and then copy them over to our web host using secure copy (scp).

By exchanging SSH key files ahead of time we were able to allow the scp copy to happen automatically without having to enter passwords for the file transfer. If you don’t know what I’m talking about, Google something like ssh passwordless login (since scp uses ssh underneath).
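As a rough illustration of such an hourly job, the sketch below only builds the commands to run, one roster dump and one copy per list. It assumes Mailman 2.1's list_members bin command with its -o output option, the bin directory path used earlier in this article, and a hypothetical output directory and scp target; it is not the review_and_copy script.

```python
def roster_commands(listnames, out_dir, remote_target, bin_dir="/usr/lib/mailman/bin"):
    """Return argv lists: dump each list's roster to a file, then scp it."""
    commands = []
    for name in listnames:
        roster = f"{out_dir}/{name}"  # one file per list, named after the list
        commands.append([f"{bin_dir}/list_members", "-o", roster, name])
        commands.append(["scp", "-q", roster, remote_target])
    return commands


if __name__ == "__main__":
    # Hypothetical paths and host; print instead of running, like the echo trick.
    for argv in roster_commands(["abc-foo"], "/var/tmp/rosters", "webhost:rosters/"):
        print(" ".join(argv))
```

Once the printed commands look right, each argv list could be passed to subprocess.run(argv, check=True) from a cron-driven script.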

If you want to do something like this automated saving of list subscriber rosters to files, and then copying them somewhere else, you might find my short review_and_copy utility script useful.

Conversely, if you want to do something like we used to do with Listserv, where you regularly email Mailman’s admin command handler a “who listname” command and then save the results to a file, see the revmmlists script.

Similarly, if you’re paranoid about backing up your Mailman list configurations to dump files on a regular basis (with content equal to running config_list -o dumpfile listname at the command line), you might want to look at my backup_list_configs file.

That’s not necessary by any means, but I spent so much time configuring our lists that I want to know I can always recreate those configurations instantly if needed (with config_list -i configfile listname).

And if you do a lot from the command line you might be interested in the sub, unsub, group_sub, and group_unsub scripts in the utils area.

A Note About Passwords

I need to mention that since our website also had a private, login-based “members only” area, we wanted list subscribers who were also members of our organization to have their Mailman list passwords be the same as their website login passwords.

On the other hand, if someone was a subscriber but not a member, we wanted all their passwords set as the same default password.

This didn’t matter so much for subscription management because we had our own custom pages for subscribing/unsubscribing/etc. In fact, we didn’t actually want people going to the Mailman list pages for those things, we wanted them to go to our pages so we could also do things like keeping mailing list and website passwords in sync.

However, subscriber passwords did matter for accessing list archives since all our list archives are password-protected to subscribers only, so we wanted the password to be a no-brainer, just-use-the-same-password-as-your-website-login thing.

Shortly after copying the subscribers from Listserv to Mailman and setting their options, then adding code to keep Listserv and Mailman subscribers/options in sync, I realized that I needed to also keep subscriber passwords in sync, and not just between Listserv and Mailman, but in the case of members, in sync with each member’s corresponding website login password too.

I dealt with this in a two-stage manner, just as I had for the options. The first stage was doing a massive sync to get everything correct as of a snapshot point in time. Then the second stage was to make sure it would be correct going forward.

For the first item, the one-time-fix, I coded a utility script called sync-list-passwords that used objects representing our membership database and website login system to check whether someone was a member and if so, whether they’d set up a login on our site. The script checked these and then set the Mailman list passwords appropriately depending on which case each subscriber fit into.
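The decision at the heart of that script can be sketched in a few lines. This is not the sync-list-passwords code; the function name is hypothetical, and it assumes that a member who has not yet set up a website login gets the default password, which the text above does not spell out.

```python
def choose_list_password(is_member, website_password, default_password):
    """Pick the Mailman list password for one subscriber.

    Members with a website login reuse that password; everyone else
    (non-members, or members without a login) gets the shared default.
    """
    if is_member and website_password:
        return website_password
    return default_password


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print(choose_list_password(True, "members-own-password", "shared-default"))
    print(choose_list_password(False, None, "shared-default"))
```

Keeping the rule in one function means the one-time sync and the ongoing sync can share it.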

For the second item, I simply made sure in the MailmanLists.py code that whenever someone got added to a list it would see if they were a member and had a website login, and then send a password setting command to the admin command handler. If someone were removed from a list there was nothing that needed to be done, and if someone changed their address the clone_member Mailman code took care of copying over their password.

I also had to add code to our website login system so that if someone changed their login password there, it sent a message to the Mailman admin command handler to change their password on all their list subscriptions as well.

I’d be very surprised if your setup matched ours or was even as complicated, but the point is that Mailman does assign passwords to subscribers for each list (randomly I think, if you don’t specify), and you will need to think about this if you intend to have private list archives or intend to allow subscribers direct access to the Mailman web interface pages (since subscribers need to log in to change anything).

Further, if you have a lot of lists you’ll need to think about how you’re going to handle the fact that if you don’t do something to actively keep subscriber passwords the same across all those lists, they could end up with different random passwords on different lists, which could be a big problem.

And it’s a lot better to come up with a password-handling strategy up front, before you start your migration, than in a panic after the migration has already started (ask me how I know this!).
