A Python Script to Update NCBI BLAST Databases

With added functionality to check if databases are being used.

Updating BLAST databases from NCBI can be done using the update_blastdb command included in the BLAST+ package. For example, the following command will download and/or update the swissprot protein database in the current directory:

update_blastdb --decompress --passive swissprot

which should print:

Connected to NCBI
Downloading swissprot.tar.gz... [OK]

List of downloaded files:

ls 
swissprot.tar.gz  swissprot.tar.gz.md5

One issue with this approach is that any long-running BLAST jobs currently accessing the database will be aborted. To overcome this problem, I wrote a wrapper around the update_blastdb command: blastdb_updater.py.

It uses a symbolic link to the latest version of the database and only updates the link if the database is not being used. If the database is being used, the script adds a message to the log after the database download is complete. The link can then be updated manually later.
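The link-switching step might look roughly like the following. This is a minimal sketch under stated assumptions, not the code from blastdb_updater.py: the function name switch_link and the directory names in the demo are hypothetical. It creates the new link under a temporary name and then renames it over the old one, so a job that opens the link never finds it missing. It requires Python 3.3 or later for os.replace.

```python
import os
import tempfile


def switch_link(link_path, new_target):
    """Point link_path at new_target, replacing any existing link."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Hypothetical temporary name, created next to the final link so the
    # rename below stays on the same filesystem.
    tmp_link = os.path.join(link_dir, ".tmp_link_%d" % os.getpid())
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # Renaming over an existing symlink replaces the link itself,
    # not the directory it points to.
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    # Demo in a throwaway directory; the names are illustrative only.
    base = tempfile.mkdtemp()
    old_version = os.path.join(base, "swissprot_old")
    new_version = os.path.join(base, "swissprot_new")
    os.mkdir(old_version)
    os.mkdir(new_version)
    link = os.path.join(base, "swissprot")
    switch_link(link, old_version)
    switch_link(link, new_version)
    print(os.readlink(link))
```

Jobs that already opened files through the old link keep reading the old version, which is why the link is left alone while the database is in use.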

Note
This script works only on Linux and other Unix-like systems because it relies on the lsof command to check whether a directory is being accessed.
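As an illustration of how such a check can be made from Python, here is a small sketch. It assumes lsof is on the PATH and uses its +D option, which lists open files anywhere beneath a directory. Whether blastdb_updater.py calls lsof in exactly this way is an assumption, and the function name directory_in_use is hypothetical.

```python
import os
import subprocess


def directory_in_use(path):
    """Return True if lsof reports any open file under path."""
    # Resolve symlinks first so the real database directory is scanned.
    real_path = os.path.realpath(path)
    result = subprocess.run(
        ["lsof", "+D", real_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    # lsof exits non-zero when it finds nothing, so the output is
    # inspected instead of the return code.
    return bool(result.stdout.strip())
```

For example, directory_in_use("/home/user/blast/swissprot") would return True while a blastp job has the database files open. Note that lsof +D walks the whole directory tree, so it can be slow on very large directories.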

Download the Python script

From the vimalkvn/sysadminbio repository on GitHub:
Link: blastdb_updater.py

The examples below assume the script is saved as /home/user/programs/blastdb_updater.py. Any other location works, as long as you adjust the paths in the commands accordingly.

Usage

Assuming you would like to download the swissprot database to /home/user/blast, use:

python /home/user/programs/blastdb_updater.py \
  -d swissprot -p /home/user/blast

A log file will be written to
/home/user/blast/log/blastdb_updater.log.

To search against the database, pass its path to blastp with -db:

blastp -db /home/user/blast/swissprot/swissprot \
  -query sample.fasta

Other databases (supported by update_blastdb) can be downloaded in the same manner.
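If several databases need updating, the command above can be repeated in a loop. The sketch below is one possible way to do that from Python. The helper names build_command and update_all are hypothetical, the paths are the example paths used in this post, and treating a non-zero exit status as failure is an assumption about how blastdb_updater.py reports errors.

```python
import subprocess
import sys

# Example locations used throughout this post; adjust as needed.
SCRIPT = "/home/user/programs/blastdb_updater.py"
DEST = "/home/user/blast"


def build_command(database, dest=DEST, script=SCRIPT):
    """Return the argument list that updates one database."""
    return [sys.executable, script, "-d", database, "-p", dest]


def update_all(databases):
    """Run the updater once per database and return the names that failed."""
    failed = []
    for name in databases:
        # Assumption: the script exits with a non-zero status on failure.
        if subprocess.call(build_command(name)) != 0:
            failed.append(name)
    return failed


# Example call (database names must be ones update_blastdb accepts;
# "update_blastdb --showall" lists them):
# update_all(["swissprot", "pdbaa"])
```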

Automated update

An automated update can be set up using cron:

MAILTO=email@domain
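0 0 1 * * python /home/user/programs/blastdb_updater.py -d swissprot -p /home/user/blast

The above cron job will update the database at midnight on the 1st of every month. The entry must stay on a single line because crontab does not support backslash line continuation. Any output from the job is mailed to the MAILTO address.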