[phenixbb] local BLAST server

Zhijie Li zhijie.li at utoronto.ca
Fri Aug 8 08:19:00 PDT 2014

Hi Rob,


You may want to check pdbaa.gz first. It appears to be the PDB subset of
the nr database.

ftp://ftp.ncbi.nlm.nih.gov/blast/db/README says:
pdbaa.*tar.gz          | protein sequences from pdb protein structures,
                       | its parent database is nr.


Hi Rob,

The BLAST nr database (fasta format) can be downloaded from the NCBI ftp:
As I remember, it is the nr.gz file. When unzipped, the file is called "nr".

According to BLAST the nr database does contain PDB entries.

It is significantly larger than the PDB data file you are currently using.
You might consider extracting all the PDB sequences from it so that you do
not need to search through all the non-PDB sequences.
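
For what it is worth, the extraction could be sketched roughly as below. This
is only a sketch under assumptions I have not checked against the current nr
file: that the merged entries in an nr defline are separated by Ctrl-A, and
that PDB entries are tagged as "pdb|<ID>|<chain>". The function names and the
output header format are made up for illustration.

```python
import gzip
import re
import sys

# Assumption: a PDB entry appears in a defline as "pdb|<4-char ID>|<chain>".
PDB_TAG = re.compile(r"pdb\|([0-9A-Za-z]{4})\|([^\s|\x01]*)")


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_pdb_records(lines):
    """Yield (pdb_chain_ids, sequence) for records that mention a PDB entry."""
    for header, seq in iter_fasta(lines):
        ids = [f"{pdb_id.upper()}_{chain}"
               for pdb_id, chain in PDB_TAG.findall(header)]
        if ids:
            yield ids, seq


def main(in_path, out_path):
    # Stream the gzipped file so the whole of nr is never held in memory.
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for ids, seq in extract_pdb_records(src):
            dst.write(">" + " ".join(ids) + "\n" + seq + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
```

Each nr record already stands for one unique sequence, so the output keeps one
record per sequence with all of its PDB chain IDs in the header. The resulting
FASTA file should then be usable as input to makeblastdb with -dbtype prot.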


-----Original Message----- 
From: R.D. Oeffner
Sent: Friday, August 08, 2014 10:14 AM
To: phenixbb at phenix-online.org
Subject: [phenixbb] local BLAST server


I'm in the process of installing a local BLAST server for doing blast
protein queries. As I understand it I need a file with all the FASTA
sequences as input for initially generating my local BLAST database.
The one present in
ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt seems to
contain redundant entries. Querying it produces many extra PDB
chain-ids when compared to a BLAST query on the NCBI web server.

Does anyone know where to get a non-redundant version of FASTA records
so that I can create a similar database as the one used by NCBI?

Many thanks,


Robert Oeffner, Ph.D.
Research Associate, The Read Group
Department of Haematology,
Cambridge Institute for Medical Research
University of Cambridge
Cambridge Biomedical Campus
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY

tel: +44(0)1223 763234
mobile: +44(0)7712 887162
