Using phred/phrap/consed
     Phred, phrap and consed are software packages for base calling of ABI and other sequencer files, assembling reads into contigs, and editing the contigs while visually examining electropherograms (traces).
     To start from software installation, read from 1 getting the software. If you are an end user of an installed system, read from 4 setting-up for a user. If the administrator has kindly set up your local system already, read from 5 base calling and assembling.


1. GETTING THE SOFTWARE

     Visit
http://www.phrap.org
and follow the instructions given there. The download site allows consed to be downloaded only to a licensed computer with a particular global IP address.


2. HARDWARE

     Phred, phrap and consed are software packages that run on UNIX. To run them, a RedHat Linux v.9 machine with these packages installed is set up in my lab. A few Windows machines work as clients connected to the Linux host, and the clients access the software installed on the host. To comply with the academic user license agreement, the licensed user puts the packages in a directory that other users of his/her group can access, so it is unnecessary to copy the software for each group user.
     The LAN of our institute is 10BASE-T and it is slow. To avoid strain when using consed with its complicated graphical interface, a small 100BASE-T LAN was built between the Linux and Windows machines through a 10/100BASE-T network hub. With the hub connected to the institute LAN, the Windows machines have access to the internet and can be used for general purposes. To use the Linux machine as a Windows file server, samba and netatalk are installed on it. This avoids file manipulation on Linux X11, which is not very user-friendly.

system component

     There are two ways to access and display X11 graphics on a Windows machine with freeware. Both Cygwin and X-Deep/32 (v.4.6.5) work with phred/phrap/consed. I prefer X-Deep/32 because it requires less disk space. The effective screen size on X-Deep/32 is, however, smaller than on Cygwin, because it displays a menu bar at the top of the screen. To get the biggest screen size in this limited environment, uncheck the 'Always on top' box in the taskbar properties. X-Deep/32 has since been updated to v.4.7, and a freeware version is no longer available.

taskbar properties
With the 'Always on top' box unchecked, application windows including X-Deep/32 can come in front of the taskbar. Right-click on the taskbar to show the taskbar properties.

3. INSTALLATION

Extraction and Compilation of Source Files

Phred
Make an appropriate folder and copy "phred-dist-020425.c.acd.tar.Z" to it.

Example
     Through Windows network access, make a folder "phred" in the user's home directory and copy the file to it. I was not able to extract the files with the command given in the instruction document, so I extracted the source files as follows.
     On a Windows machine, start X-Deep/32 and connect to the Linux machine. Log in as a normal user. In the X11 window, click on start menu (red hat icon) -> System Tools -> Terminal to open a command terminal window. I suggest making a shortcut icon for Terminal on the toolbar for convenience.

Example to Do This
$ cd phred
$ gunzip phred-dist-020425.c.acd.tar.Z
Then the original file is replaced with "phred-dist-020425.c.acd.tar".
$ tar xvf phred-dist-020425.c.acd.tar
Source files are extracted in the current folder.
$ make
A "phred" executable file is built. Currently it works fine without any modification of the source. It may be necessary to edit "Makefile" or some other source files for other UNIX systems.

Phrap
Copy "distrib.tar.Z" to an appropriate folder (e.g. a new folder "phrap").
Uncompress and extract with gunzip and tar, then run make to build the executables.
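
The same uncompress-and-extract step applies to each of the archives in this section. A minimal sketch follows, assuming the archive has already been copied into its own folder. The helper name unpack_dist is hypothetical and is not part of any of the packages. It streams the archive through gunzip -c, so the original .Z file is kept rather than replaced.

```shell
# Hypothetical helper: extract a .tar.Z distribution inside the folder
# that contains it, leaving the compressed archive in place.
unpack_dist() {
    archive=$1
    dir=$(dirname "$archive")
    # gunzip -c writes the uncompressed tar stream to stdout;
    # tar xvf - reads that stream and lists each file it extracts.
    ( cd "$dir" && gunzip -c "$(basename "$archive")" | tar xvf - )
}
```

For phrap, for example, calling unpack_dist on /home/username/phrap/distrib.tar.Z and then running make in /home/username/phrap corresponds to the steps described above.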

phd2fasta
Copy "phd2fasta-acd-dist.tar.Z" to an appropriate folder (e.g. a new folder "phd2fasta").
Extract and build as above.

consed (current version is 15)
Copy "consed_linux.tar.Z" to an appropriate folder (e.g. a new folder "consed").
Uncompress and extract with gunzip and tar. Three executable files appear.
"consed_linux2.4"
"consed_linux2.6"
"consed_linux2.6_dyn"
One of them suits the Linux system in use, depending on its kernel version. To see the kernel version, use the uname command.
$ uname -r
"consed_linux2.4" is good for a system with Kernel version 2.4 or 2.5.
Log off.
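
The choice of consed executable from the kernel version can be sketched as a small shell function. This is only an illustration: the function name pick_consed_binary is hypothetical, the 2.4/2.5 mapping is the one stated above, and the 2.6 mapping is an assumption based on the file name alone. It does not choose between "consed_linux2.6" and "consed_linux2.6_dyn".

```shell
# Hypothetical helper: print the consed executable that matches a
# kernel release string such as the output of `uname -r`.
pick_consed_binary() {
    case "$1" in
        2.4.*|2.5.*) echo consed_linux2.4 ;;   # stated in the text above
        2.6.*)       echo consed_linux2.6 ;;   # assumption from the file name
        *)
            echo "no matching consed binary for kernel $1" >&2
            return 1
            ;;
    esac
}
```

It would be called as pick_consed_binary "$(uname -r)"; any kernel release it does not recognize gives an error message and a non-zero exit status instead of a guess.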

Installation

     Log in as root. Make a folder "/usr/local/genome/bin". This folder is outside the "/home/username/" (user account) domain and is inaccessible through Windows network access.

To do this, open a command terminal window.
$ cd /usr/local
$ mkdir genome
$ cd genome
$ mkdir bin

Copy "consed" and other appropriate files to the folder.
$ cp /home/username/consed/consed_linux2.4 /usr/local/genome/bin
$ chmod 555 /home/username/consed/scripts/*
$ cp /home/username/consed/scripts/* /usr/local/genome/bin
$ cp /home/username/consed/contributions/* /usr/local/genome/bin

Copy "phd2fasta".
$ cp /home/username/phd2fasta/phd2fasta /usr/local/genome/bin

Copy "phrap" and the other files for assembly.
$ cp /home/username/phrap/cluster /usr/local/genome/bin
$ cp /home/username/phrap/cross_match /usr/local/genome/bin
$ cp /home/username/phrap/loco /usr/local/genome/bin
$ cp /home/username/phrap/phrap /usr/local/genome/bin
$ cp /home/username/phrap/phrapview /usr/local/genome/bin
$ cp /home/username/phrap/swat /usr/local/genome/bin
Make a folder "/usr/local/genome/lib/screenLibs".
$ cd /usr/local/genome
$ mkdir lib
$ cd lib
$ mkdir screenLibs
$ cp /home/username/phrap/vector.seq /usr/local/genome/lib/screenLibs

Copy "phred".
$ cp /home/username/phred/phred /usr/local/genome/bin
Make a folder "/usr/local/etc/PhredPar".
$ cd /usr/local/etc
$ mkdir PhredPar
Edit "phredpar.dat" according to the sequencing system in use and copy it.
$ cp /home/username/phred/phredpar.dat /usr/local/etc/PhredPar

Editing Script
     In newer versions of consed, the default location of "phredpar.dat" differs from the folder used above. I put it in the folder above, as in older versions. In this case, edit "addReads2Consed.perl" and "phredPhrap". Make this change before copying the scripts to "/usr/local/genome/bin", or copy the edited scripts there again afterwards.
To do this, open
/home/username/consed/scripts/addReads2Consed.perl and
/home/username/consed/scripts/phredPhrap
with gedit and edit them as follows.

     from:
-----------------------------------------------------------------
# change this to reflect wherever you put the phred parameter file
$szPhredParameterFile = $szConsedHome . "/lib/phredpar.dat";
#$szPhredParameterFile = "/usr/local/common/lib/PhredPar/phredpar.dat";
#$szPhredParameterFile = "/usr/local/etc/PhredPar/phredpar.dat";
-----------------------------------------------------------------
     to:
-----------------------------------------------------------------
# change this to reflect wherever you put the phred parameter file
#$szPhredParameterFile = $szConsedHome . "/lib/phredpar.dat";
#$szPhredParameterFile = "/usr/local/common/lib/PhredPar/phredpar.dat";
$szPhredParameterFile = "/usr/local/etc/PhredPar/phredpar.dat";
-----------------------------------------------------------------
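
As an alternative to editing by hand in gedit, the same change could be made with sed. The sketch below assumes the three lines appear in the script exactly as shown in the "from" block, with no leading whitespace. The function name use_etc_phredpar is hypothetical. A backup copy with the suffix .bak is kept next to each edited file.

```shell
# Hypothetical helper: comment out the $szConsedHome default and
# activate the /usr/local/etc/PhredPar line in one consed script.
use_etc_phredpar() {
    # $1: script to edit (addReads2Consed.perl or phredPhrap)
    sed -i.bak \
        -e 's|^\$szPhredParameterFile = \$szConsedHome|#&|' \
        -e 's|^#\(\$szPhredParameterFile = "/usr/local/etc/PhredPar/phredpar\.dat";\)|\1|' \
        "$1"
}
```

Running it a second time on the same file changes nothing further, because neither pattern matches once the edit has been made. Comparing the result with the "to" block above before installing the scripts is still advisable.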

Software installation is finished.
Log off.

4. SETTING-UP FOR A USER

Making Access to the Linux Server from a Windows Machine
     Install an appropriate X11 emulator (X-Deep/32, etc.) on the Windows machine. Ask the root user or administrator of the Linux server for accounts for both system login and samba login (username and password), as well as the IP address or name of the server. I suggest using the same username on both Linux and Windows.
     Displaying Linux directories on your Windows desktop.
     To do this, open and follow My Computer -> Desktop -> My Network ... until a remote folder with the username appears. Double-click to open it. A dialog appears asking for the username and password. I suggest making a shortcut on the Desktop for convenience.
     Displaying the Linux X11 window on your Windows desktop.
     Start the emulator software. Enter the name or IP address of the server (depending on the emulator software). For X-Deep/32, select 'IP: [ANY] accept requests on any local interface' in the dialog. A list of server names then appears. Select the server with the name given by the administrator.

Add PATHs
Log in as a normal user.
Open a terminal window.
Add "/home/username/bin" and "/usr/local/genome/bin" to the current PATH list ($PATH) in this order. If the login shell is bash, which is the default in RedHat Linux, open ".bash_profile" in the user's home directory and change;

PATH=$PATH:$HOME/bin
     to:
PATH=$PATH:$HOME/bin:/usr/local/genome/bin

where $HOME is usually /home/username.
If no such line is present in ".bash_profile", add that line, and also add the line
export PATH
if it is missing. To see which shell is the login shell, type
$ echo $SHELL
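
A variant of the PATH setting that can be placed in ".bash_profile" is sketched below. It is not the form used above: the helper add_to_path is a hypothetical name, and it appends a directory only if it is not already listed, so that repeated logins or nested shells do not produce duplicate entries. The two directories are added in the order given above.

```shell
# Hypothetical helper: append a directory to PATH unless it is
# already one of the colon-separated entries.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "$HOME/bin"
add_to_path /usr/local/genome/bin
export PATH
```

After logging in again, echo $PATH should show both directories at the end of the list.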

Log off.

Log in again as a normal user to activate the new PATH parameters.

Script for Starting consed
Make a folder "/home/username/bin".
$ mkdir /home/username/bin
The domain under "/home/username/" is accessible to each user through Windows network access, and manipulating files and folders is easier on a Windows machine. Windows, however, treats the folder name "bin" as reserved, so this step in particular should be done on Linux.
Make a script file named "consed" with gedit on Linux. To do this, open a new file with gedit, write
consed_linux2.4
and save the new file as "consed" in "/home/username/bin".
Make the script file executable.
$ chmod 755 /home/username/bin/consed
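
The starting script can also be written from the command line. The sketch below is a slightly more defensive variant of the one-line script above, not the same file: it adds a #!/bin/sh line and passes any command-line arguments through to the consed executable. The function name write_consed_wrapper is hypothetical.

```shell
# Hypothetical helper: create an executable "consed" starter script.
write_consed_wrapper() {
    # $1: folder for the script, e.g. "$HOME/bin"
    # $2: consed executable to start (default: consed_linux2.4)
    mkdir -p "$1" || return 1
    # The script replaces itself with the real executable (exec)
    # and forwards all of its arguments unchanged ("$@").
    printf '#!/bin/sh\nexec %s "$@"\n' "${2:-consed_linux2.4}" > "$1/consed"
    chmod 755 "$1/consed"
}
```

Calling write_consed_wrapper "$HOME/bin" gives the same result as the gedit and chmod steps above, provided the executable is found through PATH.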

     To share the system among a number of Windows machines, the only things to do for each additional machine are to add the PATHs and to make the consed starting script.


5. BASE CALLING AND ASSEMBLING

     The sequence project is made in a project folder with the following structure.
"Project_folder"       The project folder (can be named freely)(containing the following sub-folders)
     "chromat_dir"     to put sequence files generated by sequencers
     "edit_dir"        for consed data files
     "phd_dir"         for sequence files base called by phred (phred files)
Make a template folder for convenience, and then copy and paste it for each project. Copy ab1 or other sequencer output files to the "chromat_dir" folder. Use Windows network access for convenience.
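
The folder structure above can also be created directly on Linux. A minimal sketch follows; the function name new_project is hypothetical, and the three sub-folder names are the ones listed above.

```shell
# Hypothetical helper: create an empty project folder with the three
# sub-folders that phredPhrap and consed expect.
new_project() {
    # $1: path of the project folder to create
    for d in chromat_dir edit_dir phd_dir; do
        mkdir -p "$1/$d" || return 1
    done
}
```

For example, new_project /home/username/project creates the project folder ready for sequence files to be copied into "chromat_dir".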

Example to Do This
     Place a project template folder in the user's home directory. The domain under "/home/username/" is accessible to each user through Windows network access, and manipulating files and folders is easier on a Windows machine. Copy, paste and rename (e.g. "project_template" -> "project") the template folder through Windows network access as shown below.

project folder template

     Copy ab1 or other sequence files to the "chromat_dir" folder in the project folder. Checking chromatogram quality prior to assembly is unnecessary except in cases of bad electrophoresis; bad sequence files are simply omitted from assemblies. Nor is it necessary to prune bad sequence regions with EditView etc. prior to assembly. Phrap joins sequences primarily through regions of high reliability (high phred scores) and ignores regions with low scores. This is especially convenient for assembly with a large number of sequence files. It is superior to the assembly function of other software such as Auto Assembler (ABI), which requires editing sequences prior to assembly.

To assemble sequences, open a terminal and move to "edit_dir" of the project.
$ cd /home/username/project/edit_dir
Run phredPhrap script.
$ phredPhrap
A number of lines of messages appear in the terminal window, and it finishes within a few seconds.
If you wish to assemble and edit many projects one after another, a shell script is available here. Copy it to the parent folder of these projects for assembling.
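
For readers without that script, a batch run over many projects can be sketched as follows. This is an illustration written for this page, not the script referred to above. The function name assemble_all and the log file name phredPhrap.log are assumptions. It expects every project folder under the parent folder to contain an "edit_dir", and phredPhrap to be found through PATH.

```shell
# Hypothetical helper: run phredPhrap in the edit_dir of every project
# folder found directly under a parent folder.
assemble_all() {
    # $1: parent folder holding the project folders (default: current folder)
    for edit in "${1:-.}"/*/edit_dir; do
        [ -d "$edit" ] || continue      # skip folders that are not projects
        echo "== $edit"
        # Run in a subshell so the caller's working directory is unchanged;
        # messages go to a log file inside each edit_dir.
        ( cd "$edit" && phredPhrap > phredPhrap.log 2>&1 ) ||
            echo "phredPhrap failed in $edit" >&2
    done
}
```

Calling assemble_all /home/username/projects would assemble each project in turn. Folders without an "edit_dir" are skipped, and a failure in one project does not stop the others.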


6. THE SIMPLEST TUTORIAL OF CONSED

     Consed is multi-functional software. Read the "QUICK TOUR OF CONSED" section of "README.txt" provided with consed for details. I present here the simplest possible tutorial, for assembling a fish mitochondrial genome. The method presented here may differ from the standard usage of the software.

     After phredPhrap finishes, start consed. Make sure your current directory is "edit_dir" and that you have added the PATHs and made the consed-starting script described above.
$ consed

     If you assemble and edit contigs in many projects one after another, for example in population studies, a shell script is available here. Copy it to the parent folder of these projects.
opening dialog

When the start-up window appears, double-click on an ace file.

consed main window

The consed main window shows a list of contigs. Select one of them and double click on it.

aligned reads window

     The Aligned Reads window appears. It shows, as text, how the sequences (reads) are joined. We usually represent mtDNA sequences as the L-strand. However, phrap joins sequences regardless of their direction, and the consensus in the window frequently appears as the H-strand sequence. In this case, click on the Compl Cont button to reverse complement the contig. Sequences cannot be edited in the Aligned Reads window. The background of the nucleotide letters reflects the phred score: the higher the score, the lighter the background, so the quality of the sequences can be judged approximately. To edit sequences, show the chromatogram (trace) of the read and edit the sequence in the trace window. To close the Aligned Reads window, click on the Dismiss button, not the X button. To customize the initial window size, click here.

Display traces for all reads

     To show traces, click the center wheel on each read in the Aligned Reads window. Alternatively, click the right mouse button at a position and choose "Display traces for all reads" in the dialog that appears to show all reads at that position. Clicking on a base in the sequence text of the trace window makes the base blink red and enables sequence editing. The sequence cursor can be moved over bases and reads with the cursor keys. Overwrite a base with "*" to delete it. Press the space bar at a base to insert "*" just before it. This character can be overwritten by any base, so bases can be inserted by pressing the space bar first. It is unnecessary to edit sequences in regions with low scores, even if they do not agree with the consensus. Only one undo is possible in the Trace window; click the undo button to do this. For multiple undo, close the Aligned Reads window and use the Undo Edit ... button.

Trace Window

     To add sequence files for gap closing etc., first quit consed, copy the sequence files to the "chromat_dir" folder, and run phredPhrap again. The Add New Reads and Miniassembly functions implemented in consed work as well.

remove primer sequence

     When walking a fish mitochondrial genome with primers, sequence reads sometimes reach the primer at the opposite end. If primers with some mismatched bases are used, phrap may take the mismatched base for the consensus when the phred scores in the region of the opposite primer sequence are high. Even if phrap does not take the mismatched primer sequence as the consensus, the mismatched base positions with high scores are listed as problems. This makes finishing the project inconvenient.

primer sequence removed

     To tell phrap not to use primer sequences for assembly, mask the primer sequence by filling the left (or right) end of the read with x's: click the center wheel at the beginning of the primer sequence and choose "Change to x's to left (right)" in the menu that appears. Regions filled with x's are ignored by phrap. While x is originally meant for masking vector sequences, primer sequences are treated here in the same way as vector sequence.

Navigate menu

     When the sequences seem to have assembled into a single contig covering the full mitochondrial genome, carry out the finishing process. Click on the Navigate button of the Aligned Reads window and choose the top item of the menu. A list of unclear positions and problems found in the contig then appears. Select an item in the list and click on the Go button to jump to that position. Check the contig quality around that position by showing traces etc. Depending on the contig quality or the presence of small gaps, add some reads and repeat phredPhrap.

problems

     The Navigate function can be used at any time during the assembly process, but at the initial stages I suggest checking all the traces by eye rather than checking only the problematic points reached with the Navigate function.
     After checking the quality of the contig and finding it good, in the case of a mitochondrial genome make sure the contig is circular by checking both of its ends. For circular DNAs, phrap seems to open the circle where the overlap between reads is short and to make a linear contig. The same sequence should then be present at both ends of the contig, though it may be short.

Search for String

     To check sequence identity at both ends, use the Search for String function and the Compare Cont function. Go to one end of the contig. Find a stretch of bases (about 10 bases) with high scores close to the end. Click on the Search for String button. Enter the sequence in the query field and search. You are in luck if the query string is found at two positions, one close to each end.

found

     Choose one of these positions and jump to it by clicking on the Go button. Click on the Compare Cont button, and the Compare Contigs window opens. Move to the Searching Contigs window again. Choose the other position and jump. Click on the Compare Cont button at this position as well. The Compare Contigs window shows the two sequences side by side. Click on the Align button, and the two sequences are aligned in the lower part of the window.

Compare Contigs

     If the sequences from both ends are identical, the contig should be circular. Some disagreements may exist near the ends of the aligned region. If the mismatches come from one low-quality read while the other read has high scores, it may be OK to ignore the low-quality sequence.

aligned

     The alignment in the Compare Contigs window indicates how the reads overlap with each other on a circular DNA. It indicates the positions at which to cut the exported consensus sequence so that it contains no redundancy.

file menu for export consensus

     Finally, export the consensus sequence to a file. Click on the File button of the Aligned Reads window. Choose Export Consensus (with Options) in the menu that drops down. Enter the beginning and end positions and export. Alternatively, choose Export Consensus, save the entire consensus sequence, and delete the redundancy afterward with a sequence editor.

export consensus

     The consensus sequence data file is saved in the "edit_dir" folder.
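
If the entire consensus was exported, the redundant stretch at the ends can be measured outside consed before trimming. The sketch below assumes the exported file is a FASTA file holding a single sequence and looks only for an exact match between the start and the end of that sequence. The function name end_overlap and the default minimum of 10 bases are assumptions. Because the ends may contain low-quality mismatches, the Compare Contigs alignment described above remains the reference, and this check is only a convenience.

```shell
# Hypothetical helper: print the length of the longest stretch that is
# identical at the start and at the end of a single FASTA sequence,
# or 0 if no such stretch of at least the minimum length exists.
end_overlap() {
    # $1: FASTA file with one consensus sequence
    # $2: minimum overlap length to report (default: 10)
    awk -v min="${2:-10}" '
        # Join all non-header lines into one upper-case sequence.
        !/^>/ { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
        END {
            n = length(seq)
            # Try the longest candidate first, up to half the sequence.
            for (k = int(n / 2); k >= min; k--) {
                if (substr(seq, 1, k) == substr(seq, n - k + 1)) {
                    print k
                    exit
                }
            }
            print 0
        }
    ' "$1"
}
```

If the function reports k for a sequence of n bases, the last k bases repeat the first k, so bases 1 to n-k give the consensus without redundancy.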


7. ACKNOWLEDGMENTS

     Ken-ichi Hayashizaki (Kitasato Univ) kindly set up our systems, including the Linux machine, the LAN and the installation of the packages. The contents of this page were written by KS based only on what he has learned since the system was set up and running.

