The CitationRank algorithm
- I. The Core Gene
- II. The Extended Gene
- III. The algorithm
- IV. Application of the algorithm
- V. References
The Core Gene
For a gene i, the ai, bi, ci, di values, representing the number of co-citations (ai or bi) and non co-citations (ci or di) with SADR X or other than X respectively, were counted and the citation rate ratio (CRR) was calculated as:
A gene was assigned as a core gene if its CRR exceeded zero, with ai and ci both exceeding three. A core gene is also called the parent gene of its extended genes (see definition below).
The Extended Gene
Genes whose CRR was equal to zero but which were connected to a core gene in the gene-gene knowledge chain network (GKCN) were assigned as extended genes. An extended gene is also called the child gene of the core genes it is connected to.
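The two definitions above can be sketched in code. This is a minimal illustration rather than the database's actual implementation: the function name `classify_genes`, the dictionary-based inputs, and the reading of "exceeding three" as strictly greater than 3 are all assumptions, and the CRR values are taken as precomputed inputs.

```python
from typing import Dict, Set, Tuple


def classify_genes(
    crr: Dict[str, float],
    a: Dict[str, int],
    c: Dict[str, int],
    edges: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split genes into core genes and extended genes (hypothetical helper).

    crr   : precomputed citation rate ratio per gene
    a, c  : the a and c counts per gene, as defined in the text
    edges : undirected GKCN adjacency (gene -> set of neighbouring genes)
    """
    # Core gene: CRR above zero, with a and c both above three
    # (assumption: "exceeding three" means strictly greater than 3).
    core = {g for g, v in crr.items() if v > 0 and a[g] > 3 and c[g] > 3}

    # Extended gene: CRR equal to zero, but linked to at least one core gene.
    extended: Dict[str, Set[str]] = {}
    for g, v in crr.items():
        if v == 0:
            parents = {p for p in edges.get(g, set()) if p in core}
            if parents:
                extended[g] = parents  # child gene -> its parent core genes
    return core, extended
```

The returned mapping keeps, for each extended (child) gene, the set of core (parent) genes it is linked to, which matches the parent/child wording used in the definitions.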
The algorithm
Sorting SADR-related genes by their CRR value alone potentially creates false negatives. Because our knowledge of the molecular mechanisms of SADRs is limited, it would be wrong to assign low importance to a gene simply because its citation rate is currently low, especially when the gene is biologically linked to other genes with high CRRs; whether such a gene should be omitted remains unclear. This problem is particularly acute in SADR research, where knowledge of SADR-related genes is scarce. Following the logic of the Google PageRank algorithm (1) (see Fig. 1 below), we put forward an algorithm named CitationRank. PageRank is based on the premise that the original rank of page i can be measured by the probability 1-d, which is the probability of an internet surfer randomly choosing it over all web pages. The surfer can also arrive at page i with probability d from other web pages holding hyperlinks to it.
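For orientation, the PageRank premise described above is usually written as the following recurrence, shown here in the form given by Brin and Page (1). The symbols M(p_i), the set of pages holding hyperlinks to page p_i, and C(p_j), the number of outgoing links on page p_j, are notation chosen for this sketch.

```latex
PR(p_i) = (1 - d) + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{C(p_j)}
```

The first term corresponds to the random choice of a page, and the second to arriving at the page by following hyperlinks from other pages.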

Fig. 1 The concept of the Google PageRank algorithm: a web page should be highly ranked if it is linked to by other highly ranked pages. The picture is from reference (2).
In CitationRank, the original rank of gene i is defined as the likelihood that a researcher would access it while browsing the literature on a specific SADR topic. We use (1-d)CRR_i as a measure of this likelihood. The researcher can also "think of" gene i when looking at other genes that are co-cited with gene i in the literature. The links in the algorithm are defined as the edges of the GKCN. Thus, in the following iteration equation, the citation rank of gene i (cr_i) consists of two terms:
$$cr_i^{[n]} = (1-d)\,CRR_i + d \sum_{j \neq i} \frac{w_{ij}\, cr_j^{[n-1]}}{lines_j}, \quad 1 \le i \le N,$$
where cr_i^{[n]} denotes the citation rank (CR) of gene i in the nth iteration. The initial rank vector is taken as R^{[0]} = CRR/‖CRR‖_1. W ∈ ℝ^{N×N} is the connectivity matrix of the GKCN, with w_ij representing the number of PubMed entries in which gene i and gene j are co-cited. The quantity lines_j equals
$$lines_j = \sum_{k \neq j} w_{jk}.$$
CitationRank uses the parameter d in the range [0, 1] to control the relative weighting of CRR and network connectivity in the rank calculation. Convergence of the iteration has been proved for all 0 < d < 1 (3), so Jacobi iteration was used to solve for the vector R.
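The iteration can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the update takes the form cr_i = (1-d)·CRR_i + d·Σ_j w_ij·cr_j / lines_j, with lines_j taken as the total co-citation weight of gene j (the sum of row j of W). The function name `citation_rank` and the parameters `tol` and `max_iter` are hypothetical.

```python
import numpy as np


def citation_rank(W, crr, d, tol=1e-10, max_iter=1000):
    """Jacobi-style fixed-point iteration for citation ranks (sketch).

    W   : (N, N) symmetric co-citation count matrix of the GKCN
    crr : length-N vector of citation rate ratios
    d   : weighting parameter, 0 < d < 1
    """
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)            # enforce j != i
    crr = np.asarray(crr, dtype=float)

    norm = np.abs(crr).sum()
    if norm == 0:
        raise ValueError("at least one CRR value must be non-zero")

    # Assumption: lines_j is the total link weight of gene j.
    lines = W.sum(axis=1)
    safe_lines = np.where(lines > 0, lines, 1.0)  # isolated genes: avoid 0/0

    r = crr / norm                       # R[0] = CRR / ||CRR||_1
    for _ in range(max_iter):
        # Element i of W @ (r / lines) is sum_j w_ij * r_j / lines_j.
        r_new = (1.0 - d) * crr + d * (W @ (r / safe_lines))
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

As a usage example, `citation_rank([[0, 2, 0], [2, 0, 1], [0, 1, 0]], [1.0, 0.0, 0.0], d=0.5)` assigns non-zero ranks to the second and third genes even though their CRR is zero, because rank flows to them along the network edges. With d close to 0 the result approaches the CRR vector itself, and with d close to 1 the network connectivity dominates.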
The application of the CitationRank algorithm in prioritizing genes in SADR Gengle
The CitationRank algorithm is applied to sort the core and extended genes in this database. Genes are sorted and presented to the user in order of decreasing relevance to topic X, as measured by their CR values. In Google, web pages relevant to the user's keywords are sorted mainly by their PageRank (1). In our system, one can browse SADR-related genes as if browsing search results from Google. Taking SJS as an example (Fig. 2), core genes are sorted by descending CR and presented as gene cards, with general information to help users pursue their interests.
Within each card, the relevant extended genes in the GKCN, if any, are displayed, together with the SJS-related literature containing this gene. Here the impact factor of the gene denotes the mean impact factor of the SJS-related PubMed references carrying this gene. One can go to the detailed information page by clicking on the gene name.
References:
1. Brin, S. and Page, L. (1998) The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30, 107-117.
2. http://en.wikipedia.org/wiki/PageRank.
3. Morrison, J.L., Breitling, R., Higham, D.J. and Gilbert, D.R. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.


