Category Archives: Publications

Ho Jang, Youngmi Hur and Hyunju Lee. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data. Scientific Reports, 2016 May 09; 6:25582 (IF: 5.578) (JCR: 5/57, 8.8%, MULTIDISCIPLINARY SCIENCES).

Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

  • Author : Ho Jang, Youngmi Hur, and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Scientific Reports

 

Abstract

DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes.

 

Wonjun Choi, Chan-Hun Choi, Young Ran Kim, Seon-Jong Kim, Chang-Su Na and Hyunju Lee. HerDing: herb recommendation system to treat diseases using genes and chemicals. Database (Oxford), 2016 March 15; 2016:baw011 (IF: 3.372) (JCR: 7/57, 12.3%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

HerDing: herb recommendation system to treat diseases using genes and chemicals.

  • Author : Wonjun Choi, Chan-Hun Choi, Young Ran Kim, Seon-Jong Kim, Chang-Su Na and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Database-Oxford

 

Abstract

In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement.

Database URL: http://combio.gist.ac.kr/herding

Daeyong Jin and Hyunju Lee (2015) A Computational Approach to Identifying Gene-microRNA Modules in Cancer. PLoS Computational Biology, 2015 Jan 22; 11(1):e1004042. (IF: 4.829) (JCR: 3/52, 5.8%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

A Computational Approach to Identifying Gene-microRNA Modules in Cancer.

  • Author : Daeyong Jin and Hyunju Lee
  • Published Date : 2015
  • Category : Bioinformatics and Text Mining 
  • Place of publication : PLoS Computational Biology

 

Abstract

MicroRNAs (miRNAs) play key roles in the initiation and progression of various cancers by regulating genes. Regulatory interactions between genes and miRNAs are complex, as multiple miRNAs can regulate multiple genes. In addition, these interactions vary from patient to patient and even among patients with the same cancer type, as cancer development is a heterogeneous process. These relationships are more complicated because transcription factors and other regulatory molecules can also regulate miRNAs and genes. Hence, it is important to identify the complex relationships between genes and miRNAs in cancer. In this study, we propose a computational approach to constructing modules that represent these relationships by integrating the expression data of genes and miRNAs with gene-gene interaction data. First, we used a biclustering algorithm to construct modules consisting of a subset of genes and a subset of samples to incorporate the heterogeneity of cancer cells. Second, we combined gene-gene interactions to include genes that play important roles in cancer-related pathways. Then, we selected miRNAs that are closely associated with genes in the modules based on a Gaussian Bayesian network and Bayesian Information Criteria. When we applied our approach to ovarian cancer and glioblastoma (GBM) data sets, 33 and 54 modules were constructed, respectively. In these modules, 91% and 94% of ovarian cancer and GBM modules, respectively, were explained either by direct regulation between genes and miRNAs or by indirect relationships via transcription factors. In addition, 48.4% and 74.0% of modules from ovarian cancer and GBM, respectively, were enriched with cancer-related pathways, and 51.7% and 71.7% of miRNAs in modules were ovarian cancer-related miRNAs and GBM-related miRNAs, respectively. Finally, we extensively analyzed significant modules and showed that most genes in these modules were related to ovarian cancer and GBM.

 

Jonghyun Han and Hyunju Lee (2015) Adaptive Landmark Recommendations for Travel Planning: Personalizing and Clustering Landmarks using Geo-Tagged Social Media. Pervasive and Mobile Computing. 18:4-17 (IF: 2.079) (COMPUTER SCIENCE, INFORMATION SYSTEMS: 20/139)

Adaptive Landmark Recommendations for Travel Planning: Personalizing and Clustering Landmarks using Geo-Tagged Social Media

  • Author : Jonghyun Han and Hyunju Lee
  • Published Date : 2014
  • Category : Mining in Social Network
  • Place of publication : Pervasive and Mobile Computing

 

Abstract

When travelers plan their trips, landmark recommendation systems that consider the properties of those trips can help them decide which locations to visit. Because interesting content may vary according to travelers and their situations, it is important to recommend personalized landmarks by considering them and their trips. In this paper, we propose an approach that adaptively recommends clusters of landmarks using geo-tagged social media. We first examine the impact of spatial and temporal properties of a trip on the distribution of popular places through large-scale data analysis. Our approach is to compute the significance of landmarks for travelers according to the spatial and temporal properties of their trips. Then, we generate clusters of recommended landmarks, which share a similar theme or are contiguous, by utilizing histories of travel trajectories. The landmarks recommended by our approach are evaluated against several baseline approaches and show increased accuracy and satisfaction. Through a user study, we also verify that the approach is applicable to lesser-known places and reflects local events and seasonal changes. Thus, we expect that the approach is helpful in developing personalized recommendations.

Bayarbaatar Amgalan and Hyunju Lee (2014) WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PLoS One, 2014 Aug 22; 9(8): e104993 (IF: 3.534)

WMAXC: a weighted maximum clique method for identifying condition-specific sub-network.

 

Abstract

Sub-networks can expose complex patterns in an entire bio-molecular network by extracting interactions that depend on temporal or condition-specific contexts. When genes interact with each other during cellular processes, they may form differential co-expression patterns with other genes across different cell states. The identification of condition-specific sub-networks is of great importance in investigating how a living cell adapts to environmental changes. In this work, we propose the weighted MAXimum clique (WMAXC) method to identify a condition-specific sub-network. WMAXC first proposes scoring functions that jointly measure condition-specific changes to both individual genes and gene-gene co-expressions. It then employs a weaker formula of a general maximum clique problem and relates the maximum scored clique of a weighted graph to the optimization of a quadratic objective function under sparsity constraints. We combine a continuous genetic algorithm and a projection procedure to obtain a single optimal sub-network that maximizes the objective function (scoring function) over the standard simplex (sparsity constraints). We applied the WMAXC method to both simulated data and real data sets of ovarian and prostate cancer. Compared with previous methods, WMAXC selected a large fraction of cancer-related genes, which were enriched in cancer-related pathways. The results demonstrated that our method efficiently captured a subset of genes relevant under the investigated condition.
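The link between a maximum clique and a quadratic program over the standard simplex can be illustrated with the classical Motzkin-Straus result. The block below is a sketch for orientation only, not the WMAXC objective itself: the symbols A, W, x and ω(G) are introduced here for illustration, and the weighted form is an assumed analogue, since the abstract does not give the exact scoring matrix.

```latex
% Motzkin-Straus theorem: G is an unweighted graph on n nodes with
% adjacency matrix A and clique number \omega(G).
\max_{x \in \Delta_n} x^{\top} A x \;=\; 1 - \frac{1}{\omega(G)},
\qquad
\Delta_n = \Bigl\{ x \in \mathbb{R}^n : x_i \ge 0,\ \sum_{i=1}^{n} x_i = 1 \Bigr\}.

% Assumed weighted analogue (illustrative): W is a symmetric matrix of
% gene and gene-gene scores; the support of a maximizer x^* is read as
% the selected sub-network.
x^{*} \in \arg\max_{x \in \Delta_n} x^{\top} W x,
\qquad
S = \{\, i : x^{*}_i > 0 \,\}.
```

Restricting x to the simplex is what the abstract refers to as the sparsity constraint: the weights are non-negative and sum to one, so mass concentrates on a small set of mutually supporting nodes.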

 

Hee-Jin Lee, Tien Cuong Dang, Hyunju Lee, and Jong C. Park (2014) OncoSearch: Cancer Gene Search Engine with Literature Evidence. Nucleic Acids Research (9 May 2014) (IF: 8.278).

OncoSearch: Cancer Gene Search Engine with Literature Evidence

  • Author : Hee-Jin Lee, Tien Cuong Dang, Hyunju Lee, and Jong C. Park
  • Published Date : 2014
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Nucleic Acids Research

 

Abstract

In order to identify genes that are involved in oncogenesis and to understand how such genes affect cancers, abnormal gene expressions in cancers are actively studied. For an efficient access to the results of such studies that are reported in biomedical literature, the relevant information is accumulated via text-mining tools and made available through the Web. However, current Web tools are not yet tailored enough to allow queries that specify how a cancer changes along with the change in gene expression level, which is an important piece of information to understand an involved gene’s role in cancer progression or regression. OncoSearch is a Web-based engine that searches Medline abstracts for sentences that mention gene expression changes in cancers, with queries that specify (i) whether a gene expression level is up-regulated or down-regulated, (ii) whether a certain type of cancer progresses or regresses along with such gene expression change and (iii) the expected role of the gene in the cancer. OncoSearch is available through http://oncosearch.biopathway.org


Media coverage : Electronic Times (전자신문) (2014.05.22), Kookje Shinmun (국제신문) (2014.05.22), News1 (뉴스1) (2014.05.22)

Song, B. and Lee, H. (2012) Prioritizing Disease Genes by Integrating Domain Interactions and Disease Mutations in a Protein-Protein Interaction Network. IJICIC, 8(2), 1327-1338

Prioritizing Disease Genes by Integrating Domain Interactions and Disease Mutations in a Protein-Protein Interaction Network

Abstract

Complex diseases such as cancer are involved in inter-relationship among several genes, with protein-protein interaction networks being extensively studied in attempts to reveal the relationship between genes and diseases. Although these studies have shown promising results for identifying disease genes, it is not systemically studied that a protein functions differently depending on its interaction partners in the network since a protein can have multiple functions. In this study, domains are considered as functional units of proteins and we investigate how disease-related mutations in domains can be used to identify other disease genes in a domain-domain interaction network. We subsequently propose a computational method to predict disease genes based on the following two assumptions. The first assumption is that proteins closely interacting with known disease proteins in a protein interaction network are likely to be involved in the same disease. Second, although two proteins are in the same distance from known disease genes in a protein interaction network, the protein interacting with known disease genes through a domain with mutation is more likely to be related to the disease than other proteins that interact through domains with no mutation. As a result, when the proposed approach is applied to five diseases, it highly ranks disease-related genes compared to a model using only a protein interaction data set.

Azad, A., Shahid, S., Noman, N., and Lee, H. (2011) Prediction of Plant Promoters Based on hexamers and Random Triplet Pair Analysis. Algorithms for Molecular Biology, 6:19 (IF: 2.80)

Prediction of Plant Promoters Based on hexamers and Random Triplet Pair Analysis

  • Author : A K M Azad, Saima Shahid, Nasimul Noman, Hyunju Lee
  • Published Date : 2011
  • Category : 
  • Place of publication : Algorithms for Molecular Biology

Abstract

BACKGROUND:

With an increasing number of plant genome sequences, it has become important to develop a robust computational method for detecting plant promoters. Although a wide variety of programs are currently available, prediction accuracy of these still requires further improvement. The limitations of these methods can be addressed by selecting appropriate features for distinguishing promoters and non-promoters.

METHODS:

In this study, we proposed two feature selection approaches based on hexamer sequences: the Frequency Distribution Analyzed Feature Selection Algorithm (FDAFSA) and the Random Triplet Pair Feature Selecting Genetic Algorithm (RTPFSGA). In FDAFSA, adjacent triplet-pairs (hexamer sequences) were selected based on the difference in the frequency of hexamers between promoters and non-promoters. In RTPFSGA, random triplet-pairs (RTPs) were selected by exploiting a genetic algorithm that distinguishes frequencies of non-adjacent triplet pairs between promoters and non-promoters. Then, a support vector machine (SVM), a nonlinear machine-learning algorithm, was used to classify promoters and non-promoters by combining these two feature selection approaches. We referred to this novel algorithm as PromoBot.
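The idea of selecting hexamers by their frequency difference between promoters and non-promoters can be sketched as follows. This is a minimal illustration, not the published FDAFSA: the function names, the use of the absolute difference as the ranking score, and the `top_n` cutoff are all assumptions made for the example.

```python
from collections import Counter


def hexamer_frequencies(sequences, k=6):
    """Relative frequency of each k-mer over all overlapping windows."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def select_hexamers(promoters, non_promoters, top_n=10):
    """Rank hexamers by absolute frequency difference (assumed criterion)."""
    p = hexamer_frequencies(promoters)
    q = hexamer_frequencies(non_promoters)
    diffs = {h: abs(p.get(h, 0.0) - q.get(h, 0.0)) for h in set(p) | set(q)}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(diffs, key=lambda h: (-diffs[h], h))[:top_n]


# Toy sequences, invented for illustration only.
toy_promoters = ["TATAAATATAAA"]
toy_non_promoters = ["GCGCGCGCGCGC"]
selected = select_hexamers(toy_promoters, toy_non_promoters, top_n=3)
```

The selected hexamers would then serve as the feature set: each sequence is represented by the frequencies of those hexamers before being passed to a classifier such as an SVM.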

RESULTS:

Promoter sequences were collected from the PlantProm database. Non-promoter sequences were collected from plant mRNA, rRNA, and tRNA of PlantGDB and plant miRNA of miRBase. Then, in order to validate the proposed algorithm, we applied a 5-fold cross validation test. Training data sets were used to select features based on FDAFSA and RTPFSGA, and these features were used to train the SVM. We achieved 89% sensitivity and 86% specificity.

CONCLUSIONS:

We compared our PromoBot algorithm to five other algorithms. The sensitivity and specificity of PromoBot were comparable to, or better than, those of the algorithms tested. These results show that the two proposed feature selection methods, based on hexamer frequencies and random triplet pairs, can be successfully incorporated into a supervised machine learning method for the promoter classification problem. As such, we expect that PromoBot can be used to help identify new plant promoters. Source code and analysis results of this work can be provided upon request.

Hur, Y. and Lee, H. (2011) Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays. BMC Bioinformatics, 12:146 (IF: 3.43)

Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays

  • Author : Youngmi Hur and Hyunju Lee
  • Published Date : 2011
  • Category : 
  • Place of publication : BMC Bioinformatics

Abstract

Background

Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become an important issue to develop a computational method to detect driving genes related to cancer development located in the focal regions of CNAs.

Results

In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). The use of the wavelet analysis, because it is a multi-resolution approach, makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples so that it enables the detection of the consistent aberrations across multiple samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain known oncogenes, though these regions are not reported using a recent CNAs detecting algorithm GISTIC: SMAD7 (chr18q21.1) and FGF10 (chr5p12).
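To illustrate why a multi-resolution analysis helps separate focal from broad aberrations, the sketch below applies a simple Haar decomposition to a toy copy-number profile. It is an assumption-laden illustration rather than the WIFA algorithm: the abstract does not state which wavelet is used, the unnormalized averaging form of the Haar transform is chosen here for readability, and the signal values are invented.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approx) and half-differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly split the signal; details[0] is the finest scale."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Toy profile: a broad gain of 1 across eight probes with a focal spike of 3.
profile = [1, 1, 1, 1, 1, 3, 1, 1]
approx, details = haar_decompose(profile, levels=2)
# The focal spike dominates the finest-scale detail coefficients,
# while the broad gain survives only in the coarse approximation.
focal_index = max(range(len(details[0])), key=lambda i: abs(details[0][i]))
```

In this toy case the largest fine-scale coefficient points to the probe pair containing the spike, whereas the uniform broad gain contributes nothing to the detail coefficients.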

Conclusions

Our results suggest that WIFA can be used to reveal cancer related genes in various cancer data sets.

Oh, M., Song, B., and Lee, H. (2010) CAM: web tool for combining arrayCGH and microarray gene expression data from multiple samples. Computers in Biology and Medicine, 40(9):781-785. (IF: 1.269)

CAM: web tool for combining arrayCGH and microarray gene expression data from multiple samples

  • Author : Mira Oh, Bongjun Song, Hyunju Lee
  • Published Date : 2010
  • Category : 
  • Place of publication : Computers in Biology and Medicine

Abstract

We develop a web-based tool for Combining Array CGH copy number aberration data and Microarray gene expression data (CAM). This tool analyzes these two data sets from multiple samples to detect genes having both DNA copy number aberrations (CNAs) and gene expression changes. CAM provides several statistical methods for identifying CNAs, which are consistent across multiple samples. Identified CNAs and their correlated gene expression changes are then visualized along the chromosomes. As a result, CAM is a useful tool for identifying disease related genes when these two types of data sets are available. To illustrate the various analysis outputs of CAM, we subsequently provide ten sets of example data from seven cancer types.