Author Archives: Admin

Sehwan Moon




  • 2018.03 – Current : Gwangju Institute of Science and Technology, Gwangju, Korea
  • 2014.03 – 2018.02 : Catholic University (B.S. in Biotechnology / B.S. in Computer Engineering)

Research Topic

  • Bioinformatics




Phone: +82-62-715-3160
E-mail :

Hajeong Son




  • 2017.09 – Current : Gwangju Institute of Science and Technology, Gwangju, Korea

Research Topic

  • Text Mining, Deep Learning




Phone: +82-62-715-3160
E-mail :

Best Paper Award

Students 장호 and 김정균 received Best Paper Awards.



  • Korean Society of Bioinformatics (KSBi, 한국생물정보시스템생물학회) – Best Paper Award (2016.08.19)
  • 2016 Qualcomm-GIST Innovation Award – IT Research Paper Award (2016.12.16)


  • Korean Society of Bioinformatics (KSBi, 한국생명정보학회) – Best Paper Award (2017.11.02)

Graduation in Feb 2017



2014.10 Social Event

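The theme of the October 2014 social event was tasting vegetarian food at a vegetarian buffet in the Pungam district.

We had an enjoyable time sampling a variety of vegetarian dishes that we had not been able to try before, such as soy meat and wheat cutlets.

Below is the group photo!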


Social Event on May 7, 2013

Dear lab members,

The April social event will be held on May 7.
It should have been held sometime last month, but it was delayed because of the midterm exam schedule and members' busy workloads.
I am now pleased to announce that the social event will take place on May 7.

The detailed plan is listed below:
2013-5-7 (Tues)

16:00 – 17:50
 A soccer game to promote members' health and team spirit.
( Location: the big football ground next to the main gate of the school )
( We will play basketball if there are not enough participants. )
18:00 – 20:00  We will eat sam-gyup-sal (pork belly), called ‘삼겹살’ in Korean, for dinner.
( Those who cannot eat pork may choose any other food. )

Participation is not mandatory, so feel free not to join, but I hope all of you will come because it is going to be fun.
If you want to participate, please bring suitable clothes and 10,000 won.
One more thing: if you cannot join, please let me know by e-mail in advance.

Thank you very much for reading.
I am looking forward to seeing you on the ground soon.




Song, B. and Lee, H. (2012) Prioritizing Disease Genes by Integrating Domain Interactions and Disease Mutations in a Protein-Protein Interaction Network, IJICIC, 8(2), 1327-1338

Prioritizing Disease Genes by Integrating Domain Interactions and Disease Mutations in a Protein-Protein Interaction Network


Complex diseases such as cancer are involved in inter-relationship among several genes, with protein-protein interaction networks being extensively studied in attempts to reveal the relationship between genes and diseases. Although these studies have shown promising results for identifying disease genes, it is not systemically studied that a protein functions differently depending on its interaction partners in the network since a protein can have multiple functions. In this study, domains are considered as functional units of proteins and we investigate how disease-related mutations in domains can be used to identify other disease genes in a domain-domain interaction network. We subsequently propose a computational method to predict disease genes based on the following two assumptions. The first assumption is that proteins closely interacting with known disease proteins in a protein interaction network are likely to be involved in the same disease. Second, although two proteins are in the same distance from known disease genes in a protein interaction network, the protein interacting with known disease genes through a domain with mutation is more likely to be related to the disease than other proteins that interact through domains with no mutation. As a result, when the proposed approach is applied to five diseases, it highly ranks disease-related genes compared to a model using only a protein interaction data set.
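The two assumptions in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy network, the gene names, the `bonus` weight, and the scoring formula (inverse hop distance plus a fixed bonus) are all assumptions chosen only to show how network proximity and mutated-domain interactions might be combined into one ranking.

```python
from collections import deque

def distances_from_disease_genes(graph, disease_genes):
    """Multi-source breadth-first search: hops to the nearest known disease gene."""
    dist = {gene: 0 for gene in disease_genes}
    queue = deque(disease_genes)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def rank_candidates(graph, disease_genes, mutated_domain_edges, bonus=0.5):
    """Rank non-disease genes; `bonus` is a hypothetical weight, not from the paper."""
    dist = distances_from_disease_genes(graph, disease_genes)
    scores = {}
    for gene, hops in dist.items():
        if gene in disease_genes:
            continue
        # Assumption 1: genes closer to known disease genes score higher.
        score = 1.0 / hops
        # Assumption 2: extra credit when the gene interacts with a known
        # disease gene through a domain that carries a disease mutation.
        if any(frozenset((gene, known)) in mutated_domain_edges
               for known in disease_genes):
            score += bonus
        scores[gene] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy network (hypothetical): D1 is a known disease gene; A and B both
# interact with it directly, but only the B-D1 interaction goes through
# a mutated domain. C is two hops away.
toy_graph = {"D1": ["A", "B"], "A": ["D1", "C"], "B": ["D1"], "C": ["A"]}
toy_mutated_edges = {frozenset(("B", "D1"))}
toy_ranking = rank_candidates(toy_graph, {"D1"}, toy_mutated_edges)
```

In this toy case A and B are the same distance from D1, yet B is ranked first because of the mutated-domain interaction, which is the situation the second assumption describes.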

Azad, A., Shahid, S., Noman, N., and Lee, H. (2011) Prediction of Plant Promoters Based on hexamers and Random Triplet Pair Analysis. Algorithms for Molecular Biology, 6:19 (IF: 2.80)

Prediction of Plant Promoters Based on hexamers and Random Triplet Pair Analysis

  • Author : A K M Azad, Saima Shahid, Nasimul Noman, Hyunju Lee
  • Published Date : 2011
  • Place of publication : Algorithms for Molecular Biology



With an increasing number of plant genome sequences, it has become important to develop a robust computational method for detecting plant promoters. Although a wide variety of programs are currently available, prediction accuracy of these still requires further improvement. The limitations of these methods can be addressed by selecting appropriate features for distinguishing promoters and non-promoters.


In this study, we proposed two feature selection approaches based on hexamer sequences: the Frequency Distribution Analyzed Feature Selection Algorithm (FDAFSA) and the Random Triplet Pair Feature Selecting Genetic Algorithm (RTPFSGA). In FDAFSA, adjacent triplet-pairs (hexamer sequences) were selected based on the difference in the frequency of hexamers between promoters and non-promoters. In RTPFSGA, random triplet-pairs (RTPs) were selected by exploiting a genetic algorithm that distinguishes frequencies of non-adjacent triplet pairs between promoters and non-promoters. Then, a support vector machine (SVM), a nonlinear machine-learning algorithm, was used to classify promoters and non-promoters by combining these two feature selection approaches. We referred to this novel algorithm as PromoBot.
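The frequency-difference idea behind FDAFSA can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of the absolute difference in relative frequency as the selection criterion, and the toy sequences are hypothetical, and the genetic-algorithm step (RTPFSGA) and the SVM classifier are not shown.

```python
from collections import Counter

def hexamer_frequencies(sequences, k=6):
    """Relative frequency of every overlapping k-mer across a set of sequences."""
    counts = Counter()
    total = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            total += 1
    if total == 0:
        return {}
    return {kmer: count / total for kmer, count in counts.items()}

def select_hexamers(promoters, non_promoters, top_n=10):
    """Keep the hexamers whose frequency differs most between the two classes."""
    freq_pos = hexamer_frequencies(promoters)
    freq_neg = hexamer_frequencies(non_promoters)
    difference = {
        kmer: abs(freq_pos.get(kmer, 0.0) - freq_neg.get(kmer, 0.0))
        for kmer in set(freq_pos) | set(freq_neg)
    }
    # Sort by decreasing difference; ties are broken alphabetically.
    return sorted(difference, key=lambda kmer: (-difference[kmer], kmer))[:top_n]

def featurize(sequence, hexamers):
    """Count occurrences of each selected hexamer (a feature vector for a classifier)."""
    sequence = sequence.upper()
    windows = Counter(sequence[i:i + 6] for i in range(len(sequence) - 5))
    return [windows[kmer] for kmer in hexamers]

# Toy data (hypothetical): an AT-rich "promoter" and a GC-repeat "non-promoter".
toy_promoters = ["TATAAATATAAA"]
toy_non_promoters = ["GCGCGCGCGCGC"]
toy_selected = select_hexamers(toy_promoters, toy_non_promoters, top_n=3)
```

The resulting count vectors from `featurize` are the kind of input a classifier such as an SVM would then be trained on.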


Promoter sequences were collected from the PlantProm database. Non-promoter sequences were collected from plant mRNA, rRNA, and tRNA of PlantGDB and plant miRNA of miRBase. Then, in order to validate the proposed algorithm, we applied a 5-fold cross validation test. Training data sets were used to select features based on FDAFSA and RTPFSGA, and these features were used to train the SVM. We achieved 89% sensitivity and 86% specificity.
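As a rough sketch of the evaluation described above, the following shows one way to build 5-fold splits and compute sensitivity and specificity. The label coding (1 for promoter, 0 for non-promoter), the interleaved fold assignment, and the function names are assumptions for illustration; the paper's exact splitting procedure is not stated here.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; sample i is assigned to fold i % k."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels (hypothetical): 4 promoters and 4 non-promoters.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

Note that, as the text says, feature selection is run on the training portion of each fold only, so the held-out sequences do not influence which hexamers or triplet pairs are chosen.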


We compared our PromoBot algorithm to five other algorithms. The sensitivity and specificity of PromoBot were comparable to, or better than, those of the algorithms tested. These results show that the two proposed feature selection methods, based on hexamer frequencies and random triplet pairs, can be successfully incorporated into a supervised machine learning method for the promoter classification problem. As such, we expect that PromoBot can be used to help identify new plant promoters. Source code and analysis results of this work can be provided upon request.

Hur, Y. and Lee, H. (2011) Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays. BMC Bioinformatics, 12:146 (IF: 3.43)

Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays

  • Author : Youngmi Hur and Hyunju Lee
  • Published Date : 2011
  • Place of publication : BMC Bioinformatics



Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become an important issue to develop a computational method to detect driving genes related to cancer development located in the focal regions of CNAs.


In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). Because wavelet analysis is a multi-resolution approach, it makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples, which enables the detection of aberrations that are consistent across samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain known oncogenes, SMAD7 (chr18q21.1) and FGF10 (chr5p12), though these regions are not reported by GISTIC, a recent CNA detection algorithm.


Our results suggest that WIFA can be used to reveal cancer related genes in various cancer data sets.
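The multi-resolution idea can be illustrated with a simple Haar decomposition of a copy number profile. This is a sketch under assumptions, not WIFA itself: the abstract does not say which wavelet is used, the `focal_scores` heuristic (fine-scale average minus the coarsest average) is hypothetical, and the integration across multiple samples is not shown.

```python
def haar_step(values):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return approx, detail

def haar_decompose(signal):
    """Return the approximations at every scale and the detail coefficients per level."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = list(signal)
    approximations, details = [current], []
    while len(current) > 1:
        current, detail = haar_step(current)
        approximations.append(current)
        details.append(detail)
    return approximations, details

def focal_scores(signal, fine_level=1):
    """Fine-scale averages relative to the coarsest (whole-region) average."""
    approximations, _ = haar_decompose(signal)
    broad_level = approximations[-1][0]
    return [value - broad_level for value in approximations[fine_level]]

# Toy profile (hypothetical): a broad gain of 1.0 across eight probes with a
# focal amplification to 3.0 at probes 4 and 5.
toy_profile = [1, 1, 1, 1, 3, 3, 1, 1]
toy_scores = focal_scores(toy_profile)
```

Here the third probe pair stands out against the broad background, which is the kind of contrast between scales that a multi-resolution method can exploit.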

Oh, M., Song, B., and Lee, H. (2010) CAM: web tool for combining arrayCGH and microarray gene expression data from multiple samples. Computers in Biology and Medicine, 40(9):781-785. (IF: 1.269)

CAM: web tool for combining arrayCGH and microarray gene expression data from multiple samples

  • Author : Mira Oh, Bongjun Song, Hyunju Lee
  • Published Date : 2010
  • Place of publication : Computers in Biology and Medicine


We develop a web-based tool for Combining Array CGH copy number aberration data and Microarray gene expression data (CAM). This tool analyzes these two data sets from multiple samples to detect genes having both DNA copy number aberrations (CNAs) and gene expression changes. CAM provides several statistical methods for identifying CNAs, which are consistent across multiple samples. Identified CNAs and their correlated gene expression changes are then visualized along the chromosomes. As a result, CAM is a useful tool for identifying disease related genes when these two types of data sets are available. To illustrate the various analysis outputs of CAM, we subsequently provide ten sets of example data from seven cancer types.
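A minimal sketch of the kind of analysis described above is given below. It is not CAM's actual method: the abstract mentions several statistical methods without naming them, so the thresholds (`cna_threshold`, `min_fraction`, `min_correlation`), the use of Pearson correlation, and the gene names are all assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def candidate_genes(copy_number, expression, cna_threshold=0.3,
                    min_fraction=0.5, min_correlation=0.6):
    """Genes with a CNA in enough samples and expression that tracks copy number."""
    hits = []
    for gene, cn_values in copy_number.items():
        expr_values = expression.get(gene)
        if expr_values is None or len(expr_values) != len(cn_values):
            continue
        # Fraction of samples with a gain or a loss (log-ratio scale assumed).
        gained = sum(1 for v in cn_values if v > cna_threshold) / len(cn_values)
        lost = sum(1 for v in cn_values if v < -cna_threshold) / len(cn_values)
        correlation = pearson(cn_values, expr_values)
        if max(gained, lost) >= min_fraction and correlation >= min_correlation:
            hits.append((gene, max(gained, lost), correlation))
    return sorted(hits, key=lambda hit: (-hit[2], hit[0]))

# Toy data (hypothetical): four samples, two genes.
toy_copy_number = {"GENE_A": [0.8, 0.6, 0.1, 0.7], "GENE_B": [0.0, 0.1, -0.1, 0.05]}
toy_expression = {"GENE_A": [2.0, 1.5, 0.2, 1.8], "GENE_B": [1.0, 0.2, 0.9, 0.4]}
toy_hits = candidate_genes(toy_copy_number, toy_expression)
```

In the toy data only GENE_A is gained in most samples and shows expression that rises with copy number, so it is the only gene reported.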