Author Archives: Combio

Ho Jang, Jeongkyun Kim, and Hyunju Lee (2014) Data and Text Mining for Cancer Research. KIISE, 32 (3):61-70 (March 2014).

Data and Text Mining for Cancer Research.

Paper :   link 

Website :   link

Hee-Jin Lee, Tien Cuong Dang, Hyunju Lee, and Jong C. Park (2014) OncoSearch: Cancer Gene Search Engine with Literature Evidence. Nucleic Acids Research (9 May 2014) (IF: 8.278).

OncoSearch: Cancer Gene Search Engine with Literature Evidence. Nucleic Acids Research.

  • Author : Hee-Jin Lee, Tien Cuong Dang, Hyunju Lee, and Jong C. Park
  • Published Date : 2014
  • Category : Bioinformatics and Text Mining
  • Place of publication : Nucleic Acids Research



To identify genes that are involved in oncogenesis and to understand how such genes affect cancers, abnormal gene expressions in cancers are actively studied. For efficient access to the results of such studies reported in biomedical literature, the relevant information is accumulated via text-mining tools and made available through the Web. However, current Web tools are not yet tailored enough to allow queries that specify how a cancer changes along with a change in gene expression level, which is an important piece of information for understanding an involved gene’s role in cancer progression or regression. OncoSearch is a Web-based engine that searches Medline abstracts for sentences that mention gene expression changes in cancers, with queries that specify (i) whether a gene expression level is up-regulated or down-regulated, (ii) whether a certain type of cancer progresses or regresses along with such gene expression change and (iii) the expected role of the gene in the cancer. OncoSearch is available through

Paper :   link 

Website :   link

Media covered :  전자신문 (Electronic Times) (2014.05.22), 국제신문 (Kookje Shinmun) (2014.05.22), 뉴스1 (News1) (2014.05.22)

Social Event on December 30th, 2013

For the December social event, we visited Megabox early in the morning to book the movie tickets. (Next time, it would probably be better to reserve by phone.)


Before the movie, we had bean sprout hangover soup (kongnamul haejangguk) for dinner and then asked the restaurant owner to take a group photo.


Megabox was so crowded that this photo seems to have been the best our shy lab students could manage ^0^


Because the movies started at different times, some students passed the time playing arcade games.


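On the way back after the movies, we shared our impressions and reviews of the films each of us had watched, which wrapped up the December social event. In 2014, we look forward to even more enjoyable social events together with Bayar and Seo Ji-yun (서지윤), who unavoidably could not join this time, as well as with our new incoming students.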

Hee-Jin Lee, Sang-Hyung Shim, Mi-Ryoung Song, Hyunju Lee, and Jong C. Park (2013) CoMAGC: a Corpus with Multi-faceted Annotations of Gene-Cancer Relations. BMC Bioinformatics, 14:323 (14 November 2013) (IF: 3.02)

CoMAGC: a Corpus with Multi-faceted Annotations of Gene-Cancer Relations.

  • Author : Hee-Jin Lee, Sang-Hyung Shim, Mi-Ryoung Song, Hyunju Lee, and Jong C. Park
  • Published Date : 2013
  • Category : Bioinformatics and Text Mining
  • Place of publication : BMC Bioinformatics


Background: In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems do target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations.

Result: In this paper, we present a corpus for the development of TM systems that are specifically targeting gene-cancer relations but are still able to capture complex information in biomedical sentences. We describe CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, we show that the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, we deal with changes in gene expression levels among other types of gene changes.

Availability: The corpus is available under the terms of the Creative Commons Attribution License.

Conclusions: The corpus will be an important resource for the development of advanced TM systems on gene-cancer relations.

Shinhyuk Kim, Daeyong Jin and Hyunju Lee (2013) Predicting drug-target interactions using drug-drug interactions. PLoS One. 8(11): e80129. (IF: 3.730).

Predicting drug-target interactions using drug-drug interactions.



Computational methods for predicting drug-target interactions have become important in drug research because they can help to reduce the time, cost, and failure rates for developing new drugs. Recently, with the accumulation of drug-related data sets related to drug side effects and pharmacological data, it has become possible to predict potential drug-target interactions. In this study, we focus on drug-drug interactions (DDI), their adverse effects (DDIAE) and pharmacological information (DDIPharm), and investigate the relationship among chemical structures, side effects, and DDIs from several data sources. In this study, DDIPharm data from the STITCH database, DDIAE from, and drug-target pairs from ChEMBL and SIDER were first collected. Then, by applying two machine learning approaches, a support vector machine (SVM) and a kernel-based L1-norm regularized logistic regression (KL1LR), we showed that DDI is a promising feature in predicting drug-target interactions. Next, the accuracies of predicting drug-target interactions using DDI were compared to those obtained using the chemical structure and side effects based on the SVM and KL1LR approaches, showing that DDI was the data source contributing the most for predicting drug-target interactions.


5th International Symposium on Languages in Biology and Medicine (LBM 2013)

The Fifth International Symposium on Languages in Biology and Medicine (LBM 2013) will be held at the University of Tokyo, Japan on December 12th and 13th. LBM is a biennial interdisciplinary forum that brings together researchers in biology, chemistry, medicine, public health and informatics to discuss and exploit cutting edge language technologies.

Social Event on July 1st, 2013

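We had dinner together at 6 p.m. at 화석시대 (Hwaseok Sidae). All of the lab's students attended, joined by Seo Ji-yun (서지윤) and Woo Seung-hun (우승훈), undergraduates who came for the G-SURF program. The bowling originally planned for after dinner was cancelled because the venue could not accommodate the group, so we switched to a basketball game. After enjoying basketball until it got dark, those who remained had a light after-party, bringing the July social event to a close.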

Azad, A.K.M and Lee, H. (2013) Voting-based cancer module identification by combining topological and data-driven properties. PLoS One, 2013 Aug 5;8(8):e70498 (IF: 3.730)

Voting-based cancer module identification by combining topological and data-driven properties.

  • Author : A.K.M. Azad and Hyunju Lee
  • Published Date : 2013
  • Category : Bioinformatics and Text Mining 
  • Place of publication : PLoS One



Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations e.g. GE-GE, CNA-GE, and CNA-CNA over multiple patients. Next, we propose a voting-based cancer module identification algorithm by combining topological and data-driven properties (VToD algorithm) by using the gene-gene relationship network as a source of data-driven information, and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found in our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment q-values (0.64 for GBM and 0.49 for OVC). 
This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
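The first step described above, enumerating GE-GE, CNA-GE, and CNA-CNA correlations over patients to build a gene-gene relationship network, can be illustrated with a small sketch. This is not the VToD implementation: the use of Pearson correlation, the absolute-value threshold of 0.8, the function names, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0
    return cov / sqrt(ss_x * ss_y)


def relationship_edges(ge, cna, threshold=0.8):
    """Map each gene pair to the set of relationship types whose |r| >= threshold.

    ge and cna map gene name -> per-patient values (same patient order).
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(sorted(ge), 2):
        candidates = {
            "GE-GE": [(ge[a], ge[b])],
            "CNA-CNA": [(cna[a], cna[b])],
            # CNA of one gene against expression of the other, in either direction.
            "CNA-GE": [(cna[a], ge[b]), (cna[b], ge[a])],
        }
        types = {
            name
            for name, pairs in candidates.items()
            if any(abs(pearson(x, y)) >= threshold for x, y in pairs)
        }
        if types:
            edges[(a, b)] = types
    return edges


# Toy data for five patients (invented, not taken from the GBM or OVC data sets).
ge = {"G1": [1, 2, 3, 4, 5], "G2": [2, 4, 6, 8, 10], "G3": [5, 1, 4, 2, 3]}
cna = {"G1": [0, 0, 1, 1, 2], "G2": [2, 1, 1, 0, 0], "G3": [0, 1, 0, 1, 0]}
edges = relationship_edges(ge, cna)
```

In this toy example only the G1/G2 pair passes the threshold, and it does so through two relationship types, which mirrors the abstract's observation that pairs supported by several types of relationship are of particular interest.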


  • Software website :   link
  • Paper: link

Kim, J., So, S., Lee, H., Park, J. C., Kim, J.J., and Lee, H. (2013) DigSee: Disease Gene Search Engine with Evidence sentences (version cancer). Nucleic Acids Research, 41(Web server issue) (IF: 8.026)

DigSee: Disease Gene Search Engine with Evidence sentences (version cancer).

  • Author : Jeongkyun Kim, Seongeun So, Heejin Lee, Jong C. Park, Jung-Jae Kim, and Hyunju Lee
  • Published Date : 2013
  • Category : Bioinformatics and Text Mining
  • Place of publication : Nucleic Acids Research



Biological events such as gene expression, regulation, phosphorylation, localization, and protein catabolism play important roles in the development of diseases. Understanding the association between diseases and genes can be enhanced with the identification of involved biological events in this association. Although biological knowledge has been accumulated in several databases and can be accessed through the Web, there is no specialized Web tool yet allowing for a query into the relationship among diseases, genes, and biological events. For this task, we developed DigSee to search Medline abstracts for evidence sentences describing that “genes” are involved in the development of “cancer” through “biological events”. DigSee is available through


Paper :   link 

Website :   link

Media covered :   전자신문 (Electronic Times) (2013.07.05), 디지털타임스 (Digital Times) (2013.07.04), 경제투데이 (Economy Today) (2013.07.05), 뉴스1 (News1) (2013.07.05)

Jang, H. and Lee, H. (2012) Meta-Analysis of Pain Relief Effects by Laser Irradiation on Joint Areas, Photomedicine and Laser Surgery, 30(8) (IF: 1.255)

Meta-Analysis of Pain Relief Effects by Laser Irradiation on Joint Areas

  • Author : Ho Jang and Hyunju Lee
  • Published Date : 2012
  • Category : Laser therapy
  • Place of publication : Photomedicine and Laser Surgery



Background: Laser therapy has been proposed as a physical therapy for musculoskeletal disorders and has attained popularity because no side effects have been reported after treatment. However, its true effectiveness is still controversial because several clinical trials have reported the ineffectiveness of lasers in treating pain.

Methods: In this systematic review, we investigate the clinical effectiveness of low-level laser therapy (LLLT) on joint pain. Clinical trials on joint pain satisfying the following conditions are included: the laser is irradiated on the joint area, the PEDro scale score is at least 5, and the effectiveness of the trial is measured using a visual analogue scale (VAS). To estimate the overall effectiveness of all included clinical trials, a mean weighted difference in change of pain on VAS was used.

Results: MEDLINE is the main source of the literature search. After the literature search, 22 trials related to joint pain were selected. The average methodological quality score of the 22 trials consisting of 1014 patients was 7.96 on the PEDro scale; 11 trials reported positive effects and 11 trials reported negative effects. The mean weighted difference in change of pain on VAS was 13.96 mm (95% CI, 7.24–20.69) in favor of the active LLLT groups. When we only considered the clinical trials in which the energy dose was within the dose range suggested in the review by Bjordal et al. in 2003 and in World Association for Laser Therapy (WALT) dose recommendation, the mean effect sizes were 19.88 and 21.05 mm in favor of the true LLLT groups, respectively.

Conclusions: The review shows that laser therapy on the joint reduces pain in patients. Moreover, when we restrict the energy doses of the laser therapy into the dose window suggested in the previous study, we can expect more reliable pain relief treatments.
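To make the idea of a weighted mean difference with a 95% confidence interval concrete, here is a minimal sketch of one common way to pool trial results: fixed-effect, inverse-variance weighting. The abstract does not state which weighting scheme or model the review used, so the scheme, the function name, and the three example trials below are assumptions for illustration and do not reproduce the paper's data or results.

```python
from math import sqrt


def pooled_mean_difference(trials):
    """Pool (mean_difference_mm, standard_error_mm) pairs with inverse-variance weights.

    Returns (pooled estimate, lower 95% bound, upper 95% bound).
    Fixed-effect pooling is an assumption; the review may have used another model.
    """
    weights = [1.0 / se ** 2 for _, se in trials]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, trials)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    margin = 1.96 * pooled_se  # normal approximation for a 95% interval
    return pooled, pooled - margin, pooled + margin


# Hypothetical trials: difference in VAS pain change (mm) and its standard error.
example_trials = [(10.0, 2.0), (20.0, 2.0), (15.0, 4.0)]
pooled, lower, upper = pooled_mean_difference(example_trials)
print(f"pooled difference: {pooled:.2f} mm (95% CI, {lower:.2f} to {upper:.2f})")
```

Trials with smaller standard errors receive larger weights, so precise trials dominate the pooled estimate. An interval that excludes zero corresponds to the kind of statement the abstract makes about the active LLLT groups.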


Paper : link