Author Archives: Combio

Seungchul Lee#, Jingu Lee#, Sung Hoon Sim#, Yeonghun Lee, Kyung Chul Moon, Cheol Lee, Woong-Yang Park, Nayoung K. D. Kim, Se-Hoon Lee$, and Hyunju Lee$ (2017) Comprehensive somatic genome alterations of urachal carcinoma. Journal of Medical Genetics, published online March 27, 2017 (IF: 5.650) (JCR 2015: 19/166, 11.145%, GENETICS & HEREDITY)

Comprehensive somatic genome alterations of urachal carcinoma.

  • Author : Seungchul Lee#, Jingu Lee#, Sung Hoon Sim#, Yeonghun Lee, Kyung Chul Moon, Cheol Lee, Woong-Yang Park, Nayoung K. D. Kim, Se-Hoon Lee$, and Hyunju Lee$
  • Published Date : 2017
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Journal of Medical Genetics

 

Abstract

Background: Urachal cancer is a rare cancer that develops in the urachus. Because of its rarity, standard treatment therapies for urachal cancer have not been established, and chemotherapeutic regimens for bladder cancer have been unsuccessful in patients with urachal cancer. Hence, we aimed to perform a systematic molecular characterization of urachal cancer.

Methods: We identified somatic single nucleotide variations (SNVs)/indels and somatic copy number aberrations (SCNAs) in 17 patients using whole-exome sequencing (WES) and the OncoScan™ platform (Affymetrix), as follows: tumour-normal paired sequencing (WES, n = 10), tumour-only sequencing (WES, n = 1; targeted deep sequencing, n = 16), and OncoScan™ (n = 17).

Results: Our analyses identified 27 genes with somatic SNVs and indels, as well as six genes (APC, COL5A1, KIF26B, LRP1B, SMAD4, and TP53) that were recurrent in at least two patients. By analysing the SCNAs, we found that the extent of chromosomal amplification was highly associated with the patient’s cancer stage. Interestingly, 35% (6/17) of the patients had focal DNA amplifications in FGFR family genes. The integration of somatic SNVs, indels, and SCNAs revealed significant alterations in the MAPK signalling pathways.

Conclusions: Our genome-wide analysis of urachal cancer suggests that molecular characteristics may be important for the treatment of urachal cancer.
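The recurrence criterion in the Results (a gene counts as recurrent when it is mutated in at least two patients), and one way of expressing the "extent of chromosomal amplification" per patient, can be sketched in Python. This is a minimal illustration only: the patient IDs, gene sets, segment coordinates, and the log2-ratio cutoff of 0.3 are hypothetical, and the study's actual SCNA thresholds are not stated in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def recurrent_genes(mutated: Dict[str, Set[str]], min_patients: int = 2) -> Dict[str, int]:
    """Return genes mutated in at least `min_patients` patients, with their patient counts."""
    # set() so that several mutations of one gene in one patient count once for that patient
    counts = Counter(gene for genes in mutated.values() for gene in set(genes))
    return {gene: n for gene, n in counts.items() if n >= min_patients}


def fraction_amplified(segments: Iterable[Tuple[int, int, float]], log2_cutoff: float = 0.3) -> float:
    """Fraction of profiled bases lying in segments whose log2 copy ratio exceeds the cutoff.

    Each segment is (start, end, log2_ratio) with half-open coordinates [start, end).
    The 0.3 default is a hypothetical cutoff, not a value taken from the study.
    """
    segs = list(segments)
    total = sum(end - start for start, end, _ in segs)
    amplified = sum(end - start for start, end, ratio in segs if ratio > log2_cutoff)
    return amplified / total if total else 0.0


# Hypothetical toy data
calls = {"P1": {"TP53", "APC"}, "P2": {"TP53"}, "P3": {"APC", "KRAS"}}
print(recurrent_genes(calls))  # TP53 and APC, each mutated in two patients
print(fraction_amplified([(0, 100, 0.5), (100, 300, 0.0), (300, 400, 0.6)]))  # 0.5
```

A per-patient value like this could then be compared across cancer stages; the abstract does not state which statistic the authors used for that association.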

Wonjun Choi, Baeksoo Kim, Hyejin Cho, Doheon Lee and Hyunju Lee* (2016) A corpus for plant-chemical relationships in the biomedical domain. BMC Bioinformatics, 2016 September 20; 17:386 (IF: 2.435) (JCR 2015: 10/56, 17.9%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

A corpus for plant-chemical relationships in the biomedical domain.

 

Abstract

Background: Plants are natural products that humans consume in various ways, including as food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate the activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals on a large scale by applying a text mining approach. A cornerstone of this task is a corpus of relationships between plants and chemicals.

Results: In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant–chemical relationships (550 positive and 457 negative), which are drawn from 377 sentences in 245 PubMed abstracts. We measured inter-annotator agreement for the corpus among three annotators. The simple percent agreement scores for entities and for the trigger words of the relationships were 99.6% and 94.8%, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8%. We also developed a rule-based model to automatically extract such plant–chemical relationships. When we evaluated the rule-based model on the corpus and on randomly selected biomedical articles, overall F-scores of 68.0% and 61.8% were achieved, respectively.

Conclusion: We expect that the corpus for plant–chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus.

Corpus URL: http://combio.gist.ac.kr/plantchemicalcorpus
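The agreement and accuracy figures reported for the corpus are computed from simple counts. The abstract does not say which kappa variant was used for three annotators; the sketch below assumes Fleiss' kappa, a standard multi-rater statistic, and computes the F-score as the harmonic mean of precision and recall. The annotation counts in the example are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa. counts[i][j] = number of annotators who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of annotations")
    n_cats = len(counts[0])
    # Observed agreement: share of agreeing annotator pairs per item, averaged over items
    p_bar = sum(sum(c * (c - 1) for c in row) for row in counts) / (
        n_items * n_raters * (n_raters - 1)
    )
    # Chance agreement from the overall proportion of each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all annotations fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical: 3 annotators labelling 4 relations as [positive, negative]
print(round(fleiss_kappa([[3, 0], [0, 3], [2, 1], [3, 0]]), 3))  # 0.625
print(round(f_score(tp=2, fp=2, fn=0), 3))  # 0.667
```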

Daeyong Jin and Hyunju Lee* (2016) Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets. Scientific Reports, 2016 October 13; 6:35350 (IF: 5.228) (JCR 2015: 7/62, 11.3%, MULTIDISCIPLINARY SCIENCES).

Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets.

  • Author : Daeyong Jin and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Scientific Reports

 

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that regulate the expression of target genes, and they are involved in cancer initiation and progression. Although many cancer-related miRNAs have been identified, their functional impact may vary depending on their effects on the regulation of other miRNAs and genes. In this study, we propose a novel method for prioritizing candidate cancer-related miRNAs that may affect the expression of other miRNAs and genes across the entire biological network. To this end, we propose three important features: the average expression of a miRNA in multiple cancer samples, the average of the absolute correlation values between the expression of a miRNA and the expression of all genes, and the number of predicted miRNA target genes. These three features were integrated using order statistics. By applying the proposed approach to four cancer types (glioblastoma, ovarian cancer, prostate cancer, and breast cancer), we prioritized candidate cancer-related miRNAs and determined their functional roles in cancer-related pathways. The proposed approach can be used to identify miRNAs that play crucial roles in driving cancer development and to elucidate novel potential therapeutic targets for cancer treatment.
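The abstract says only that the three features were "integrated using order statistics". A common way to do this is the rank-ratio order statistic of Stuart et al. (2003), in the recursive form given by Aerts et al. (2006); the sketch below assumes that formulation. Each feature is ranked in descending order, rank ratios are combined into a Q value, and a smaller Q means the miRNA ranks consistently high across features. The miRNA names and feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_ratios(values: Dict[str, float]) -> Dict[str, float]:
    """Rank items by descending value; return rank / n, so the best item gets 1/n."""
    ordered = sorted(values, key=lambda name: values[name], reverse=True)
    n = len(ordered)
    return {name: (i + 1) / n for i, name in enumerate(ordered)}


def order_statistic_q(ratios: Sequence[float]) -> float:
    """Joint probability that N uniform random rank ratios are at least this good.

    Recursion: V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_(N-k+1)^i,
    with V_0 = 1, ratios sorted ascending (1-indexed r), and Q = N! * V_N.
    """
    r = sorted(ratios)
    n = len(r)
    v = [1.0]  # V_0
    for k in range(1, n + 1):
        v.append(sum((-1) ** (i - 1) * v[k - i] / math.factorial(i) * r[n - k] ** i
                     for i in range(1, k + 1)))
    return math.factorial(n) * v[n]


def prioritize(features: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Combine several per-miRNA feature tables into one ranking by ascending Q."""
    per_feature = [rank_ratios(f) for f in features]
    names = per_feature[0].keys()
    scores = {m: order_statistic_q([pf[m] for pf in per_feature]) for m in names}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical feature tables for three miRNAs
avg_expression = {"miR-A": 10.0, "miR-B": 5.0, "miR-C": 1.0}
avg_abs_corr = {"miR-A": 0.8, "miR-B": 0.3, "miR-C": 0.5}
n_targets = {"miR-A": 300, "miR-B": 120, "miR-C": 40}
# Expected order: miR-A (Q = 1/27), miR-B (Q = 20/27), miR-C (Q = 26/27)
for name, q in prioritize([avg_expression, avg_abs_corr, n_targets]):
    print(name, round(q, 4))
```

The recursion is numerically fine for three features; for many criteria, implementations usually switch to an approximation.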

 

Baeksoo Kim, Jihoon Jo, Jonghyun Han, Chungoo Park$ and Hyunju Lee$ (2017) In silico re-identification of properties of drug target proteins. BMC Bioinformatics, 31 May 2017;18(Suppl 7):248. (IF: 2.448) (JCR 2016: 10/57, 17.5%, Oncology) ($: co-corresponding authors). (Presented at DTMBIO 2016 in conjunction with CIKM, Indianapolis, USA)

In silico re-identification of properties of drug target proteins.

  • Author : Baeksoo Kim, Jihoon Jo, Jonghyun Han, Chungoo Park and Hyunju Lee
  • Published Date : 2017
  • Category : Bioinformatics
  • Place of publication : BMC Bioinformatics

 

Abstract

Computational approaches to the identification of drug targets are expected to reduce the time and effort required for drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have been conducted to distinguish drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins, and many other properties of proteins have not been fully investigated. In this study, we first confirm previously known properties of drug targets with higher statistical power by analyzing larger sets of drugs and targets. We then suggest new properties, such as gene essentiality, gene expression levels, tissue specificity, and solvent accessibility. We predict drug targets based on these features using a support vector machine and a random forest method. We believe that our study will provide a new perspective on inferring drug-target interactions.
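Of the new properties listed above, tissue specificity is the simplest to derive from an expression matrix. The abstract does not define how it was measured; the sketch below assumes the widely used tau index of Yanai et al. (2005), which is 0 for a gene expressed uniformly across tissues and 1 for a gene expressed in a single tissue. Expression values are usually log-transformed before applying it, and the example profiles are hypothetical.

```python
from typing import Sequence


def tissue_specificity_tau(expression: Sequence[float]) -> float:
    """Tau index over per-tissue expression values: 0 = ubiquitous, 1 = single-tissue."""
    n = len(expression)
    if n < 2:
        raise ValueError("need expression in at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative (e.g. log2(TPM + 1))")
    peak = max(expression)
    if peak == 0:
        return 0.0  # not expressed anywhere; treated here as non-specific
    # Average shortfall of each tissue relative to the peak tissue
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical per-tissue expression profiles
print(tissue_specificity_tau([8.0, 0.0, 0.0, 0.0]))  # 1.0
print(tissue_specificity_tau([3.0, 3.0, 3.0]))       # 0.0
```

A value like this would form one column of the feature vector given to a classifier such as the support vector machine or random forest mentioned above.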

Jonghyun Han and Hyunju Lee* (2016) Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, 2016 September 01; 358-359:112-128 (IF: 3.364) (JCR 2015: 8/144, 5.56%, COMPUTER SCIENCE, INFORMATION SYSTEMS)

Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media.

  • Author : Jonghyun Han and Hyunju Lee
  • Published Date : 2016
  • Category : Mining in Social Network
  • Place of publication : Information Sciences

 

Abstract

Recent research has focused on extracting personal interest data from social media. Although many methods have been developed, accurately estimating users’ interests is often difficult because messages on social media are short and are not classified into any predefined categories. We propose a new method to overcome this problem by incorporating heterogeneous media, such as news. In our method, we first extract explicit features and implicit topics of categories using news media, where implicit topics are determined using a refined topic model. Next, we describe social media messages using these features and topics to estimate users’ interests. Compared with several other approaches, our approach provides more accurate estimations of users’ interests. We also demonstrate that the accuracy of friend recommendations is increased using the users’ interests estimated by our method. Thus, we expect that the proposed approach could be helpful for enhancing the personalization of social media services.
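The two-step idea above (describe categories using news, then describe a user's messages in terms of those categories) can be illustrated with a deliberately simplified sketch. It replaces the paper's explicit features and refined topic model with plain bag-of-words category profiles and cosine similarity, so it shows only the data flow, not the authors' method. The categories, texts, and user names are hypothetical, and the friend recommendation step simply ranks users by the similarity of their estimated interests.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Mapping


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def category_profiles(news: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Bag-of-words profile per category, built from news articles."""
    return {cat: Counter(w for doc in docs for w in tokenize(doc)) for cat, docs in news.items()}


def estimate_interests(messages: List[str], profiles: Dict[str, Counter]) -> Dict[str, float]:
    """Sum per-message similarity to each category, then normalize to a distribution."""
    totals = {cat: 0.0 for cat in profiles}
    for msg in messages:
        vec = Counter(tokenize(msg))
        for cat, prof in profiles.items():
            totals[cat] += cosine(vec, prof)
    z = sum(totals.values())
    return {cat: s / z for cat, s in totals.items()} if z else totals


def recommend_friends(user: str, interests: Dict[str, Dict[str, float]]) -> List[str]:
    """Other users ordered by similarity of their estimated interests."""
    others = [u for u in interests if u != user]
    return sorted(others, key=lambda u: cosine(interests[user], interests[u]), reverse=True)


# Hypothetical data
news = {
    "politics": ["Election vote in parliament", "Candidate wins the vote"],
    "sports": ["Team scores a late goal", "League match ends in a draw"],
}
profiles = category_profiles(news)
users = {
    "alice": ["Did you vote in the election?"],
    "bob": ["The candidate debate before the vote"],
    "carol": ["What a goal by our team!"],
}
interests = {u: estimate_interests(msgs, profiles) for u, msgs in users.items()}
print(recommend_friends("alice", interests))  # bob is ranked before carol
```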

Wangin Kim, Sangbin Park, Chanhun Choi, Young Ran Kim, Inkyu Park, Changseob Seo, Daehwan Youn, Wook Shin, Yumi Lee, Donghee Choi, Mirae Kim, Hyunju Lee, Seonjong Kim, and Changsu Na (2016) Evaluation of Anti-Inflammatory Potential of the New Ganghwaljetongyeum on Adjuvant-Induced Inflammatory Arthritis in Rats. Evidence-Based Complementary and Alternative Medicine, 2016 June 13; 2016:1230294 (IF: 1.931) (JCR 2015: 7/24, 29.2%, INTEGRATIVE & COMPLEMENTARY MEDICINE).

Evaluation of Anti-Inflammatory Potential of the New Ganghwaljetongyeum on Adjuvant-Induced Inflammatory Arthritis in Rats.

  • Author : Wangin Kim, Sangbin Park, Chanhun Choi, Young Ran Kim, Inkyu Park, Changseob Seo, Daehwan Youn, Wook Shin, Yumi Lee, Donghee Choi, Mirae Kim, Hyunju Lee, Seonjong Kim, and Changsu Na
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Evidence-Based Complementary and Alternative Medicine

 

Abstract

Ganghwaljetongyeum (GHJTY) has been used as a standard treatment for arthritis for approximately 15 years at the Korean Medicine Hospital of Dongshin University. GHJTY is composed of 18 medicinal herbs, from which five primary herbs were selected to form the new Ganghwaljetongyeum (N-GHJTY). The purpose of the present study was to observe the effect of N-GHJTY on arthritis and to determine its mechanism of action. After arthritis induction with complete Freund’s adjuvant (CFA) was confirmed in rats, N-GHJTY (62.5, 125, and 250 mg/kg/day) was administered once a day for 10 days. To determine pathological changes, paw edema and body weight were measured before and for 10 days after N-GHJTY administration. Cytokine (TNF-α, IL-1β, and IL-6) levels and histopathological lesions in the knee joint were also examined. Edema in the paw and knee joint of N-GHJTY-treated rats was significantly decreased at 6, 8, and 10 days after administration compared with the CFA-control group, while body weight consistently increased. Rats in the N-GHJTY-treated groups also recovered from the CFA-induced pathological changes and showed a significant decline in cytokine levels. Taken together, our results showed that N-GHJTY administration was effective in inhibiting CFA-induced arthritis via anti-inflammatory effects while promoting cartilage recovery by controlling cytokine levels.

Ho Jang, Youngmi Hur and Hyunju Lee (2016) Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data. Scientific Reports, 2016 May 09; 6:25582 (IF: 5.578) (JCR: 5/57, 8.8%, MULTIDISCIPLINARY SCIENCES).

Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

  • Author : Ho Jang, Youngmi Hur, and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Scientific Reports

 

Abstract

DNA copy number alterations (CNAs) are major genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, these methods cannot be directly applied to next-generation sequencing (NGS) data because NGS data have different characteristics, such as higher resolution without predefined probes and reads incorrectly mapped to the reference genome. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma, and lung adenocarcinoma, and identified focal genomic alterations that contain candidate cancer-related genes as well as previously known cancer-driver genes.
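The internals of WIFA-Seq are not described in the abstract. As a generic illustration of why wavelets suit focal-alteration detection, the sketch below applies a hand-written Haar transform to a binned log2 copy-ratio profile, hard-thresholds small detail coefficients to suppress noise, reconstructs the profile, and reports short runs above an amplification cutoff. The threshold, cutoff, maximum focal length, and toy profile are all hypothetical, the profile length must be a power of two, and this is not the WIFA-Seq algorithm.

```python
import math
from typing import List, Tuple


def haar_forward(x: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Full orthonormal Haar decomposition; details[0] is the finest scale."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("profile length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(len(approx) // 2)
        details.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in pairs])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in pairs]
    return approx, details


def haar_inverse(approx: List[float], details: List[List[float]]) -> List[float]:
    a = list(approx)
    for d in reversed(details):  # coarsest scale first
        a = [v for ai, di in zip(a, d)
             for v in ((ai + di) / math.sqrt(2), (ai - di) / math.sqrt(2))]
    return a


def focal_amplifications(profile: List[float], threshold: float,
                         amp_cut: float, max_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) bin ranges of short amplified runs in the denoised profile."""
    approx, details = haar_forward(profile)
    # Hard thresholding: small detail coefficients are treated as noise
    kept = [[d if abs(d) >= threshold else 0.0 for d in level] for level in details]
    smooth = haar_inverse(approx, kept)
    calls, start = [], None
    for i, value in enumerate(smooth + [-math.inf]):  # sentinel closes a trailing run
        if value > amp_cut and start is None:
            start = i
        elif value <= amp_cut and start is not None:
            if i - start <= max_len:  # keep only short (focal) runs
                calls.append((start, i - 1))
            start = None
    return calls


profile = [0.05, -0.05] * 8    # 16 bins of small alternating noise
profile[6] = profile[7] = 2.0  # a hypothetical two-bin amplification
print(focal_amplifications(profile, threshold=0.2, amp_cut=0.5, max_len=4))  # [(6, 7)]
```

A broad gain (for example, half a chromosome arm) survives denoising too, but the `max_len` filter excludes it, which is the distinction between focal and arm-level events that the abstract emphasizes.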

 

Wonjun Choi, Chan-Hun Choi, Young Ran Kim, Seon-Jong Kim, Chang-Su Na and Hyunju Lee (2016) HerDing: herb recommendation system to treat diseases using genes and chemicals. Database (Oxford), 2016 March 15; 2016:baw011 (IF: 3.372) (JCR: 7/57, 12.3%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

HerDing: herb recommendation system to treat diseases using genes and chemicals.

  • Author : Wonjun Choi, Chan-Hun Choi, Young Ran Kim, Seon-Jong Kim, Chang-Su Na and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Database (Oxford)

 

Abstract

In recent years, herbs have been researched as new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies that scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort choosing candidate herbs and performing experiments to measure changes in marker genes after herb treatment. A computational approach to recommending herbs for treating diseases could make the early stage of such studies more efficient. Although several databases related to traditional Chinese medicine have already been developed, no specialized Web tool yet recommends herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, which retrieves candidate herb-related information for user search terms (a list of genes, a disease name, a chemical name, or an herb name). HerDing was built by integrating public databases and applying a text-mining method. The HerDing website is free and open to all users, with no login requirement.

Database URL: http://combio.gist.ac.kr/herding
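The abstract does not describe how HerDing scores herbs for a gene-list query. One simple way to rank herbs against a list of disease-related genes is a hypergeometric enrichment test on the genes targeted by each herb's chemicals; the sketch below assumes that approach. The herb, chemical, and gene names and the gene-universe size are hypothetical, and `herb_gene_sets` and `rank_herbs` are illustrative helpers, not HerDing functions.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def herb_gene_sets(herb_chemicals: Dict[str, Set[str]],
                   chemical_targets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each herb to the union of target genes of the chemicals it contains."""
    return {herb: set().union(*(chemical_targets.get(c, set()) for c in chems))
            for herb, chems in herb_chemicals.items()}


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def rank_herbs(query: Iterable[str], herb_genes: Dict[str, Set[str]],
               universe: int) -> List[Tuple[str, int, float]]:
    """(herb, overlap, p-value) sorted by p-value, then by larger overlap."""
    q = set(query)
    scored = [(herb, len(q & genes), hypergeom_sf(len(q & genes), universe, len(genes), len(q)))
              for herb, genes in herb_genes.items()]
    return sorted(scored, key=lambda t: (t[2], -t[1]))


# Hypothetical data
herb_chems = {"HerbA": {"chem1", "chem2"}, "HerbB": {"chem3"}}
chem_targets = {"chem1": {"G1", "G2"}, "chem2": {"G3"}, "chem3": {"G9"}}
genes_by_herb = herb_gene_sets(herb_chems, chem_targets)
for herb, overlap, p in rank_herbs(["G1", "G2"], genes_by_herb, universe=20):
    print(herb, overlap, round(p, 4))  # HerbA first (p = 3/190), then HerbB (p = 1)
```

In practice the p-values would also be corrected for testing many herbs at once.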

Bayarbaatar Amgalan and Hyunju Lee (2015) DEOD: Uncovering dominant effects of cancer-driver genes based on a partial covariance selection method. Bioinformatics, 2015 Aug 1;31(15):2452-60. (IF: 4.981) (JCR: 4/52, 7.7%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

DEOD: Uncovering dominant effects of cancer-driver genes based on a partial covariance selection method.

 

Abstract

Motivation: The generation of a large volume of cancer genomes has allowed us to identify disease-related alterations more accurately, which is expected to enhance our understanding of the mechanisms of cancer development. With genomic alterations detected, one challenge is to pinpoint the cancer-driver genes that cause functional abnormalities.

Results: Here, we propose a method for uncovering the dominant effects of cancer-driver genes (DEOD) based on a partial covariance selection approach. Inspired by a convex optimization technique, it estimates the dominant effects of candidate cancer-driver genes on the expression level changes of their target genes. It constructs a gene network as a directed, weighted graph by integrating DNA copy numbers, single nucleotide mutations, and gene expression from matched tumor samples, and estimates partial covariances between driver genes and their target genes. Then, a scoring function is applied to compute a cancer-driver score for each gene. To test the performance of DEOD, a novel scheme is designed for simulating conditional multivariate normal variables (targets and free genes) given a group of variables (driver genes). When we applied DEOD to both the simulated data and breast cancer data, it successfully uncovered driver variables in the simulation data and identified well-known oncogenes in breast cancer. In addition, two genes ranked highly by DEOD were related to survival time: copy number amplifications of MYC (8q24.21) and TRPS1 (8q23.3) were closely related to survival time (p = 0.00246 and 0.00092, respectively). The results demonstrate that DEOD can efficiently uncover cancer-driver genes.

Availability: DEOD was implemented in MATLAB, and the source code and data are available at http://combio.gist.ac.kr/softwares/
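The simulation scheme described above draws target and free genes from a multivariate normal distribution conditioned on the driver genes. For a partition into given variables D and remaining variables T, the conditional mean is mu_T + Sigma_TD Sigma_DD^-1 (x_D - mu_D), and the conditional covariance is the partial covariance Sigma_TT - Sigma_TD Sigma_DD^-1 Sigma_DT. The sketch below computes both in plain Python and draws one sample through a Cholesky factor. It is not the authors' MATLAB code, it omits DEOD's covariance selection and scoring steps, and the means and covariances in the example are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sub(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    return [[m[i][j] for j in cols] for i in rows]


def conditional_mvn(mu: Sequence[float], sigma: Matrix, given: List[int],
                    x_given: Sequence[float]) -> Tuple[List[int], List[float], Matrix]:
    """Indices, mean, and covariance of the remaining variables given x_given."""
    rest = [i for i in range(len(mu)) if i not in given]
    w = matmul(sub(sigma, rest, given), inverse(sub(sigma, given, given)))  # regression weights
    shift = matmul(w, [[x_given[k] - mu[g]] for k, g in enumerate(given)])
    mean = [mu[r] + shift[i][0] for i, r in enumerate(rest)]
    corr = matmul(w, sub(sigma, given, rest))
    s_rr = sub(sigma, rest, rest)
    cov = [[s_rr[i][j] - corr[i][j] for j in range(len(rest))] for i in range(len(rest))]
    return rest, mean, cov


def cholesky(a: Matrix) -> Matrix:
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def sample_conditional(mu, sigma, given, x_given, rng: random.Random) -> List[float]:
    """One draw of the remaining variables (e.g. targets and free genes) given the drivers."""
    rest, mean, cov = conditional_mvn(mu, sigma, given, x_given)
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in rest]
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(rest))]


# Hypothetical: variable 0 is a driver, variable 1 a target correlated with it
print(conditional_mvn([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], [0], [1.0]))
# -> rest [1], mean close to [0.8], covariance close to [[0.36]]
print(sample_conditional([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], [0], [1.0], random.Random(0)))
```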

Daeyong Jin and Hyunju Lee (2015) A Computational Approach to Identifying Gene-microRNA Modules in Cancer. PLoS Computational Biology, 2015 Jan 22; 11(1):e1004042. (IF: 4.829) (JCR: 3/52, 5.8%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

A Computational Approach to Identifying Gene-microRNA Modules in Cancer.

  • Author : Daeyong Jin and Hyunju Lee
  • Published Date : 2015
  • Category : Bioinformatics and Text Mining 
  • Place of publication : PLoS Computational Biology

 

Abstract

MicroRNAs (miRNAs) play key roles in the initiation and progression of various cancers by regulating genes. Regulatory interactions between genes and miRNAs are complex, as multiple miRNAs can regulate multiple genes. In addition, these interactions vary from patient to patient, even among patients with the same cancer type, because cancer development is a heterogeneous process. These relationships are further complicated because transcription factors and other regulatory molecules can also regulate miRNAs and genes. Hence, it is important to identify the complex relationships between genes and miRNAs in cancer. In this study, we propose a computational approach to constructing modules that represent these relationships by integrating the expression data of genes and miRNAs with gene-gene interaction data. First, we used a biclustering algorithm to construct modules consisting of a subset of genes and a subset of samples to incorporate the heterogeneity of cancer cells. Second, we combined gene-gene interactions to include genes that play important roles in cancer-related pathways. Then, we selected miRNAs that are closely associated with genes in the modules based on a Gaussian Bayesian network and the Bayesian information criterion (BIC). When we applied our approach to ovarian cancer and glioblastoma (GBM) data sets, 33 and 54 modules were constructed, respectively. Of these, 91% of ovarian cancer modules and 94% of GBM modules were explained either by direct regulation between genes and miRNAs or by indirect relationships via transcription factors. In addition, 48.4% and 74.0% of modules from ovarian cancer and GBM, respectively, were enriched with cancer-related pathways, and 51.7% and 71.7% of miRNAs in modules were ovarian cancer-related and GBM-related miRNAs, respectively. Finally, we extensively analyzed significant modules and showed that most genes in these modules were related to ovarian cancer and GBM.