Author Archives: Combio

Bayabaatar Amgalan, Ider Tseveendorj, Hyunju Lee (2018) An Integrative Model for the Identification of Key Players of Cancer Networks. Applied Mathematical Modelling, 2018 June 01, 58:65-75. (JCR 2016: 19/100, 19%, MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)

An Integrative Model for the Identification of Key Players of Cancer Networks.

  • Author : Bayabaatar Amgalan, Ider Tseveendorj, and Hyunju Lee
  • Published Date : 2018
  • Category : Bioinformatics
  • Place of publication : Applied Mathematical Modelling

 

Abstract

Uncovering miscoordination in a biological network is essential for the understanding of cellular malfunctions in cancer. Integrative analysis across multiple cellular levels may provide an opportunity to elucidate the miscoordination between the regulatory mechanisms in cancer cells.

Here, we propose an integrative model for the identification of key players of the cancer-activated Multi-Type Interaction (MTI) gene network (KPOCN). To measure the functional associations between genes, using DNA copy number aberrations (CNAs) and gene expressions (GEs), we constructed three interacting weighted graphs: GEs affected by CNAs, CNAs by CNAs, and GEs by GEs. These three weighted graphs were mapped onto a single graph, in order to construct an MTI gene network by using their optimal combination. Finally, the effect of a single gene was determined by using the centrality and betweenness of node scores in the MTI network.
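The mapping of three weighted graphs onto one network, followed by node scoring, can be illustrated with a minimal sketch. It is not the KPOCN implementation: the fixed combination weights stand in for the optimal combination mentioned above, weighted degree stands in for the centrality and betweenness scores, edges are treated as undirected, and all gene names and edge weights are hypothetical.

```python
def combine_graphs(graphs, weights):
    """Map several weighted edge dictionaries onto one graph via a weighted sum."""
    combined = {}
    for graph, w in zip(graphs, weights):
        for edge, score in graph.items():
            combined[edge] = combined.get(edge, 0.0) + w * score
    return combined


def weighted_degree(combined):
    """Score each node by the total weight of its incident edges (undirected)."""
    scores = {}
    for (u, v), score in combined.items():
        scores[u] = scores.get(u, 0.0) + score
        scores[v] = scores.get(v, 0.0) + score
    return scores


# Hypothetical toy edge weights (not taken from the paper).
cna_to_ge = {("A", "B"): 0.8, ("A", "C"): 0.6}   # GEs affected by CNAs
cna_to_cna = {("A", "B"): 0.2}                   # CNAs by CNAs
ge_to_ge = {("B", "C"): 0.5}                     # GEs by GEs

# Assumed combination weights; the paper derives an optimal combination instead.
mti = combine_graphs([cna_to_ge, cna_to_cna, ge_to_ge], [0.5, 0.25, 0.25])
scores = weighted_degree(mti)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy network, gene A ranks first because it carries the heaviest edges after the three graphs are merged.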

We first tested KPOCN using simulated datasets, and afterward, we applied this model to real breast cancer datasets. KPOCN successfully identified key regulators with their corresponding response variables (targets) when using the simulated data, and identified well-known breast cancer oncogenes. These results demonstrate that our model can be used for the efficient identification of key genes that affect cancer development. Source code is available at http://gcancer.org/KPOCN.

Ho Jang and Hyunju Lee (2018) Identification of cancer driver genes in focal genomic aberrations from whole-exome sequencing data. Bioinformatics, 2018 Feb 1;34(3):519-521. (IF: 7.307) (JCR 2016: 2/57, 3.5%, MATHEMATICAL & COMPUTATIONAL BIOLOGY)

Identification of cancer driver genes in focal genomic aberrations from whole-exome sequencing data.

  • Author : Ho Jang and Hyunju Lee
  • Published Date : 2018
  • Category : Bioinformatics
  • Place of publication : Bioinformatics

 

Abstract

Summary:

Whole-exome sequencing (WES) data have been used for identifying copy number aberrations in cancer cells. Nonetheless, using WES to identify focal aberrant regions in multiple samples that may contain cancer driver genes remains challenging. In this study, we developed a wavelet-based method for identifying focal genomic aberrant regions in the WES data from cancer cells (WIFA-X). When we applied WIFA-X to glioblastoma multiforme and lung adenocarcinoma datasets, WIFA-X outperformed other approaches in identifying cancer driver genes.

Availability:

R source code is available at http://gcancer.org/wifax.

Hyejin Cho, Wonjun Choi and Hyunju Lee (2017) A method for named entity normalization in biomedical articles: application to diseases and plants. BMC Bioinformatics, 13 October 2017;18(1):451. (IF: 2.448) (JCR 2016: 10/57, 17.5%, MATHEMATICAL & COMPUTATIONAL BIOLOGY)

A method for named entity normalization in biomedical articles: application to diseases and plants.

 

Abstract

Background: In biomedical articles, a named entity recognition (NER) technique that identifies entity names from texts is an important element for extracting biological knowledge from articles. After NER is applied to articles, the next step is to normalize the identified names into standard concepts (i.e., disease names are mapped to the National Library of Medicine’s Medical Subject Headings disease terms). In biomedical articles, many entity normalization methods rely on domain-specific dictionaries for resolving synonyms and abbreviations. However, the dictionaries are not comprehensive except for some entities such as genes. In recent years, biomedical articles have accumulated rapidly, and neural network-based algorithms that incorporate a large amount of unlabeled data have shown considerable success in several natural language processing problems.

Results: In this study, we propose an approach for normalizing biological entities, such as disease names and plant names, by using word embeddings to represent semantic spaces. For diseases, training data from the National Center for Biotechnology Information (NCBI) disease corpus and unlabeled data from PubMed abstracts were used to construct word representations. For plants, a training corpus that we manually constructed and unlabeled PubMed abstracts were used to represent word vectors. We showed that the proposed approach performed better than the use of only the training corpus or only the unlabeled data and showed that the normalization accuracy was improved by using our model even when the dictionaries were not comprehensive. We obtained F-scores of 0.808 and 0.690 for normalizing the NCBI disease corpus and manually constructed plant corpus, respectively. We further evaluated our approach using a data set in the disease normalization task of the BioCreative V challenge. When only the disease corpus was used as a dictionary, our approach significantly outperformed the best system of the task.
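The idea of normalizing a mention through a semantic space can be sketched as a nearest-concept lookup by cosine similarity. This is only an illustration under stated assumptions, not the authors' model: the three-dimensional vectors, the concept IDs `C1` and `C2`, and the token-averaging step are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mention_vector(mention, embeddings):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in mention.lower().split() if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def normalize(mention, concepts, embeddings):
    """Return the ID of the concept whose name is closest to the mention."""
    query = mention_vector(mention, embeddings)
    if query is None:
        return None
    best_id, best_sim = None, -1.0
    for concept_id, name in concepts.items():
        candidate = mention_vector(name, embeddings)
        if candidate is None:
            continue
        sim = cosine(query, candidate)
        if sim > best_sim:
            best_id, best_sim = concept_id, sim
    return best_id


# Hypothetical toy embeddings and concept dictionary (placeholder IDs).
embeddings = {
    "breast": [1.0, 0.0, 0.2],
    "lung": [0.0, 0.1, 1.0],
    "carcinoma": [0.2, 0.9, 0.1],
    "neoplasms": [0.15, 0.95, 0.05],
}
concepts = {"C1": "breast neoplasms", "C2": "lung neoplasms"}

print(normalize("breast carcinoma", concepts, embeddings))
```

Because "carcinoma" and "neoplasms" sit close together in the toy space, the mention maps to the breast concept even though the dictionary does not list "carcinoma" as a synonym.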

Conclusions: The proposed approach shows robust performance for normalizing biological entities. The manually constructed plant corpus and the proposed model are available at http://gcancer.org/plant and http://gcancer.org/normalization, respectively.

Seungchul Lee#, Jingu Lee#, Sung Hoon Sim#, Yeonghun Lee, Kyung Chul Moon, Cheol Lee, Woong-Yang Park, Nayoung K. D. Kim, Se-Hoon Lee$, and Hyunju Lee$ (2017) Comprehensive somatic genome alterations of urachal carcinoma. Journal of Medical Genetics, 2017 August 01; 54(8):572-578 (IF: 5.650) (JCR 2016: 19/166, 11.145%, GENETICS & HEREDITY)

Comprehensive somatic genome alterations of urachal carcinoma.

  • Author : Seungchul Lee#, Jingu Lee#, Sung Hoon Sim#, Yeonghun Lee, Kyung Chul Moon, Cheol Lee, Woong-Yang Park, Nayoung K. D. Kim, Se-Hoon Lee$, and Hyunju Lee$
  • Published Date : 2017
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Journal of Medical Genetics

 

Abstract

Background: Urachal cancer is a rare cancer that develops in the urachus. Because of its rarity, standard therapies for urachal cancer are not established, and chemotherapeutic regimens for bladder cancer have been unsuccessful for patients with urachal cancer. Hence, we aimed to provide a systematic molecular characterization of urachal cancer.

Methods: We identified somatic single nucleotide variations (SNVs)/indels and somatic copy number aberrations (SCNAs) in 17 patients by using whole-exome sequencing (WES) and the OncoScan™ platform (Affymetrix) as follows: tumour-normal paired sequencing (WES, n = 10), tumour-only sequencing (WES, n = 1; targeted deep sequencing, n = 16), and OncoScan™ (n = 17).

Results: Our analyses identified 27 genes with somatic SNVs and indels, as well as six genes (APC, COL5A1, KIF26B, LRP1B, SMAD4, and TP53) that were recurrent in at least two patients. By analysing the SCNAs, we found that the extent of chromosomal amplification was highly associated with the patient’s cancer stage. Interestingly, 35% (6/17) of the patients had focal DNA amplifications in FGFR family genes. The integration of somatic SNVs, indels, and SCNAs revealed significant alterations in the MAPK signalling pathways.

Conclusions: Our genome-wide analysis of urachal cancer suggests that molecular characteristics may be important for the treatment of urachal cancer.

Wonjun Choi, Baeksoo Kim, Hyejin Cho, Doheon Lee and Hyunju Lee* (2016) A corpus for plant-chemical relationships in the biomedical domain. BMC Bioinformatics, 2016 September 20; 17:386 (IF: 2.435) (JCR 2015: 10/56, 17.9%, MATHEMATICAL & COMPUTATIONAL BIOLOGY).

A corpus for plant-chemical relationships in the biomedical domain.

 

Abstract

Background: Plants are natural products that humans consume in various ways including food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, a text mining approach makes it possible to automatically extract relationships between plants and chemicals on a large scale. A cornerstone of this task is a corpus of relationships between plants and chemicals.

Results: In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant–chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant–chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively.
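The agreement and F-score figures above follow standard definitions, which can be sketched as below. The sketch uses Cohen's kappa for two annotators as an assumption; the abstract does not say how the overall kappa across the three annotators was computed, and the label lists are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw counts."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels for ten plant-chemical relationships.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(agreement, kappa, f_score(tp=8, fp=2, fn=2))
```

Kappa is lower than raw agreement here because half of the matches would be expected by chance when both annotators label half of the items positive.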

Conclusion: We expect that the corpus for plant–chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus.

Corpus URL: http://combio.gist.ac.kr/herding

Daeyong Jin and Hyunju Lee* (2016) Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets. Scientific Reports, 2016 October 13; 6:35350 (IF: 5.228) (JCR 2015: 7/62, 11.3%, MULTIDISCIPLINARY SCIENCES).

Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets.

  • Author : Daeyong Jin and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Scientific Reports

 

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs regulating the expression of target genes, and they are involved in cancer initiation and progression. Even though many cancer-related miRNAs have been identified, their functional impact may vary, depending on their effects on the regulation of other miRNAs and genes. In this study, we propose a novel method for the prioritization of candidate cancer-related miRNAs that may affect the expression of other miRNAs and genes across the entire biological network. For this, we propose three important features: the average expression of a miRNA in multiple cancer samples, the average of the absolute correlation values between the expression of a miRNA and expression of all genes, and the number of predicted miRNA target genes. These three features were integrated using order statistics. By applying the proposed approach to four cancer types, glioblastoma, ovarian cancer, prostate cancer, and breast cancer, we prioritized candidate cancer-related miRNAs and determined their functional roles in cancer-related pathways. The proposed approach can be used to identify miRNAs that play crucial roles in driving cancer development and to elucidate novel potential therapeutic targets for cancer treatment.
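One common way to integrate several ranked features with order statistics is to convert each feature to rank ratios and evaluate the joint cumulative distribution of the sorted ratios (often called a Q statistic). The sketch below assumes that formulation; the abstract does not specify which one the authors used, and the miRNA names and feature values are hypothetical.

```python
from math import factorial


def rank_ratios(values):
    """Rank items from highest to lowest value and scale ranks to (0, 1]."""
    order = sorted(values, key=values.get, reverse=True)
    return {name: (pos + 1) / len(order) for pos, name in enumerate(order)}


def q_statistic(ratios):
    """Probability that uniform order statistics fall at or below the sorted ratios."""
    r = sorted(ratios)
    n = len(r)
    v = [1.0]  # v[0] = 1
    for k in range(1, n + 1):
        total = 0.0
        for i in range(1, k + 1):
            total += (-1) ** (i - 1) * v[k - i] / factorial(i) * r[n - k] ** i
        v.append(total)
    return factorial(n) * v[n]


# Hypothetical feature values for four miRNAs.
expression = {"miR-a": 9.1, "miR-b": 7.4, "miR-c": 5.0, "miR-d": 3.2}
correlation = {"miR-a": 0.42, "miR-b": 0.35, "miR-c": 0.40, "miR-d": 0.10}
targets = {"miR-a": 310, "miR-b": 280, "miR-c": 120, "miR-d": 90}

features = [rank_ratios(f) for f in (expression, correlation, targets)]
q_scores = {m: q_statistic([f[m] for f in features]) for m in expression}

# A smaller Q means the miRNA ranks consistently high across the features.
ranking = sorted(q_scores, key=q_scores.get)
print(ranking)
```

A miRNA that ranks near the top on all three features receives a small Q value, so sorting in ascending order puts the strongest candidates first.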

 

Baeksoo Kim, Jihoon Jo, Jonghyun Han, Chungoo Park* and Hyunju Lee* (2017) In silico re-identification of properties of drug target proteins. BMC Bioinformatics, 31 May 2017;18(Suppl 7):248. (IF: 2.448) (JCR 2016: 10/57, 17.5%, MATHEMATICAL & COMPUTATIONAL BIOLOGY) (*: co-corresponding authors). (Presented at DTMBIO 2016 in conjunction with CIKM, Indianapolis, USA)

In silico re-identification of properties of drug target proteins.

  • Author : Baeksoo Kim, Jihoon Jo, Jonghyun Han, Chungoo Park and Hyunju Lee
  • Published Date : 2017
  • Category : Bioinformatics
  • Place of publication : BMC Bioinformatics

 

Abstract

Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have been conducted for distinguishing drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins. Many other properties of proteins have not been fully investigated. In this study, we first confirm previously known properties of drug targets with a higher statistical power by analyzing larger sets of drugs and targets. We then suggest new properties, such as gene essentiality, gene expression levels, tissue specificity, and solvent accessibility. We predict drug targets based on these features using a support vector machine and a random forest method. We believe that our study will provide a new aspect in inferring drug-target interactions.

Jonghyun Han and Hyunju Lee* (2016) Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, 2016 September 01; 358-359:112-128 (IF: 3.364) (JCR 2015: 8/144, 5.56%, COMPUTER SCIENCE, INFORMATION SYSTEMS)

Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media.

  • Author : Jonghyun Han and Hyunju Lee
  • Published Date : 2016
  • Category : Mining in Social Network
  • Place of publication : Information Sciences

 

Abstract

Recent research has focused on extracting personal interest data from social media. Although many methods have been developed, accurately estimating users’ interests is often difficult because messages on social media are short and are not classified into any predefined categories. We propose a new method to overcome this problem by incorporating heterogeneous media, such as news. In our method, we first extract explicit features and implicit topics of categories using news media, where implicit topics are determined using a refined topic model. Next, we describe social media messages using these features and topics to estimate users’ interests. Compared with several other approaches, our approach provides more accurate estimations of users’ interests. We also demonstrate that the accuracy of friend recommendations is increased using the users’ interests estimated by our method. Thus, we expect that the proposed approach could be helpful for enhancing the personalization of social media services.

Wangin Kim, Sangbin Park, Chanhun Choi, Youg Ran Kim, Inkyu Park, Changseob Seo, Daehwan Youn, Wook Shin, Yumi Lee, Donghee Choi, Mirae Kim, Hyunju Lee, Seonjong Kim, and Changsu Na (2016) Evaluation of Anti-Inflammatory Potential of the New Ganghwaljetongyeum on Adjuvant-Induced Inflammatory Arthritis in Rats. Evidence-Based Complementary and Alternative Medicine, 2016 June 13; 2016:1230294 (IF 1.931) (JCR 2015: 7/24, 29.2%, INTEGRATIVE & COMPLEMENTARY MEDICINE).

Evaluation of Anti-Inflammatory Potential of the New Ganghwaljetongyeum on Adjuvant-Induced Inflammatory Arthritis in Rats.

  • Author : Wangin Kim, Sangbin Park, Chanhun Choi, Youg Ran Kim, Inkyu Park, Changseob Seo, Daehwan Youn, Wook Shin, Yumi Lee, Donghee Choi, Mirae Kim, Hyunju Lee, Seonjong Kim, and Changsu Na
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Evidence-Based Complementary and Alternative Medicine

 

Abstract

Ganghwaljetongyeum (GHJTY) has been used as a standard treatment for arthritis for approximately 15 years at the Korean Medicine Hospital of Dongshin University. GHJTY is composed of 18 medicinal herbs, of which five primary herbs were selected and named new Ganghwaljetongyeum (N-GHJTY). The purpose of the present study was to observe the effect of N-GHJTY on arthritis and to determine its mechanism of action. After confirming arthritis induction using complete Freund’s adjuvant (CFA) in rats, N-GHJTY (62.5, 125, and 250 mg/kg/day) was administered once a day for 10 days. In order to determine pathological changes, edema of the paws and weight were measured before and for 10 days after N-GHJTY administration. Cytokine (TNF-α, IL-1β, and IL-6) levels and histopathological lesions in the knee joint were also examined. Edema in the paw and knee joint of N-GHJTY-treated rats was significantly decreased at 6, 8, and 10 days after administration, compared to that in the CFA-control group, while weight consistently increased. Rats in N-GHJTY-treated groups also recovered from the CFA-induced pathological changes and showed a significant decline in cytokine levels. Taken together, our results showed that N-GHJTY administration was effective in inhibiting CFA-induced arthritis via anti-inflammatory effects while promoting cartilage recovery by controlling cytokine levels.

Ho Jang, Youngmi Hur and Hyunju Lee (2016) Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data. Scientific Reports, 2016 May 09; 6:25582 (IF: 5.578) (JCR: 5/57, 8.8%, MULTIDISCIPLINARY SCIENCES).

Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

  • Author : Ho Jang, Youngmi Hur, and Hyunju Lee
  • Published Date : 2016
  • Category : Bioinformatics and Text Mining 
  • Place of publication : Scientific Reports

 

Abstract

DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes.
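To give a feel for what a wavelet-based treatment of a copy-number signal looks like, the sketch below applies a single level of the Haar transform, suppresses small detail coefficients, and flags bins whose smoothed value exceeds a cutoff. It is a generic illustration, not the WIFA-Seq procedure: the threshold, the cutoff, and the log-ratio values are hypothetical, and the signal is assumed to have even length.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Haar level: scaled pairwise sums (approx) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the signal from one level of Haar coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def denoise(signal, threshold):
    """Zero out small detail coefficients, then invert the transform."""
    approx, detail = haar_step(signal)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return inverse_haar_step(approx, detail)


# Hypothetical log2 copy-number ratios along eight genomic bins.
log_ratios = [0.05, -0.05, 0.0, 0.1, 1.2, 1.1, 0.0, -0.1]

smoothed = denoise(log_ratios, threshold=0.2)
amplified_bins = [i for i, x in enumerate(smoothed) if x > 0.5]
print(amplified_bins)
```

Small bin-to-bin fluctuations are averaged away, while the two adjacent bins with a strong gain remain clearly above the cutoff.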