
Text Mining Approach to Analyse the Relation between Obesity and Breast Cancer Data



Biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Literature data collected from publications has to be managed so that information can be extracted and transformed into an understandable structure using text mining approaches. Text mining refers to the process of deriving high-quality information from text by finding relationships between entities that do not show direct associations. As an example of this approach, we present the link between two diseases, breast cancer and obesity. Obesity is known to be associated with cancer mortality, but little is known about the link between lifetime changes in the BMI of obese persons and cancer mortality in both males and females. In this article, literature data for obesity and breast cancer were obtained from the PubMed database. Methodologies that employ groups of common genes and keywords, together with their frequency of occurrence in the data, were then used to establish the relation between obesity and breast cancer, which was visualized using pie charts and bar graphs. From the data analysis, we obtained 1 gene that showed a link between both diseases; this was validated using statistical analysis and the disease-connect web server. We also propose 8 common higher-frequency keywords that could be used for indexing when searching the literature for obesity and breast cancer in combination.
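The keyword-frequency step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the stopword list, the `min_count` threshold, and the toy abstracts (including the placeholder symbol GENE1) are all assumptions made for illustration. It counts term frequencies in two sets of abstracts and keeps the terms that occur frequently in both.

```python
import re
from collections import Counter

def term_frequencies(abstracts, stopwords=frozenset()):
    """Count lower-cased word tokens across a list of abstract texts."""
    counts = Counter()
    for text in abstracts:
        tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

def shared_terms(freq_a, freq_b, min_count=2):
    """Terms reaching min_count in both corpora, most frequent first."""
    common = {
        t for t in freq_a.keys() & freq_b.keys()
        if freq_a[t] >= min_count and freq_b[t] >= min_count
    }
    # Sort by combined frequency (descending), then alphabetically.
    return sorted(common, key=lambda t: (-(freq_a[t] + freq_b[t]), t))

# Hypothetical toy data: invented sentences, not real PubMed abstracts.
# GENE1 is a placeholder symbol, not a gene named in the article.
STOPWORDS = frozenset({"and", "in", "the", "of"})
obesity_abstracts = [
    "GENE1 expression and insulin resistance in obesity",
    "insulin signalling, GENE1 and adipose tissue in obesity",
]
breast_abstracts = [
    "GENE1 and insulin in breast tumour growth",
    "breast tumour risk, insulin levels and GENE1 variants",
]

freq_obesity = term_frequencies(obesity_abstracts, STOPWORDS)
freq_breast = term_frequencies(breast_abstracts, STOPWORDS)
shared = shared_terms(freq_obesity, freq_breast)
print(shared)
```

In a real analysis the two lists would hold abstracts retrieved from PubMed for each disease, and gene symbols would be matched against a curated gene list rather than picked out of plain word counts.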


International Letters of Natural Sciences (Volume 44)
A. Kumar et al., "Text Mining Approach to Analyse the Relation between Obesity and Breast Cancer Data", International Letters of Natural Sciences, Vol. 44, pp. 1-9, 2015
Online since: July 2015


Cited By:

[1] N. Ali, E. Amer, H. Zayed, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017, Vol. 639, p. 280, 2018