Toward Effective Search and Knowledge Discovery in Online Health Forums


Online health data, such as the content of online health forums, are growing rapidly and have become a valuable resource for patients and caregivers seeking information and support. Medical researchers can also discover new knowledge from this large body of health data. However, health forum data are mostly semi-structured or unstructured, which makes it challenging to extract their semantics automatically. In this project, we leverage this big health data and investigate patient-centered information extraction, classification, and integration to support effective search and evidence-based knowledge discovery.



We propose a patient-centered approach that connects the dots of information through a semantic information unit, the patient, to enable effective information search and knowledge discovery in online health forums.

We have identified several key research challenges, including effective keyword search, learning the thread reply structure for better information retrieval and exploration, and knowledge discovery in online health forums.

We have been developing techniques to address these problems, enabling patients, caregivers, doctors, and researchers to effectively leverage the rich information in health forums for information search and evidence-based knowledge discovery.

  • Investigating techniques for effective keyword search on online health forum data. Traditional search on online forums is based on the basic syntactic units of the semi-structured data, such as a post or a thread. Post-based search treats each post as an information unit (like a document) and returns a post if it contains all the query keywords; thread-based search, on the other hand, treats each thread as an information unit and considers a thread relevant if its posts collectively contain the query keywords. For example, when a user searches a health forum for "Vitamin, aggression", looking for other patients who have used vitamins to alleviate aggression, the user expects all query keyword matches to refer to the same patient. However, post-based and thread-based search only check syntactic information units, either a post or a thread, and are oblivious to semantics. It is common for multiple posts to refer to the same patient, and for one thread to contain information about multiple patients. As a result, post-based search misses patients whose information is spread across several posts (low recall), while thread-based search returns threads in which the keywords describe different patients (low precision). We propose patient-centered information extraction for effective keyword search on health forums, based on the semantic information unit, the patient, which better meets the query user's expectations.

  • Investigating techniques for learning the thread reply structure in online health forums. The thread reply structure, that is, the reply relationships between posts within a thread, is very important for patient-centered information retrieval and exploration on health forums. However, such reply relationships are not always available on health forums. We propose to leverage person reference relationships, combined with a statistical machine learning model, to learn the unknown thread structure.

  • Investigating techniques for evidence-based, patient-centered knowledge discovery in online health forums. We propose to connect pieces of medical information in online health forums, such as diseases, symptoms, treatments, and effects, through their semantic information unit, the patient, for effective knowledge discovery. Adverse drug reactions (ADRs) are a serious health problem and have been estimated to be the fourth leading cause of death in the United States. We propose a patient-centered, experience-aware mining framework for effective ADR discovery from online health forum data.
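The recall/precision problem described in the keyword search bullet can be illustrated with a toy example. The sketch below is a minimal illustration, not the project's extraction pipeline: it assumes each post has already been tagged with a patient identifier (the hypothetical `patient` field), and it uses naive whitespace tokenization. The `Post` class, the sample posts, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    thread_id: str
    post_id: int
    patient: str  # hypothetical: patient the post describes, assumed already extracted
    text: str


def terms(text):
    # Naive tokenization: lowercase words with surrounding punctuation stripped.
    return {w.strip(".,!?;:").lower() for w in text.split()}


def post_search(posts, keywords):
    # Post-based: a post matches only if it alone contains every keyword.
    kws = {k.lower() for k in keywords}
    return [p.post_id for p in posts if kws <= terms(p.text)]


def unit_search(posts, keywords, unit):
    # Group posts into information units, then return the units whose posts
    # collectively contain every keyword.
    kws = {k.lower() for k in keywords}
    pooled = {}
    for p in posts:
        pooled.setdefault(unit(p), set()).update(terms(p.text))
    return sorted(u for u, ts in pooled.items() if kws <= ts)


def thread_search(posts, keywords):
    return unit_search(posts, keywords, unit=lambda p: p.thread_id)


def patient_search(posts, keywords):
    # Patient-centered: the information unit is the patient, not the post or thread.
    return unit_search(posts, keywords, unit=lambda p: p.patient)


POSTS = [
    Post("t1", 1, "P1", "My son has severe aggression at night."),
    Post("t1", 2, "P1", "We started him on a vitamin last month and he is calmer."),
    Post("t1", 3, "P2", "My daughter takes a vitamin daily for energy."),
    Post("t2", 4, "P3", "My brother showed aggression after school."),
    Post("t2", 5, "P4", "She tried a vitamin but saw no change."),
]

query = ["vitamin", "aggression"]
print(post_search(POSTS, query))     # [] : P1's information is split across posts 1 and 2
print(thread_search(POSTS, query))   # ['t1', 't2'] : t2 mixes two different patients
print(patient_search(POSTS, query))  # ['P1']
```

Post-based search misses P1 entirely, and thread-based search also returns t2, where the two keywords describe different people; grouping by patient returns only P1. In practice the hard part is the extraction step that attributes posts to patients, which this sketch takes as given.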
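To make the idea of person reference relationships in the thread reply structure bullet concrete, here is a rule-based sketch that links a post to the most recent earlier post by any author whose username appears in its text, falling back to the immediately preceding post. This is a hypothetical simplification: the proposed approach combines such references with a statistical machine learning model, which is not reproduced here. It assumes usernames are single word tokens, and `ForumPost`, `infer_reply_parents`, and the sample thread are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class ForumPost:
    author: str  # assumed to be a single-word username
    text: str


def infer_reply_parents(posts):
    """Guess, for each post, the index of the post it replies to (None for the first)."""
    parents = []
    latest_by_author = {}  # lowercased username -> index of that author's latest post
    for i, post in enumerate(posts):
        parent = None
        if i > 0:
            me = post.author.lower()
            # Person references: earlier authors whose usernames appear in the text.
            referenced = [
                latest_by_author[tok]
                for tok in re.findall(r"\w+", post.text.lower())
                if tok in latest_by_author and tok != me
            ]
            # Prefer the most recent referenced post; otherwise fall back to the
            # immediately preceding post, a common baseline.
            parent = max(referenced) if referenced else i - 1
        parents.append(parent)
        latest_by_author[post.author.lower()] = i
    return parents


THREAD = [
    ForumPost("alice", "My mother was just diagnosed. Any advice?"),
    ForumPost("bob", "Sorry to hear that. Ask about physical therapy."),
    ForumPost("carol", "We had the same diagnosis last year."),
    ForumPost("alice", "Thanks bob, we will ask her doctor."),
    ForumPost("dave", "Welcome to the forum."),
]

print(infer_reply_parents(THREAD))  # [None, 0, 1, 1, 3]
```

Alice's second post is linked to bob's through the explicit reference, but carol's post is attached to bob's by the fallback even though it more plausibly answers alice. Cases like this suggest why a learned model is useful on top of person references; in such a model, a reference match would be one feature among several rather than a hard rule.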
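The knowledge discovery bullet's idea of connecting medical information through the patient can be sketched as patient-level co-occurrence counting: a (drug, reaction) pair is supported by the number of distinct patients who mention both, rather than by raw post counts. This is a hypothetical illustration, not the proposed experience-aware framework, whose details are not described here. It assumes drug and reaction mentions have already been extracted and attributed to patients; `DrugX`, `DrugY`, and `patient_level_pairs` are placeholders.

```python
from collections import Counter, defaultdict
from itertools import product


def patient_level_pairs(mentions, min_patients=2):
    """Count distinct patients per (drug, reaction) pair.

    mentions: iterable of (patient_id, kind, term) with kind "drug" or "reaction",
    assumed to come from an upstream extraction step.
    """
    drugs = defaultdict(set)
    reactions = defaultdict(set)
    for patient, kind, term in mentions:
        target = drugs if kind == "drug" else reactions
        target[patient].add(term.lower())

    counts = Counter()
    for patient, patient_drugs in drugs.items():
        # Sets ensure each patient contributes at most once per pair,
        # however many posts repeat the same mention.
        for pair in product(sorted(patient_drugs), sorted(reactions.get(patient, ()))):
            counts[pair] += 1
    # Keep only pairs reported by enough distinct patients.
    return {pair: n for pair, n in counts.items() if n >= min_patients}


MENTIONS = [
    ("P1", "drug", "DrugX"), ("P1", "reaction", "rash"), ("P1", "reaction", "rash"),
    ("P2", "drug", "DrugX"), ("P2", "reaction", "rash"), ("P2", "reaction", "nausea"),
    ("P3", "drug", "DrugY"), ("P3", "reaction", "nausea"),
]

print(patient_level_pairs(MENTIONS))  # {('drugx', 'rash'): 2}
```

P1's repeated rash report counts once, so a single prolific poster cannot inflate a pair on their own. Co-occurrence is only a candidate signal; it does not by itself establish that a drug caused a reaction.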






Yi Chen

Jinhe Shi

Yunzhong Liu

Mike Citro

Viraj Bhalala



This project is supported by The Leir Charitable Foundations and NSF CAREER Award 1322406.