A Data Science Approach to Enterprise Security

Pratyusa K. Manadhata
Hewlett-Packard Laboratories


Organizations routinely collect billions of security alerts and events from their networks for compliance and forensic analysis. These event data, collected from both security products and non-security network elements, contain valuable information, e.g., early signs of a new attack. Separating signal from noise at this scale, however, is challenging and remains a relatively unexplored research area. In this talk, we introduce a data science approach: designing algorithms and building systems that identify attacks and other malicious activities in big security data, and building tools that help network administrators respond to them. We highlight key challenges in implementing the approach, e.g., in data collection and storage, algorithm design, privacy, and usability, and identify research opportunities to address them. We also present an example that demonstrates our approach’s feasibility: we use large-scale graph inference on an enterprise’s HTTP proxy logs to detect malicious domains accessed by hosts in the enterprise network. Our experiments on data collected from a large worldwide network show that we can identify previously unknown malicious domains scalably and reliably.
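
To make the proxy-log example concrete, the following is a minimal sketch of graph inference on a host-domain graph. The abstract does not specify the inference algorithm, so the sketch substitutes a simple score-propagation heuristic (averaging scores across graph neighbors) as a stand-in. The function names, the (host, domain) log format, the seed lists, and the sample domains are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(proxy_log):
    """Build a bipartite host-domain graph from (host, domain) pairs.

    The (host, domain) record format is an assumption; real proxy logs
    would need parsing first.
    """
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in proxy_log:
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return host_to_domains, domain_to_hosts


def propagate_scores(proxy_log, known_bad, known_good, rounds=10, prior=0.5):
    """Estimate a maliciousness score in [0, 1] for every domain.

    Seed domains keep fixed scores (1.0 for known bad, 0.0 for known good).
    Every other domain starts at `prior` and is repeatedly set to the
    average score of the hosts that visited it, where a host's score is
    the average score of the domains it visited.
    This is a stand-in heuristic, not the method used in the talk.
    """
    host_to_domains, domain_to_hosts = build_graph(proxy_log)

    score = {}
    for domain in domain_to_hosts:
        if domain in known_bad:
            score[domain] = 1.0
        elif domain in known_good:
            score[domain] = 0.0
        else:
            score[domain] = prior

    for _ in range(rounds):
        # A host looks suspicious if the domains it visits look suspicious.
        host_score = {
            host: sum(score[d] for d in domains) / len(domains)
            for host, domains in host_to_domains.items()
        }
        updated = {}
        for domain, hosts in domain_to_hosts.items():
            if domain in known_bad or domain in known_good:
                updated[domain] = score[domain]  # seeds stay fixed
            else:
                updated[domain] = sum(host_score[h] for h in hosts) / len(hosts)
        score = updated
    return score


# Hypothetical sample data: two hosts, four domains.
sample_log = [
    ("host-1", "bad.example"),
    ("host-1", "unknown.example"),
    ("host-2", "good.example"),
    ("host-2", "other.example"),
]
scores = propagate_scores(
    sample_log, known_bad={"bad.example"}, known_good={"good.example"}
)
```

In this toy input, unknown.example shares a host with a known malicious domain and so ends up scored above the prior, while other.example shares a host with a known benign domain and ends up below it. A deployment at the scale described above would need a distributed graph representation and a more principled inference method.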