ML for network intrusion detection problems -- a short, lightly annotated bibliography

Attention Conservation Notice: In response to a Twitter thread, this is a short list of papers from my Zotero library related to the intersection of ML/data science and network security, grouped into vague categories reflecting my memory of each paper. Probably at least a little dated.

Longer disclaimer: I should note up front that this is not intended to be a definitive reference; I haven’t done much work on network-based intrusion detection for a while, and this list is almost certainly a bit dated. It also reflects my own biases: I generally only save papers that grab my attention for one reason or another. The fact that a particular paper isn’t on this list doesn’t mean it’s not a good paper worth reading; if your favorite paper (or even, alas, your own paper) isn’t on this list, please forgive me. Email me a copy and I’ll update this list at some point.

I’ve tried to sort them based on what I remember about each paper (or on a brief skim if I couldn’t remember anything at all); apologies if I’ve misfiled one.

With that list of caveats: a lightly annotated bibliography of papers from my library…

One of the first papers I’m aware of that tries to apply “data science” approaches to intrusion detection; it’s a rule-based approach for a single system:

Denning, D.E., 1987. An intrusion-detection model. IEEE Transactions on software engineering, (2), pp.222-232.

Signature-based systems: these aren’t really ML-based (although Bro can be extended in that direction), but many ML-based network intrusion detection systems seek to emulate or extend them.

Paxson, V., 1999. Bro: a system for detecting network intruders in real-time. Computer networks, 31(23-24), pp.2435-2463.
Roesch, M., 1999, November. Snort: Lightweight intrusion detection for networks. In Lisa (Vol. 99, No. 1, pp. 229-238).

“Position” papers that point out issues, difficulties, and frustrations with network intrusion detection in the real world, both ML-related and not:

Sommer, R. and Paxson, V., 2010. Outside the closed world: On using machine learning for network intrusion detection. In 2010 IEEE Symposium on Security and Privacy (pp. 305-316). IEEE.
Axelsson, S., 2000. The base-rate fallacy and the difficulty of intrusion detection. ACM Transactions on Information and System Security (TISSEC), 3(3), pp.186-205.
Axelsson, S., 2000. Intrusion detection systems: A survey and taxonomy.
Ptacek, T.H. and Newsham, T.N., 1998. Insertion, evasion, and denial of service: Eluding network intrusion detection. Secure Networks Inc., Calgary, Alberta.
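The core of Axelsson’s base-rate argument is Bayes’ rule. When intrusions are rare, even a tiny false-positive rate means most alarms are false. Below is a minimal Python sketch of that arithmetic. The traffic numbers in the example are hypothetical and were chosen only for illustration; they are not taken from the paper.

```python
def bayesian_detection_rate(base_rate: float, tpr: float, fpr: float) -> float:
    """Return P(intrusion | alarm) using Bayes' rule.

    base_rate: prior probability that a given event is an intrusion
    tpr:       P(alarm | intrusion), i.e. the detection rate
    fpr:       P(alarm | no intrusion), i.e. the false-positive rate
    """
    # Total probability of raising an alarm, over intrusions and benign events.
    p_alarm = tpr * base_rate + fpr * (1.0 - base_rate)
    return tpr * base_rate / p_alarm


# Hypothetical numbers: 1 event in 100,000 is malicious, the detector
# catches every attack, and it false-alarms on 0.01% of benign events.
print(round(bayesian_detection_rate(1e-5, 1.0, 1e-4), 3))  # 0.091
```

With these made-up numbers, roughly nine out of ten alarms are false, despite a perfect detection rate and a 0.01% false-positive rate.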

Discussion of publicly available data (spoiler: there really isn’t any great publicly available data for this sort of thing, and you really don’t want to be using DARPA’98/KDD’99):

Mahoney, M.V. and Chan, P.K., 2003, September. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In International Workshop on Recent Advances in Intrusion Detection (pp. 220-237). Springer, Berlin, Heidelberg.
Lippmann, R., Cunningham, R.K., Fried, D.J., Graf, I., Kendall, K.R., Webster, S.E. and Zissman, M.A., 1999, September. Results of the DARPA 1998 Offline Intrusion Detection Evaluation. In Recent advances in intrusion detection (Vol. 99, pp. 829-835).
Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P.N., Kumar, V., Srivastava, J. and Dokas, P., 2004. MINDS - Minnesota intrusion detection system. Next generation data mining, pp.199-218.
Tavallaee, M., Bagheri, E., Lu, W. and Ghorbani, A.A., 2009, July. A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications (pp. 1-6). IEEE.

Applying ML to deep packet inspection, covering both signature generation and anomaly detection:

Wang, K., Parekh, J.J. and Stolfo, S.J., 2006, September. Anagram: A content anomaly detector resistant to mimicry attack. In International Workshop on Recent Advances in Intrusion Detection (pp. 226-248). Springer, Berlin, Heidelberg.
Lee, W., Stolfo, S.J. and Mok, K.W., 2000. Adaptive intrusion detection: A data mining approach. Artificial Intelligence Review, 14(6), pp.533-567.
Wang, K. and Stolfo, S.J., 2004, September. Anomalous payload-based network intrusion detection. In International Workshop on Recent Advances in Intrusion Detection (pp. 203-222). Springer, Berlin, Heidelberg.
Rieck, K. and Laskov, P., 2007. Language models for detection of unknown attacks in network traffic. Journal in Computer Virology, 2(4), pp.243-256.
Harang, R. and Mell, P., 2016, October. Micro-signatures: The Effectiveness of Known Bad N-Grams for Network Anomaly Detection. In International Symposium on Foundations and Practice of Security (pp. 36-47). Springer, Cham.
Krueger, T., Krämer, N. and Rieck, K., 2010, September. ASAP: Automatic semantics-aware analysis of network payloads. In International Workshop on Privacy and Security Issues in Data Mining and Machine Learning (pp. 50-63). Springer, Berlin, Heidelberg.
Lee, W., Stolfo, S.J. and Mok, K.W., 1999. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No. 99CB36344) (pp. 120-132). IEEE.
Newsome, J., Karp, B. and Song, D., 2005, May. Polygraph: Automatically generating signatures for polymorphic worms. In 2005 IEEE Symposium on Security and Privacy (S&P’05) (pp. 226-241). IEEE.
Rafique, M.Z. and Caballero, J., 2013, October. Firma: Malware clustering and network signature generation with mixed network behaviors. In International Workshop on Recent Advances in Intrusion Detection (pp. 144-163). Springer, Berlin, Heidelberg.
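To make the payload-anomaly idea concrete, here is a minimal Python sketch in the spirit of Wang and Stolfo’s 1-gram byte-frequency approach. It is a simplification, not their implementation:

- It learns the mean and standard deviation of each byte value’s relative frequency over normal payloads.
- It scores a new payload with a simplified Mahalanobis distance, sum of |x_i - mean_i| / (std_i + alpha).
- It omits the per-length and per-port models described in the paper.

The class name, the `alpha` value, and the example payloads are all hypothetical.

```python
from statistics import mean, pstdev


def byte_freq(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1  # avoid dividing by zero on empty payloads
    return [c / n for c in counts]


class OneGramModel:
    """Hypothetical, simplified 1-gram payload model (single model, no length binning)."""

    def __init__(self, alpha: float = 0.001):
        self.alpha = alpha  # smoothing term so zero-variance bytes don't divide by zero
        self.mean = [0.0] * 256
        self.std = [0.0] * 256

    def fit(self, payloads: list[bytes]) -> "OneGramModel":
        freqs = [byte_freq(p) for p in payloads]
        columns = list(zip(*freqs))  # one tuple of frequencies per byte value
        self.mean = [mean(col) for col in columns]
        self.std = [pstdev(col) for col in columns]
        return self

    def score(self, payload: bytes) -> float:
        """Simplified Mahalanobis distance from the learned profile; higher = more anomalous."""
        x = byte_freq(payload)
        return sum(abs(xi - mi) / (si + self.alpha)
                   for xi, mi, si in zip(x, self.mean, self.std))


# Hypothetical training data: a few ordinary HTTP requests.
training = [
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"GET /about.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"GET /news.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
]
model = OneGramModel().fit(training)

normal = b"GET /home.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
attack = b"GET /" + b"\x90" * 200 + b" HTTP/1.1\r\n\r\n"  # NOP-sled-like payload
print(model.score(normal) < model.score(attack))  # True
```

The byte value 0x90 never appears in training, so its zero mean and zero variance make any occurrence very expensive under this distance. This is also why the adversarial papers below target exactly this kind of model: an attacker who pads a payload to match the normal byte distribution can slip under the threshold.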

Anomaly detection/unsupervised methods:

Perdisci, R., Ariu, D., Fogla, P., Giacinto, G. and Lee, W., 2009. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Computer networks, 53(6), pp.864-881.
Lee, W. and Xiang, D., 2001, May. Information-theoretic measures for anomaly detection. In Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001 (pp. 130-143). IEEE.
Portnoy, L., 2000. Intrusion detection with unlabeled data using clustering.
Perdisci, R., Gu, G. and Lee, W., 2006, December. Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems. In ICDM (Vol. 6, pp. 488-498).
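Lee and Xiang use information-theoretic measures such as entropy to characterize how regular audit data is. Roughly, their argument is that more regular (lower-entropy) data makes for better anomaly detection models. Below is a minimal Python sketch of empirical Shannon entropy over a sequence of records. The destination-port sequences are hypothetical examples, not data from the paper.

```python
import math
from collections import Counter


def entropy(records: list) -> float:
    """Empirical Shannon entropy (in bits) of a sequence of hashable records."""
    counts = Counter(records)
    n = len(records)
    # sum of p * log2(1/p); written this way so a single repeated value gives +0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical destination-port sequences.
regular = [80, 80, 443, 80, 443, 80, 443, 80]
irregular = [80, 22, 443, 3389, 8080, 53, 25, 110]
print(round(entropy(regular), 3))  # 0.954
print(entropy(irregular))          # 3.0
```

The paper also uses conditional entropy and related measures over sequences; this sketch covers only the simplest one.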

Before-their-time award: adversarial papers from before “Adversarial ML” was quite so much a thing (there’s a lot more in this vein; I’ve just downsampled to the network-intrusion-related ones):

Perdisci, R., Dagon, D., Lee, W., Fogla, P. and Sharif, M., 2006, May. Misleading worm signature generators using deliberate noise injection. In 2006 IEEE Symposium on Security and Privacy (S&P’06) (15 pp.). IEEE.
Šrndić, N. and Laskov, P., 2014, May. Practical evasion of a learning-based classifier: A case study. In 2014 IEEE Symposium on Security and Privacy (pp. 197-211). IEEE.
Barreno, M., Nelson, B., Joseph, A.D. and Tygar, J.D., 2010. The security of machine learning. Machine Learning, 81(2), pp.121-148.
Fogla, P. and Lee, W., 2006, October. Evading network anomaly detection systems: formal reasoning and practical techniques. In Proceedings of the 13th ACM conference on Computer and communications security (pp. 59-68). ACM.

Other good papers (mostly of the what-it-says-on-the-tin variety):

Holz, T., Gorecki, C., Rieck, K. and Freiling, F.C., 2008, February. Measuring and Detecting Fast-Flux Service Networks. In NDSS.
Gu, G., Perdisci, R., Zhang, J. and Lee, W., 2008. BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection.
Gu, G., Zhang, J. and Lee, W., 2008. BotSniffer: Detecting botnet command and control channels in network traffic.

Written on June 4, 2019