Faulds: A Non-Parametric Iterative Classifier for Internet-Wide OS Fingerprinting


### Rating (1--4):
+ 3: Weak Accept -- This paper may have flaws, but I would not argue against it at a major conference

### What did this paper do well?

+ It is well structured and presents the material in a logical, careful way. I like the use of symbols and mathematical equations! (One obvious reason is that the authors have been working on this topic for a while: the main author published the Hershel paper.)
+ Great sample size (67M web servers)
+ Final math model is impressive

### Where did this paper fall short?

+ I have doubts about the originality of this paper. Its key idea is the "non-parametric Expectation-Maximization (EM) estimator", which sounds very close to "EM Algorithms for nonparametric estimation of mixing distributions" by Kenneth Train of UC Berkeley (https://eml.berkeley.edu/~train/EMtrain.pdf), published in 2007. The paper does not cite Train's work. Investigating whether the two ideas are the same is outside the scope of this assignment.
+ The paper cites 17 papers on network-based OS fingerprinting, 5 of which belong to the "SYN method". However, it does not briefly summarize how the non-parametric EM estimator improves on them. A few details are scattered here and there, but they are too few, considering that the authors spend a good paragraph comparing Faulds with nmap (a tool designed for a totally different purpose, as they admit).
+ Some important math concepts need further explanation. For example, volatility is explained only by reference to another paper rather than with a brief description. That makes it difficult to understand the sets of alpha and theta, and why the paper computes them as the model's output rather than taking them as input. Such key elements should be included in the "Background" section.

### What did you learn from reading this paper?

+ SYN probing is a common and effective method for fingerprinting over the network
+ Probability and other discrete math theories are very important for constructing models to be used in solving networking problems

### What questions do you have about the paper or the area?

+ With more fields in IPv6 headers, does that mean there will be more ways to fingerprint over the network?
+ What methods exist to obfuscate packets and prevent this kind of OS fingerprinting?
+ Is it possible to fingerprint network security software such as IDSs, IPSs, and firewalls?