Abstract
Websites have become the most popular channel for distributing malware, so it is important to detect malicious websites before users visit them. Prior approaches to detecting malware distribution websites have suffered from either low accuracy or high overhead. We propose to exploit the disparity between the "identity" a website claims and the one that is observed. Given a website, our system collects clues that indicate the identity the website claims and measures the disparity between its domain and its content using textual relevance. Our disparity measure incurs very little overhead and is not prone to content noise. Experimental results demonstrate that our mechanism detects malware distribution websites with high accuracy and without noticeable overhead.
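As a rough illustration of what a textual-relevance disparity between a claimed identity and observed content might look like, the following minimal sketch compares two pieces of text with cosine similarity over raw term counts. The abstract does not specify the paper's relevance measure or the clues it collects, so the tokenizer, the choice of cosine similarity, and the function names `tokenize`, `cosine_similarity`, and `disparity` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with the other side
    return dot / (norm_a * norm_b)


def disparity(claimed_text, observed_text):
    """Hypothetical disparity score: 1 - relevance.

    Close to 0 when the claimed identity and the observed content use the
    same terms, and 1 when they share no terms at all. This is an assumed
    stand-in, not the measure defined in the paper.
    """
    claimed = Counter(tokenize(claimed_text))
    observed = Counter(tokenize(observed_text))
    return 1.0 - cosine_similarity(claimed, observed)


if __name__ == "__main__":
    # Claimed identity (e.g. words taken from the domain or title)
    # versus words observed in the page content.
    print(disparity("flower shop online", "flower delivery shop"))
    print(disparity("flower shop", "free codec download"))
```

In this sketch, a site whose domain suggests a flower shop but whose content is about codec downloads gets the maximum score of 1, while partially overlapping texts fall somewhere in between. A real system would need a more robust relevance model and a threshold or classifier on top of the score.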
Original language | English |
---|---|
Pages (from-to) | 2907-2916 |
Number of pages | 10 |
Journal | Information |
Volume | 16 |
Issue number | 5 |
State | Published - May 2013 |
Keywords
- Drive-by downloads
- Machine learning
- Malware distribution
- Reasoning
- Usable security