Microsoft SmartScreen

The information presented here has been kindly provided by Hey, Doug S., Tom and SomeGuy with a little backup from me.

The title does not adequately sum up the information presented below, so for those of you who don’t need a full rundown, here is a quick summary.

  • Websites scraped by a range of IP addresses belonging to various service providers (most notably 64.124.203.xx, 207.46.x.x, 208.50.101.x)
  • Seemingly random GETs of unindexed content, again originating from a range of IP addresses belonging to various service providers

The behaviour above can be seen when someone surfs your site using Internet Explorer with SmartScreen filtering enabled. These requests come from a known range of IP addresses. For the details, please read on.

Overview

The information contained in this page represents the combined efforts of Hey, Doug S., Tom and SomeGuy, who all arrived at my blog after searching for information about one or more of the IP addresses in the ranges 64.124.203.xx, 207.46.x.x and 208.50.101.x. I posted some time ago about my site being scraped by 208.50.101.153, and since then numerous people have ended up here while looking for information. Now, thanks largely to these guys (who have noticed similar behaviour on their own sites, or access to unindexed or protected areas of their sites by IP addresses in these ranges), we can provide some answers.

Observations of Strange Behaviour

Outlined below is the ‘strange’ behaviour that we have observed in relation to these IP addresses.

Site scraped

My first encounter with these IP addresses was when my site was scraped by 208.50.101.153.


208.50.101.153 - - [25/Mar/2009:13:47:09 +0000] "GET /wp/index.php/archives/90/ HTTP/1.1" 301 483 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3"
208.50.101.153 - - [25/Mar/2009:13:47:10 +0000] "GET /wp/index.php/archives/90 HTTP/1.1" 200 30958 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3"
208.50.101.153 - - [25/Mar/2009:13:47:16 +0000] "GET /wp/index.php/archives/91/ HTTP/1.1" 301 483 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3"

This sample log shows a portion of the scrape from 208.50.101.153. You can clearly see that it is not identifying itself as an MSNBot server. In fact, there is no link to Microsoft other than the fact that it claims to be IE 6 running on Windows XP (the "Windows NT 5.1" token in the user agent).

Unexpected HTTP GETs

This behaviour was the main indicator for everyone else, and took the form of specific files being requested from protected or un-indexed areas of their websites used mainly for filesharing with family and friends. A request would be made for one of these files, and shortly after, another request for the same file would come in from one of the IP addresses above. In my testing, I was able to partially recreate this behaviour, but in my case, the requests clearly identified themselves as being msnbot and they were seemingly part of a normal indexing of the site.


x.x.x.x - - [14/Apr/2010:19:42:25 +0100] "GET /brsat1.jpg HTTP/1.1" 200 15780 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
x.x.x.x - - [14/Apr/2010:19:42:26 +0100] "GET /favicon.ico HTTP/1.1" 200 17795 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
207.46.13.132 - - [14/Apr/2010:19:50:09 +0100] "GET /robots.txt HTTP/1.1" 404 469 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
207.46.13.132 - - [14/Apr/2010:19:51:09 +0100] "GET / HTTP/1.1" 200 7259 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

This snippet from my server log clearly shows my initial request for brsat1.jpg via IE with SmartScreen enabled followed some minutes later by the requests from MSNBot. The same file was requested from another domain without SmartScreen enabled and it was not indexed by MSNBot.

In this example it’s largely irrelevant, as it’s a small file put there for testing purposes. However, this behaviour (check out Doug S.’s page about this for some example logs) can result in poor performance for the intended recipient if the file is large and is served from a bandwidth-restricted server (a personal server on an ADSL line, for example).
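
For anyone wanting to check their own logs for this pattern, here is a rough Python sketch. It assumes an Apache-style access log like the samples above, treats the three address ranges named in this article as "suspect" prefixes, and uses an arbitrary 30-minute window. The function names are made up for illustration. Paths are compared in lowercase because the follow-up requests appear to arrive lowercased.

```python
import re
from datetime import datetime, timedelta

# Prefixes taken from the ranges named in this article; illustrative, not authoritative.
SUSPECT_PREFIXES = ("208.50.101.", "64.124.203.", "207.46.")

# Matches the start of an Apache-style access log line.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def parse_line(line):
    """Return (ip, timestamp, path) or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path")

def find_follow_ups(lines, window=timedelta(minutes=30)):
    """Find suspect-range requests that repeat an earlier visitor's request."""
    last_visitor = {}  # lowercased path -> (ip, timestamp) of latest ordinary request
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, ts, path = parsed
        key = path.lower()
        if ip.startswith(SUSPECT_PREFIXES):
            prev = last_visitor.get(key)
            if prev is not None and ts - prev[1] <= window:
                hits.append((path, prev[0], ip, ts - prev[1]))
        else:
            last_visitor[key] = (ip, ts)
    return hits
```

Feeding it the lines of an access log returns one tuple per follow-up: the path requested, the original visitor's IP, the suspect IP and the delay between the two requests.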

Microsoft’s SmartScreen Filter

The key element to this puzzle appears to be Microsoft’s SmartScreen filtering tool. This features most prominently in Internet Explorer (see SmartScreen in IE for more information), but is also used by other Microsoft software (I have found links to Outlook and, as reported by Doug S., Hotmail) to perform security checks.

What actually happens when a user with SmartScreen enabled requests a URL with IE is that as well as making a connection to the source server to request the required item, IE also makes a connection to another IP address. This connection can be made using either the HTTP or HTTPS protocols on standard ports.


No. Time Source Destination Protocol Info
30 0.207548 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [SYN] Seq=0 Win=65535 Len=0 MSS=1260
31 0.210847 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [SYN] Seq=0 Win=65535 Len=0 MSS=1260
36 0.249408 213.199.170.72 192.168.0.75 TCP https > iclpv-pm [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1420
37 0.249420 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [ACK] Seq=1 Ack=1 Win=65535 Len=0
38 0.249651 192.168.0.75 213.199.170.72 TLSv1 Client Hello
39 0.252584 213.199.170.72 192.168.0.75 TCP https > iclpv-nls [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1420
40 0.252593 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [ACK] Seq=1 Ack=1 Win=65535 Len=0
41 0.252797 192.168.0.75 213.199.170.72 TLSv1 Client Hello
66 0.309034 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
67 0.310782 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
68 0.310798 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [ACK] Seq=103 Ack=2521 Win=65535 Len=0
69 0.312267 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
70 0.313703 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
71 0.313728 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [ACK] Seq=103 Ack=2521 Win=65535 Len=0
72 0.315591 213.199.170.72 192.168.0.75 TCP [TCP Dup ACK 67#1] https > iclpv-pm [ACK] Seq=2521 Ack=103 Win=65433 Len=0
73 0.355127 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
74 0.356003 213.199.170.72 192.168.0.75 TLSv1 Server Hello, Certificate, Server Hello Done
75 0.356036 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [ACK] Seq=103 Ack=4510 Win=65535 Len=0
76 0.356444 192.168.0.75 213.199.170.72 TLSv1 Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
77 0.358103 213.199.170.72 192.168.0.75 TCP [TCP segment of a reassembled PDU]
78 0.358883 213.199.170.72 192.168.0.75 TLSv1 Server Hello, Certificate, Server Hello Done
79 0.358894 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [ACK] Seq=103 Ack=4510 Win=65535 Len=0
80 0.359444 192.168.0.75 213.199.170.72 TLSv1 Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
81 0.406154 213.199.170.72 192.168.0.75 TLSv1 Change Cipher Spec, Encrypted Handshake Message
82 0.407434 192.168.0.75 213.199.170.72 TLSv1 Application Data
83 0.412782 213.199.170.72 192.168.0.75 TLSv1 Change Cipher Spec, Encrypted Handshake Message
84 0.414096 192.168.0.75 213.199.170.72 TLSv1 Application Data
85 0.414169 192.168.0.75 213.199.170.72 TLSv1 Application Data
86 0.474276 213.199.170.72 192.168.0.75 TLSv1 Application Data
87 0.474331 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [ACK] Seq=1330 Ack=5400 Win=64646 Len=0
88 0.474545 192.168.0.75 213.199.170.72 TCP iclpv-pm > https [FIN, ACK] Seq=1330 Ack=5400 Win=64646 Len=0
91 0.495581 213.199.170.72 192.168.0.75 TCP https > iclpv-nls [ACK] Seq=4553 Ack=1352 Win=65535 Len=0
92 0.497943 213.199.170.72 192.168.0.75 TLSv1 Application Data
93 0.497973 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [ACK] Seq=1352 Ack=5400 Win=64646 Len=0
94 0.498041 192.168.0.75 213.199.170.72 TCP iclpv-nls > https [FIN, ACK] Seq=1352 Ack=5400 Win=64646 Len=0
95 0.518246 213.199.170.72 192.168.0.75 TCP https > iclpv-pm [ACK] Seq=5400 Ack=1331 Win=65535 Len=0
96 0.542349 213.199.170.72 192.168.0.75 TCP https > iclpv-nls [ACK] Seq=5400 Ack=1353 Win=65535 Len=0

This packet capture from Wireshark clearly shows the connection my machine made to one of the SmartScreen (SS) servers whilst it was downloading a file from my test website using IE with SmartScreen enabled.

The Global Crossing Link

We’re not entirely sure where Global Crossing fits into all of this, except that some of the IP address ranges involved are listed as belonging to Global Crossing or to other companies associated with them. There is also some history between Global Crossing and Microsoft (Global Crossing Cuts Costs, Unifies Communications with Integrated Solution), and Global Crossing do provide enterprise-grade co-location (Global Crossing Enterprise Co-location), so it’s entirely possible that Microsoft are using Global Crossing’s services. Without more information, to imply anything else would be pure speculation.

IP Addresses Linked To SmartScreen

The following IP addresses are, we believe, all linked to SS in some way, either as a server that IE (or another SS client) connects to during requests or as a server that requests content from websites. Those marked with an asterisk have been confirmed by Microsoft, thanks to Tom, who opened a ticket with them about this issue.

  • 208.50.101.152
  • 208.50.101.153*
  • 208.50.101.154*
  • 208.50.101.155*
  • 208.50.101.156*
  • 208.50.101.157
  • 208.50.101.158*
  • 64.124.203.71*
  • 64.124.203.72*
  • 64.124.203.73*
  • 64.124.203.74*
  • 64.124.203.75
  • 64.124.203.76*
  • 64.124.203.77*
  • 64.124.203.78*
  • 65.54.95.28
  • 65.54.95.30
  • 65.55.195.250
  • 213.199.170.72
  • 213.199.177.156

As well as the addresses above, the following addresses have been found to be linked to SmartScreen activity. Thanks to Aaron for these.

  • 74.217.148.72
  • 74.217.148.73
  • 74.217.148.75
  • 74.217.148.76

As well as the addresses above, the following addresses have been found to be linked to SmartScreen activity. Thanks to Bloodclaw for these.

  • 74.217.148.74
  • 74.217.148.78
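
If you want to test an address against the lists in this section from a script, something along these lines should do. This is only a sketch: the set simply mirrors the addresses listed above (confirmed or not), and the names `expand`, `KNOWN_ADDRS` and `is_listed` are made up for illustration.

```python
import ipaddress

def expand(prefix, first, last):
    """Build dotted-quad strings for prefix.first through prefix.last inclusive."""
    return {f"{prefix}.{n}" for n in range(first, last + 1)}

# Mirrors the addresses listed in this section, whether confirmed or not.
KNOWN = (
    expand("208.50.101", 152, 158)
    | expand("64.124.203", 71, 78)
    | expand("74.217.148", 72, 76)
    | {
        "74.217.148.78",
        "65.54.95.28",
        "65.54.95.30",
        "65.55.195.250",
        "213.199.170.72",
        "213.199.177.156",
    }
)
KNOWN_ADDRS = frozenset(ipaddress.ip_address(a) for a in KNOWN)

def is_listed(addr):
    """Return True if addr is one of the addresses listed in this article."""
    try:
        return ipaddress.ip_address(addr) in KNOWN_ADDRS
    except ValueError:
        # Not a valid IP address at all (for example a masked "x.x.x.x").
        return False
```

Bear in mind that the list is only what we have observed so far, so an address missing from it proves nothing.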

Work Around

This will only work if your server is running a case-sensitive operating system such as Linux. The SS servers make their requests using purely lowercase characters. Including uppercase characters in the filenames will prevent SS from obtaining the file, as your server will respond with a 404 error (as stated though, your server must be running a case-sensitive OS). This will of course only work until the SS developers fix their bot so that it uses the actual filenames and not just a lowercase version.
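
To check whether a given filename would actually benefit from this trick, a tiny helper like the one below might be handy. It is only a sketch and the function name is hypothetical. It assumes a case-sensitive filesystem and that the follow-up request is the exact lowercase form of the name.

```python
def survives_lowercase_request(name, served_names):
    """Return True if a lowercased request for `name` would miss every served file.

    Assumes a case-sensitive filesystem, so a request only succeeds when the
    requested name matches a served name exactly.
    """
    lowered = name.lower()
    # The name needs at least one uppercase character, and no lowercase twin
    # may sit alongside it, otherwise the lowercased request still finds a file.
    return lowered != name and lowered not in served_names
```

For example, `survives_lowercase_request("HolidayPics.zip", {"HolidayPics.zip"})` is true, whereas a name that is already all lowercase gains nothing.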

The other option to prevent access to these files is to drop them in a protected directory which requires authorisation. This is the official line from Microsoft in a response Tom received to the ticket he opened with them.

Update – 19th August 2013 – Bloodclaw emailed me with some new IP addresses for the list. He has also suggested this work around.

“I also noticed they all have their referer set to http://temp.com, I don’t know why Microsoft would do that and that may be temporary as the domain name suggests, but if other people are getting the same type of requests with that same referer it can easily be blocked on both Windows and Linux”
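
As one possible way of acting on that suggestion, here is a minimal Python sketch of a WSGI wrapper that refuses any request carrying that referer. It assumes your site is served by a WSGI application, which may well not be the case, and the names `BLOCKED_REFERER` and `block_temp_referer` are made up for illustration. The same idea can be expressed in your web server’s own configuration.

```python
BLOCKED_REFERER = "http://temp.com"

def block_temp_referer(app):
    """Wrap a WSGI app so requests with the temp.com referer get a 403."""
    def middleware(environ, start_response):
        # Normalise a little so "http://temp.com/" and mixed case also match.
        referer = environ.get("HTTP_REFERER", "").rstrip("/").lower()
        if referer == BLOCKED_REFERER:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

As Bloodclaw notes, the referer may be temporary, so treat this as a stopgap and not a fix.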
