Scraped by 208.50.101.153
This is a bit of a long shot… if you are a regular visitor with your own blog… do you have any information about this IP address (208.50.101.153)?
The reason I’m asking is because it’s linked to a company called Global Crossing, and today at around 11am server time, this address and at least one other connected to Global Crossing scraped my site… absolutely every link was followed.
So… if you know anything about this IP address and they’ve visited your site, I’d love to know about it. And of course, if you have anything to do with Global Crossing… why in the hell have you scraped my site? Or, if you are the person who scraped it, why? I shall be googling for my own content, and if I find it reproduced anywhere without my permission, I’ll be coming down on you with the full force of the copyright laws available to me… given these IPs reside stateside, the laws are pretty damn good. You have been warned!
Update: Well, well… another scrape. At the moment it’s only partial, this time from 78.159.112.96, which belongs to a German company called netdirekt. And who are they partnered with? Global Crossing!
Tags: Microsoft, SmartScreen

Global Crossing is a telecommunications company. Are you sure they’re not just the parent company of your service provider or something? Maybe they’re scraping sites to check that you’re complying with the legal terms of use and so on. Still not nice, but probably less sinister than what you originally thought.
Could be, but there are too many occasions (in my opinion at least) where sites are scraped. I’ve got about 20 subnets in a firewall ban list for scraping the site. If they are a search engine or, as you suggest, some robot checking I’m complying with T+Cs, then they should provide a meaningful user agent… not something that looks like any other browser agent.
Anyhow… I’ll just keep banning the scrapers until my ISP tells me to unban them.
This IP and 3 others related to it have hit my site on Saturday nights. My site hasn’t even been rolled out yet. It’s not on any search engines as far as I know. I don’t know what it is.
208.50.101.152
208.50.101.153
208.50.101.154
208.50.101.155
Well, I’m no nearer an answer. Since I posted this, I’ve banned a whole bunch of other subnets that were scraping the site with what claimed to be normal user agents like IE and Opera. Since I have a dedicated server, it’s easy enough for me to just ban the range, which is what I’ve been doing if the range relates to a hosting company.
With dynamically allocated IPs such as broadband addresses it’s a bit trickier as you could be banning potential visitors.
As for them hitting your site without it being on a search engine… they may well be using domain WHOIS data as the source of their list or you may even find that your domain registrar is partnered with them.
Many of my family and friends are not very computer savvy. Sometimes, to share big files such as many pictures or video clips, I will put them in a hidden spot on my web site and send them a link to the directory, with instructions to highlight the file, right-click, and then select “Save Target As” to get the file to their computer (Windows-based method).
I just had a case where 208.50.101.152 issued a single HTTP GET request only 9 minutes after my family member fetched a video clip. I can only think of one way that 208.50.101.152 could have acquired the path name: via a tracking cookie on my family member’s computer. I have tried some experiments to figure things out in more detail, so far unsuccessfully.
I have had similar experiences (well, 3 times now) from 208.50.101.153, 208.50.101.154, 207.46.164.13, 207.46.164.20, 64.124.203.71 and 64.124.203.76. They all issue an HTTP GET directly to the path without any preceding directory listing, and always shortly (anywhere from minutes to a couple of hours) after the real intended person got the file. On purpose, I now use some upper-case letters in the file name and path, because these undesired “GET” requests are always converted to lower case and therefore get a 404 (Not Found) response.
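The upper-case trick described above can also be used to spot these requests in a log after the fact. Here is a minimal sketch in Python, assuming the access log has already been parsed into (client IP, requested path, status) tuples; the function name and the sample entries are made up for illustration, and 192.0.2.10 is a documentation address standing in for the intended visitor.

```python
def find_lowercased_replays(entries):
    """Return (ip, path) pairs for 404s whose path is the all-lower-case
    form of a mixed-case path that was served successfully."""
    # Paths that were served with a 200 and contain upper-case letters.
    served = {path for _, path, status in entries
              if status == 200 and path != path.lower()}
    lowered = {path.lower() for path in served}
    # A 404 on the lower-cased form of a served path is the pattern
    # described in the comment above.
    return [(ip, path) for ip, path, status in entries
            if status == 404 and path in lowered]


# Illustrative data only, not taken from a real log.
sample = [
    ("192.0.2.10", "/Hidden/Clip.AVI", 200),
    ("208.50.101.152", "/hidden/clip.avi", 404),
    ("192.0.2.10", "/index.html", 200),
]
print(find_lowercased_replays(sample))
```

This only works if the file system is case sensitive and every shared path contains at least one upper-case letter, which is the setup described above.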
Interesting information, thanks Doug. I’d be interested to know about any more of your findings regarding these addresses.
We’re seeing the same behavior from the same IP addresses (208.50.101.152, 153, 154). Any idea what’s going on?
-Tom
Well, this is quite interesting…
If you use the ARIN IP WHOIS and check out the 207.46.x.x addresses, they come back as this range… 207.46.0.0 – 207.46.255.255. This belongs to Microsoft. The 208.50.x.x addresses, they come back as this range… 208.48.224.0 – 208.50.127.255. This belongs to Global Crossing. And the 64.124.x.x addresses come back as 64.124.0.0 – 64.125.255.255. This belongs to Abovenet Communications.
Could they be some kind of content check (for piracy perhaps), or some kind of proxy? Do the user agents give any clue as to the purpose of the requests?
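For anyone wanting to sort their own log entries against these ranges, here is a rough sketch using Python’s standard `ipaddress` module. The three ranges are copied from the ARIN results quoted above; the `owner_of` helper is a hypothetical name and nothing here queries WHOIS itself.

```python
import ipaddress

# Ranges exactly as quoted from the ARIN lookups above.
RANGES = {
    "Microsoft": ("207.46.0.0", "207.46.255.255"),
    "Global Crossing": ("208.48.224.0", "208.50.127.255"),
    "Abovenet Communications": ("64.124.0.0", "64.125.255.255"),
}


def owner_of(ip):
    """Return the owner whose quoted range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for owner, (first, last) in RANGES.items():
        if ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last):
            return owner
    return None


for ip in ("208.50.101.153", "207.46.164.13", "64.124.203.71", "78.159.112.96"):
    print(ip, owner_of(ip))
```

Addresses outside the three quoted ranges (such as the netdirekt one) come back as `None`, so they would still need a manual lookup.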
I have had exactly the same experience as Doug S. I gave a link to a friend to download some files, and minutes later the 208.50.101.153 and .154 IPs went to grab the same 2 files that were downloaded. Since this whole directory is password protected, there’s no way it could have been indexed by a search engine. When I checked all my logs for these IPs, this was the first time they came up, and the only URLs they attempted were the 2 direct URLs to the files downloaded. The only thing I can think of is that this is a sign of some spyware or trojan on my friend’s machine. What legit company would intercept downloads of files when they have no idea of the contents? When I nslookup the IPs they don’t resolve, but if you resolve 208.50.101.3, .4, .5 and .6 it leads to .brady.com, which supposedly is some contractor. Perhaps it’s just resold IP space, as .83 leads to swgci.elliott-turbo.com. Could .153 and .154 be leased space deliberately used without a DNS record to leave minimal trace, as they may be used for malicious purposes?
I’ve made some progress figuring this out, but I’m still not 100% convinced it’s legit.
First, I noticed this only happens with Internet Explorer, and not Firefox.
Second, I noticed (with Sysinternals TCPview and WireShark) that when downloading one of my files, my Internet Explorer makes a connection to a Microsoft IP address (65.55.13.61, 65.54.225.100, others) on port 443 (SSL).
I found some references to 65.54.225.100 on the Internet. One forum thread talks about it being a Microsoft SmartScreen (Phishing) filter server address. A Microsoft document about Trojan:Win32/Vundo.BH and another website about W32.Vundo/MS Juan Trojan Virus name it as a rogue server that is used to distribute ads or commands to this trojan! How odd that Microsoft would name one of their own IP addresses as a server that serves up ads to malware?
To test the SmartScreen (Phishing) filter theory, I disabled SmartScreen in IE (Tools, Internet Options, “Advanced” tab, scroll to the Security section, un-check “Enable SmartScreen Filter”). Indeed, that prevents the mystery connections from downloading my files after I download them.
So clearly the mystery connections that are downloading my files are learning of the URLs from the Microsoft SmartScreen filter.
What I’m not sure of is whether this is normal behavior for Microsoft to sample our downloaded files from the web server after we’ve downloaded them, or if this is some kind of hijack that is piggybacking on the data sent to the SmartScreen server.
I have NOT been able to get this to happen on any of three other machines in our office, and they all have SmartScreen enabled. I noticed that they are all using SmartScreen servers other than 65.54.225.100, so maybe only certain SmartScreen servers have this behavior?
Still working on it. If anyone is interested in trying this, please let me know if disabling SmartScreen makes this mystery download stop for you too. And let me know which Microsoft server your IE is hitting if you can.
-Tom
Note: I have since found a second machine that does this, and it is not using the same SmartScreen filter that my PC uses. It is at a different company, so I don’t suspect a common infection on both machines.
-Tom
(sorry, I meant “…not using the same SmartScreen server IP that my PC uses…”)
Tom,
Great info… just wanted to confirm I’m understanding properly: you are using the same URL to a file download on multiple PCs with SmartScreen on, correct? And only 2 machines trigger the phantom GET?
If so, that is very weird. Even if they use different SmartScreen servers, why would those servers work off different back-end databases? That wouldn’t make any sense: if each individual server held its own information, every server would have to relearn what every other server had already learned, which would be redundant work. It would also reduce the effectiveness of the SmartScreen system as a whole, as a user could be visiting a site which one SS server knows as unsafe but another SS server doesn’t know about…
Ok, I’ve just been trying some of this stuff (thanks for all this info btw guys, great job).
I fired up IE, turned on SmartScreen and downloaded a single file from one of my websites. Following the download something made a connection to 65.54.81.157 and 65.54.81.162. Again, this range (65.52.0.0 – 65.55.255.255) belongs to Microsoft. I then started monitoring the access logs on the web server and as yet (over 5 minutes after the original download) there have been no other connections to the webserver.
I just tried another couple of files, from the same server, and this time the connection was to 213.199.161.250 (again, this belongs to Microsoft in this range: 213.199.160.0 – 213.199.191.255). If I turn off screen filter and try another couple of files, there are no additional connections.
Either way though, I have not seen any extra connections to the webserver so I would have to say, the additional connections you guys are seeing are a little bit weird.
Brady.com rings a bell with me for some reason… I think I ended up there when I was initially investigating the scraping of my site, and my initial reaction was that it just didn’t seem legit for some reason. But I can’t be sure… I have a sneaking suspicion that I actually ended up there investigating someone who was posting on the site… trying to decide if their post was legit or not, and after seeing the site I canned it.
My philosophy with a lot of this is that if these sites don’t provide a nice meaningful user agent, I block HTTP and HTTPS for the whole IP range using iptables, but only if I can guarantee the range is a data center.
If any of you guys want me to try anything specific (like downloading some files etc. or maybe you want to try downloading something from my server so we can check whether we get these additional connections), drop me a mail to ‘athena at outer hyphen reaches dot com’.
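The range blocking mentioned above can be sketched as a small Python helper that turns a first/last address pair (the form WHOIS reports) into iptables commands. This is only an illustration of the idea, not the rules actually in use on my server: the `INPUT` chain, the `multiport` match and the `DROP` target are assumptions about a typical setup, the helper name is made up, and the script only prints the commands rather than running them.

```python
import ipaddress


def iptables_rules(first, last):
    """Build iptables commands that drop HTTP and HTTPS from an address range."""
    # summarize_address_range converts a first/last pair into CIDR blocks.
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [
        f"iptables -A INPUT -s {net} -p tcp -m multiport --dports 80,443 -j DROP"
        for net in networks
    ]


# 192.0.2.0/24 is a documentation range used purely as an example.
for rule in iptables_rules("192.0.2.0", "192.0.2.255"):
    print(rule)
```

A range that does not line up with a single CIDR block simply produces several rules, one per block.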
Just an update: Here is a complete list of the IP addresses that we have noticed downloading our files after a SmartScreen lookup so far…
208.50.101.153
208.50.101.153
208.50.101.154
208.50.101.155
208.50.101.155
208.50.101.156
208.50.101.158
208.50.101.158
64.124.203.71
64.124.203.72
64.124.203.73
64.124.203.74
64.124.203.76
64.124.203.77
64.124.203.78
64.124.203.78
Very interesting stuff regarding this SmartScreen… might install IE to do some investigation. I’m very surprised that there isn’t any other information regarding this on the web, and also surprised that MS would do this: if they are intercepting everyone’s downloads, the privacy implications would mean huge negative publicity. It’s also strange that a company as big as MS would be leasing hosting where their Class C network is shared with other sites… If we can get a more concrete picture of what’s going on, this kind of story would make the Slashdot front page…
Also, for the machines that can replicate this behavior with SmartScreen enabled, I wonder if there are any registry traces of these IPs? I’m wondering if SmartScreen does some legitimate transfer of URLs to an MS server, but if, say, these addresses are stored in the registry, some malicious program may have modified the IPs that SmartScreen sends to, effectively intercepting the URLs that were meant for legitimate use. I am not using IE so haven’t tested yet, just throwing ideas out there.
That is an interesting question…. when I first tried this, IE made connections to a 65 address (which is in the US), then when I tried again, it made connections to a 213 address (which is in the UK, my home). After that, all requests went to the UK address so there should be some storage of this somewhere.
Interesting postings since mine the other day.
I tried enabling SmartScreen on a couple of computers, but did not get any undesired HTTP GET afterwards.
I asked the person whose download the other day was followed by the “HTTP GET” from 208.50.101.152, and they had SmartScreen disabled.
I think I had a case on March 28th where the browser was Safari on a Mac; however, I am not certain, as the file path/name was all lower case and there were some other odd things going on at the same time.
I still have been unable to recreate the situation. I have been trying.
Doug S,
Wow, that would really throw a wrench into what we’ve found out so far, if the person that triggered the phantom GET had SmartScreen disabled… Did you get a chance to explain exactly where the setting is in IE? I’m wondering if it’s possible that the person just wasn’t aware that they had it on, and thought it was some external software they would have to download rather than something built into IE.
Tom: When I had SmartScreen enabled, IE went to 65.55.195.250 Port 443 and 65.54.95.30 or 65.54.95.28 port 80
hmm.. googled around and found some statements about SmartScreen:
“Some information about files that you download from the web such as name and file path may also be sent to Microsoft.” – http://www.ghacks.net/2010/03/08/how-to-disable-the-smartscreen-filter-in-internet-explorer/
“Checks software downloads against a dynamically updated list of reported malicious software sites.” – http://www.microsoft.com/security/filters/smartscreen.aspx
…so this looks like MS admitting that they do this… I suppose they might obfuscate their IPs and domain names so that a guy with a malicious site can’t just block *.microsoft.com or Microsoft’s commonly known Class B networks and easily sidestep MS’s whole anti-phishing infrastructure…
Also, for the guys getting the mysterious GETs… were you hosting files and giving people a dynamic DNS hostname, where the IP doesn’t have a reverse DNS record pointing to the hostname you gave out? I.e., say your true ISP hostname is 1-2-3-4.isp.net and the IP is 1.2.3.4. When you nslookup 1.2.3.4 it resolves to 1-2-3-4.isp.net, but since it’s a dynamic IP, you have a dynamic DNS host like dynamic.dyndns.org, and gave the dynamic.dyndns.org name out for people to download from?
I am doing this and I got the phantom GETs.. the reason i ask is because of:
“Use the fully-qualified domain name. All domains should reverse to actual domain names, not numeric IP addresses. This means a URL should look like “microsoft.com” and not “207.46.19.30.”” – https://phishingfilter.microsoft.com/PhishingFilterFaq.aspx
Although they are using a less restrictive definition, in the SMTP/sendmail world it is a very common authenticity check to make sure that the reverse DNS of the IP address is the actual hostname you are giving out; otherwise many email servers will not accept your email. This affects people using dynamic DNS services. I am thinking that Microsoft might be using this same logic to flag your site as “potentially suspicious” and thus going the extra step of triggering these phantom GETs. I.e., someone is browsing your .dyndns.org site with SmartScreen, then SmartScreen sends the URL to MS, and MS takes the IP, checks the reverse DNS for it, sees that it is 1-2-3-4.isp.net and says, “hey, the hostname in the browser is blah.dyndns.org, not 1-2-3-4.isp.net, something phishy might be going on, let’s check to make sure this guy’s downloads are safe”, and then triggers the phantom GET to check that the file is OK…
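To make the reverse DNS idea concrete, here is a rough Python sketch of the kind of check described above. It only illustrates the guess; nothing confirms that Microsoft does anything like this. `reverse_dns_matches` is a hypothetical helper, and the resolver functions are passed in so the example can run offline with stand-ins for the 1.2.3.4 example.

```python
import socket


def reverse_dns_matches(hostname, resolve=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Return True if the PTR name of hostname's IP is hostname itself."""
    ip = resolve(hostname)
    try:
        ptr_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    except (socket.herror, socket.gaierror):
        return False  # no reverse record at all
    return ptr_name.rstrip(".").lower() == hostname.rstrip(".").lower()


# Offline demonstration with stand-in resolvers mirroring the example above.
fake_resolve = lambda name: "1.2.3.4"
fake_reverse = lambda ip: ("1-2-3-4.isp.net", [], [ip])
print(reverse_dns_matches("dynamic.dyndns.org", fake_resolve, fake_reverse))  # False
print(reverse_dns_matches("1-2-3-4.isp.net", fake_resolve, fake_reverse))     # True
```

Called with just a hostname, the function uses the real resolver, so a dynamic DNS name would normally come back `False` while the ISP-assigned name would come back `True`.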
Wow… you really have been busy on this, I’m sure everyone will appreciate the information.
The only weird thing about the dynamic DNS theory is the fact that some of these requests don’t appear to be coming from Microsoft ranges. The 208.50 range for example is a Global Crossing range.
Definitely agreed that there’s a lot of open questions and so far nothing is definitive… just trying to throw out ideas…
Regarding the non-MS IPs doing the scans, I was thinking that perhaps MS is leasing servers outside their known domains and IP ranges so that people hosting spam sites can’t just firewall out *.microsoft.com or all the known MS Class Bs, like 207.46.*, 65.55.*, etc. Otherwise it would be very easy for spammers to evade being analyzed by MS’s anti-spyware scans… kind of like police doing undercover work.
You could be spot on, and it would make sense for them to do that for the exact reasons you’ve stated.
It also looks like there are known business relationships between Global Crossing and Microsoft: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000003561
It looks like MS did some work on the software infrastructure at Global Crossing…
And Global Crossing offers services such as hosting/colo: http://www.globalcrossing.com/enterprise/colocation/colocation_landing.aspx
It could be that MS needed to host some of their SmartScreen/anti-phishing infrastructure outside of their own network, and since they had a good relationship with Global Crossing, they got them to host it…
Athena, since you were not experiencing phantom GETs on your server, I am wondering if that is because your site is already popular and indexed enough on the web (the scrape) that it’s been put on a “safe site” list, negating the need to do checks on downloads (which would be a heavy cost in bandwidth/storage resources and would preferably be avoided if not necessary). I.e., maybe the scrape was some scan from MS, and it determined that your site had enough content deemed safe that it put you in a known-safe category and didn’t find the need to check downloads from your site…
In my situation, I don’t have my site on search engines, and I had just newly created the domain name I gave out for my friend to download some files from, so it would have been a completely new site that couldn’t have been in any existing known-site lists from MS. So maybe the lack of information on my site, and possibly the fact that my reverse DNS doesn’t match the domain name I gave out, was a sufficient condition for MS to trigger the phantom GET.
I’m going to give this theory a try as soon as I can (most likely tomorrow evening). I have a couple of domains sitting around doing nothing, so they won’t feature anywhere… I’ll see what happens and let you guys know.
Hi,
Yes, I explained clearly the setting location for SmartScreen, including screen shots with photoshop added red arrows pointing to the “Tools” and “internet options” and….
I am certain it was off for my case from the other day.
My web site is not third party hosted and reverse lookups work as expected.
Also, it has been indexed on several search engines for several years.
The hidden directory from my case the other day was just created a couple of hours before the strange “GET”.
For all of my tests, I have been making new hidden directories for each test.
“SomeGuy”, so far I’m on board with the theory that this is Microsoft doing the checks from non-Microsoft IP addresses so as not to be easily filtered. That’s also why I assume they are spoofing their user agent (it wouldn’t make much sense to call it “SmartScreenBot” because then the bad people would just write scripts to show the bot a clean and legal version of the website).
It seems they could do a better job at disguising their identity though… given that (assuming this theory is correct) we discovered a dozen of their “undercover” IP addresses accidentally!
For what it’s worth, our web server is on a static IP, with DNS 208.109.172.79 ip-208-109-172-79.ip.secureserver.net. Also, several dozen of my tests were all in the same single folder, but all tests were with different filenames. So they’re not shy about testing the same folder twice.
-Tom R.
I still have not been able to re-create the undesired “GET” requests. I have gone back to two people that had previously created the situation.
Tom: You have been able to create the situation and even switch it on and off. Would you be willing to try to create the situation on my web site?
If no, no problem. If yes, read on (and I realize that this is not a well-controlled test, since this forum is open to all):
Test 1: Purpose: To see if Tom’s method creates the scenario on my web site:
Setup:Created new user, new folder, new file. Uses upper case characters.
1.) Go to http://www.smythies.com/~richard/A/
2.) In the directory listing see a file. Highlight the file and right-click.
3.) Select “Save Target As” and put the file somewhere on your computer (just delete it later on).
I will observe my logs.
Test 2: Purpose: The original posting reported that every link was followed. Embed a link and see if it is followed; also embed e-mail addresses and see if I ever see those addresses in the mail logs (i.e., is e-mail address harvesting involved?). Uses only lower case characters.
1.) Go to http://www.smythies.com/~richard/b/
2.) In the directory listing see two files. Highlight and normal click the first one (a). Do not follow the embedded link. (Note: it is on purpose that I did not give the entire link path herein.)
I will observe my logs, and over the next few weeks my mail logs.
Disclaimer: My site is pretty basic. If you decide to look around, prepare to be underwhelmed.
… Doug S.
Apologies for the delay in authing this one, I’ve been out and about on a study session.
I’m very interested to see the results of Athena’s and Doug S’s tests…
Still, I am a bit disturbed about how Doug S received those phantom GETs while the users that legitimately downloaded the files stated that they definitely didn’t have SmartScreen on…
Unfortunately I’ve not been able to get around to trying my tests yet, but what I’m proposing is to set up two of my idle domains… put the same file on both and then download from the first one with SmartScreen enabled. Then do the same in a new browser session with it disabled. I’ll watch the network traffic on both and then keep an eye on the server logs. I also need to make sure that when I do this, I don’t have any blocks on my firewall which will prevent these extra connections.
When I’ve done it, I’ll let you all know what I find. I’ve been thinking too that when we’ve got a handle on what’s going on, I’ll write it up on my Wiki (I’ll need some nicknames for you all so credit can be given to you guys for figuring all this out) so that it’s easier to follow than a bunch of blog comments.
Ok, I’ve done some tests.. here are the results so far…
I basically activated two of my dormant domains, set the hosting up for them and uploaded a file to each. The file was a small JPG. Starting with SS disabled, I downloaded the JPG from domain 1. That was 23 minutes ago. There were no spurious connections in the TCP capture and there have been no other connections to that domain from anywhere.
Then I enabled SS and downloaded the same JPG from domain 2. There was an HTTPS connection to 213.199.170.72, which is registered to MS Internet Datacenters with a comment that it is used by the European IDCs. That was 20 minutes ago. 8 minutes after that, MSNBot connected from 207.46.13.132 and requested robots.txt. This was 404’d, obviously. 1 minute later, MSNBot connected again and did a GET on /, so I’m guessing it received the standard Plesk index page for a newly created domain.
Then I got to thinking… what if it only really checks certain file types… so I uploaded an EXE to domain 2 and downloaded that with SS enabled. Again, there was a connection to 213.199.170.72. That was over 10 minutes ago, and as yet there have been no other connections to the server. When I downloaded the EXE, the download dialog did include the statement that “SmartScreen has checked this download and did not report any threats” along with the option to report an unsafe download.
I’ll keep my eyes on the server logs, but I certainly don’t appear to be getting the same activity as you guys are. Maybe the European SS isn’t as thorough as the US one? I don’t know.
Quick update… there has been no further activity on my server other than stuff I initiated. I even created a basic web page, renamed the EXE to .mp3.exe and included a link to that in the web page. Nothing. There was a bit more traffic between IE and the SS server (this time at 213.199.177.156; the IP address was found via a DNS query for urs.microsoft.com, which returned a CNAME of urs.microsoft.com.nsatc.net. The nsatc.net domain is registered to Level3 Communications, level3.com, who appear to have a content delivery network… amongst other things).
Ok, here’s the answer we’ve been looking for…
I opened a ticket with Microsoft, and finally got an answer from their Policy and Risk for Windows Live Safety Platform team. They confirmed that the non-Microsoft IP addresses I provided (208.50.x.x and 64.124.x.x) are indeed related to SmartScreen!
He went on to say “Our SmartScreen® filter technology accesses publicly available files and analyzes those for malware. If you need to prevent our system from accessing these files, we recommend that you require authentication.”
I’m still not sure why some files don’t get scanned (Athena, all of my files were .zip files, so you might want to try that. Also, it sometimes took 5 or 6 hours). But I’m satisfied with their confirmation about the non-Microsoft IP addresses, and I think I can stop worrying about it.
This is definitely a reminder not to rely on “security by obscurity” because, as we’ve learned, your web browser (and your DNS provider, and your anti-virus software, etc.) is probably doing more behind the scenes than you realize!
Great work everyone! Thanks for helping get to the bottom of this!
-Tom R.
Acadia Systems, Inc.
I’m at work now, so I can’t check my server logs to see if there was a big delay. I’ll do that later and try a ZIP file too. I have to say, it does seem that the case is solved. I’ll put a page together on my wiki about this and link everyone up. I’ll also let you know if my server has seen any more activity.
It does seem that we have an answer. I can understand them not wanting to publish this information, but when your site gets a visit like this from some obscure location, it does kind of get you thinking… certainly I’ve banned a whole bunch of IP ranges that I figured were just scraping the site. All of them are data centers in different parts of the world, so who knows… maybe I’ve blocked access by the SmartScreen servers.
Thanks to all you guys for all the information and for helping to put people’s concerns to bed.
In one of my postings a few days ago I said “I think I had a case on March 28th where the browser was Safari on a Mac”. I now think the strange HTTP GET was caused by a user using MSIE 8.0. Sorry for any confusion I added.
Also, I have finally tracked down the actual user from an event from February 24th. That user did have the SmartScreen Filter enabled.
My own (as yet incomplete) notes on this subject are at: http://www.smythies.com/~doug/strange_get.html
The explanation for why I had a case where the SmartScreen filter was disabled in IE yet I still got the direct “HTTP GET” from 208.50.101.152: that person uses (Microsoft) Hotmail for e-mail, and Hotmail uses SmartScreen technology. See also: http://postmaster.live.com/FightingJunk.aspx . The link also mentions safelists and confidence levels, as “SomeGuy” alluded to above. Tom R.: thanks for your work on this. Athena: thanks for this blog spot. It was the only reference I could find to others having the same issues as me. I have a link to here in my web notes, and I’ll add a link to your Wiki notes when they are done.
I’ve written a couple of wiki pages about all this. I think they sum it up nicely. If you guys can give it a once over for any errors etc. I would appreciate it.
Microsoft SmartScreen
Very good write up, that obviously took considerable time. Thanks for your efforts. I have added a link to your wiki pages to my notes page.
Thanks Doug, and thanks to you and the other guys for all the information. Judging by the search strings that come through, we’re not alone in wanting to know what’s going on with those addresses
So I have one of these on my Apache server on Windows. And it was right after giving a buddy a link to a file. What I am wondering is whether I or someone else could write a script or file or, for all intents and purposes, a “virus” that would stop this behavior from happening. I personally don’t want someone looking at the information on my web server, especially since I only set it up for my school work and there is nothing of any value to anyone other than me. So just how much trouble could one get into for creating a “virus” and then giving the link to someone who would have the “kill code”, so that when this phantom request comes in it shuts off the system doing the requests?
How much trouble could you get in for doing that… quite a lot I suspect
As for stopping this behaviour… don’t let your friends use IE, and if you do, make sure SmartScreen is turned off. Don’t use Hotmail for sending them links etc., because that also uses SmartScreen filtering and can result in the same behavior.
with regards to the ‘workaround’ would it not be possible to create a simple code in htaccess to redirect the requests for those particular IP ranges?
i’ve not got enough knowledge but surely it could be possible to come up with the most common file types (zip, mp3, pdf, exe) and redirect requests to the non-private ones?
any ideas anyone?
tia
actually – hotlink protection might go some of the way in helping:
http://www.askapache.com/htaccess/mod_rewrite-tips-and-tricks.html#prevent-hotlinking
I’m sure there are a multitude of ways to prevent SmartScreen from grabbing the files, but the easiest for most people is to use their hosting control panel (whatever that may be) to create password protected folders.
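For anyone without a control panel, the same idea (require authentication before serving anything) can be sketched with Python’s built-in `http.server`. This is a toy illustration, not a hardened setup: the user name and password are placeholders, only GET is guarded, and Basic authentication sends the credentials merely base64 encoded, so it really wants HTTPS in front of it.

```python
import base64
import binascii
import hmac
from http.server import SimpleHTTPRequestHandler


def is_authorized(header, username, password):
    """Check an HTTP Basic 'Authorization' header against one credential pair."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    expected = f"{username}:{password}"
    # compare_digest avoids leaking how much of the string matched.
    return hmac.compare_digest(decoded.encode(), expected.encode())


class AuthHandler(SimpleHTTPRequestHandler):
    USER, PASSWORD = "friend", "change-me"  # placeholder credentials

    def do_GET(self):
        if is_authorized(self.headers.get("Authorization"), self.USER, self.PASSWORD):
            super().do_GET()  # serve the file as usual
        else:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
```

To try it, pass `AuthHandler` to `http.server.HTTPServer(("", 8000), AuthHandler)` and call `serve_forever()` from the directory you want to share. A request that arrives without the credentials gets a 401 instead of the file.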
out of interest – has anyone been able to confirm whether if the smartscreen user accesses the link via an url like http://user:pass@whatever.com whether this is forwarded to microsoft and whether any attempt is made (with or without the password)?
I certainly haven’t made any attempt to test URLs which include the username and password required to authenticate with the server. Maybe you could give it a try and let us know what happens?
Was just revisiting your site… great job on the write-up, nice to see a clear and concise page offering everyone an explanation for the seemingly suspicious behavior that many people may have been very worried about!
Thanks SomeGuy. Judging by the search strings I’m seeing, we definitely weren’t the only ones curious about these IP addresses, so the information from everyone has been really useful
Thanks for all the info!! explains a lot. Was puzzled for a few hours about this until I found your blog.
You’re welcome Eran, glad it helped
thanks for this information. I was certainly intrigued when viewing my httpd logs.
Thanks for this useful info! We got the same GET from 208.xxxx.
Thanks for the info. I found out about it myself, and only when I made the connection with SmartScreen Filter was I able to find this entry. Maybe it is possible to add a few more/other keywords so it can be found more easily? I was googling for “lowercase HTTP requests” “MSN” “available.above.net” “browsing history” since those were the only hints I got from the logfile. Once I had “Smartscreen Filter” +lowercase, I was set. Thanks again for the detailed information!
I just put some files on my web server for a friend to download. I saw him begin to download, and within seconds, the same files were downloaded multiple times from these IP addresses. They appear to be coming back for more, so I’m replacing the files with 4 gigabyte piles of random crap for them to suck on.
64.124.203.71
64.124.203.72
64.124.203.73
64.124.203.74
64.124.203.77
74.217.148.72
74.217.148.73
74.217.148.75
74.217.148.76
208.50.101.152
208.50.101.153
208.50.101.156
208.50.101.157
208.50.101.158
Thanks for those addresses Aaron. I’ve added the new ones to the wiki page that’s been used to collect all the information I’ve been given about SmartScreen. That page is here.
I can confirm that 74.217.148.73 and 74.217.148.76 are also IPs that do those secret GET requests
I felt a bit sick when I saw the GET requests in the logs, as I was moving some db files unprotected from one server to another. I feel a bit better knowing it was a bot, and I’m hoping Microsoft doesn’t store them and/or go through them manually.
http://whois.arin.net/rest/net/NET-78-0-0-0-1/pft
http://whois.arin.net/rest/net/NET-208-48-224-0-1/pft
this might help
We’ve already checked out these ranges. The 208 range is Global Crossing in the US and if I recall correctly, the 78 range is one of their data centres in Europe. I think we came to the conclusion that the SmartScreen servers are hosted by Global Crossing for Microsoft.
But thanks for the input. It was only as a result of a number of us working together that we got a handle on this.
Regards
Athena
Would just like to confirm that I noticed these IPs also, and a Google search brought me here.
These hits are on a non-public download site we use just between us and our customers. The non-standard port we use (not HTTP/80) isn’t even open all the time, only when needed. Yet within a few minutes of legit downloads, these IPs show up trying to download.
I do recall reading another article from Microsoft that confirms what this thread discovered; SmartScreen will come back to your site to try and download/index some files. Also interesting to note at no time did any referrer look for robots.txt, so don’t bother trying to block that way. Just FYI that in 2013 it’s still doing it, here are the IPs and bogus user agents I found…
208.50.101.153 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+Trident/4.0;+msn+OptimizedIE8;ESMX;+AskTbGLSV5/5.8.0.12304;+Windows+Live+Messenger+14.0.8117.0416)
74.217.148.74 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0;+Trident/4.0;+SLCC1;+.NET+CLR+2.0.50727;+Media+Center+PC+5.0;+OfficeLiveConnector.1.3;+OfficeLivePatch.0.0;+.NET+CLR+3.5.30729;+InfoPath.1;+.NET+CLR+3.0.30729;+.NET4.0C;+AskTbLMW2/5.11.3.15590;+Windows+Live+Messenger+14.0.8117.0416)
Three more were still doing the same nasty thing on 2014-06-19 (my private website is in Europe):
74.217.148.78
208.50.101.151
208.50.101.152