Odd entries in web log

riley

Perch
I didn't know where to post this, so I put it here.

Over the last couple of months I've seen entries like this in my log files (I replaced my domain name with asterisks):
Code:
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2004-04-24 18:18:10 66.36.229.77 GET / - 80 guest 168.75.177.2 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) - - www.***********.com 401 1 1326 1812 237 109
There are a few for each of my domains, and the client was refused access with a 401 error in each case, but I'm concerned about the persistence of the attempts. They all come from the same IP address: 168.75.177.2
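
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python that counts 401 responses per client IP. It assumes the space-separated field layout shown in the header line above; the function names are made up for illustration, and www.example.com stands in for the obfuscated host name.

```python
# Field names copied from the header line in the log excerpt above.
FIELDS = (
    "date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username "
    "c-ip cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status "
    "sc-substatus sc-win32-status sc-bytes cs-bytes time-taken"
).split()


def parse_line(line):
    """Map one space-separated log entry onto the field names."""
    values = line.split()
    if len(values) != len(FIELDS):
        return None  # malformed or truncated entry
    return dict(zip(FIELDS, values))


def count_by_client(lines, status="401"):
    """Count entries with the given status code, keyed by client IP."""
    counts = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip directive lines such as "#Fields:"
        entry = parse_line(line)
        if entry and entry["sc-status"] == status:
            counts[entry["c-ip"]] = counts.get(entry["c-ip"], 0) + 1
    return counts


# Sample entry from the post; www.example.com is a placeholder host.
sample = (
    "2004-04-24 18:18:10 66.36.229.77 GET / - 80 guest 168.75.177.2 "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) - - "
    "www.example.com 401 1 1326 1812 237 109"
)
print(count_by_client([sample]))  # {'168.75.177.2': 1}
```

To run it against a real file, pass the open file object to count_by_client instead of the one-element list.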

ARIN shows the following for that address:
Code:
OrgName:    ClearBlue Technologies
OrgID:      CLEAR-1
Address:    125 Elwood Davis Road
City:       Syracuse
StateProv:  NY
PostalCode: 13219
Country:    US

NetRange:   168.75.0.0 - 168.75.255.255
CIDR:       168.75.0.0/16
NetName:    NAVI-A84B0000-16-0
NetHandle:  NET-168-75-0-0-1
Parent:     NET-168-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.APPLIEDTHEORY.COM
NameServer: NS2.APPLIEDTHEORY.COM
NameServer: NS3.APPLIEDTHEORY.COM
Comment:
RegDate:
Updated:    2004-02-26

Has anybody else seen log entries like this?

riley
 
After investigating this IP address, I have discovered that it is a bad bot (a misbehaving spider). I was not able to find out what information this bot looks for; it might be completely harmless and legitimate, or it could be harvesting email addresses for spam lists. But the way it tries to access web sites is clearly out of bounds for any bot these days.

Bots are supposed to look for and read a robots.txt file in the site's root. In that text file, you can disallow folders and pages; that is, tell the bot where you don't want it to go.
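
To illustrate what a well-behaved bot does with that file, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt contents, the folder names, and the "ExampleBot" agent name are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the folder names are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path, agent="ExampleBot"):
    """Return True if the rules allow this agent to request the path."""
    return parser.can_fetch(agent, path)


print(may_fetch("/index.html"))           # True
print(may_fetch("/private/report.html"))  # False
```

A polite bot would run a check like this before every request and skip anything that comes back False.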

Failing that, a bot should at least look for the robots meta tags in the default page that opens when it accesses the site. The meta tags can direct the bot to index or not index (process or ignore) the page, and to follow or not follow any links found in the page.
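
As a rough sketch of how a bot might read those tags, the Python below uses the standard html.parser module to collect the directives from a robots meta tag. The class name, function name, and sample page are hypothetical, and it only handles the noindex and nofollow directives mentioned here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_policy(html_text):
    """Return whether the page may be indexed and its links followed."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    # When a directive is absent, the permissive default applies.
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# Hypothetical page that asks bots to neither index it nor follow links.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_policy(page))  # {'index': False, 'follow': False}
```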

This bot does neither. Instead it simply tries to browse the root folder. This behavior is unacceptable by today's standards. But with "folder browsing" turned off on the Windows server, the bot gets a "401 Unauthorized" error (as you can see in the log listing in the previous post), so there is no problem. Of course, if you have requested that tech support enable "folder browsing" for your site, this bot will browse its way through your site. On a Linux server, you can ban this IP address with the .htaccess file, but I don't know how to (or if you can) do that on a Windows server. This cannot be done via any scripting code (ASP, ASP.Net, etc.) because the bot is browsing the folders, not executing any scripts.

I hope that sheds a little light on the issue.
riley
 
Just always have folder browsing disabled.
If for some odd reason you need to browse a folder, write a script that lists all the files in the directory.
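
Something along these lines would do it. This is only a sketch in Python (on this kind of hosting you would more likely write it in ASP or ASP.Net), and the function names are made up. It lists the regular files in one folder and builds a plain HTML list of links, leaving subfolders out.

```python
import html
from pathlib import Path
from urllib.parse import quote


def list_files(folder):
    """Return the sorted names of regular files directly inside folder."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


def render_listing(folder):
    """Build a minimal HTML list with one link per file in folder."""
    items = []
    for name in list_files(folder):
        # Percent-encode the link target and HTML-escape the visible text.
        items.append(f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling render_listing(".") returns the markup for the current folder, so the script decides what gets shown instead of the server's folder browsing.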
 
riley said:
On a Linux server, you can ban this IP address with the .htaccess file, but I don't know how to (or if you can) do that on a Windows server.

Jodohost should BAN THE SUCKER!!!! for us all hehe
 