September 30th, 2005
Today’s word is chagrin
cha·grin
n.
A keen feeling of mental unease, as of annoyance or embarrassment, caused by failure, disappointment, or a disconcerting event: To his chagrin, Michael’s spelling and typing let him down again!
- Category:
- Uncategorized
- Comments:
- No Comments »
September 30th, 2005

Where will rambling Ronan turn up next?
- Category:
- Uncategorized
- Comments:
- 3 Comments »
September 29th, 2005

Where will rambling Ronan turn up next?
UPDATE: Changed picture to “original”, although I can’t remember Satan running in a city centre ward. I thought the undead were limited to voting only on the North Side!!!
- Category:
- Uncategorized
- Comments:
- 4 Comments »
September 29th, 2005
Got this in my inbox today
There appears to be a problem on this page of your site.
On page http://www.autoschism.com/2005/09/take-off-your-glasses.html
when you click on “astigmatism”,
the link to http://http/www.allaboutvision.com/conditions/astigmatism.htm
gives the error: Domain name lookup failed (may be a transient error).
As recommended by the Robot Guidelines, this email is to explain
our robot’s visit to your site, and to let you know about one of
the problems we found. We don’t store or publish the content of
your pages, but rather use the link information to update our map
of the World Wide Web.
Are these reports helpful? I’d love some feedback. If you prefer
not to receive these occasional error notices please let me know.
Roy Bryant
Initially I thought this was spam. But the link really was broken (I had left out the ‘:’ in “http://”), so full marks for identifying a problem but 0 marks for identifying the actual problem or a solution. And SevenTwentyFour are a real company with their own website and everything.
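For the curious, here is a small sketch of why the robot saw a DNS failure rather than a missing page. It only parses the URL exactly as quoted in the email above; nothing is fetched. The variable names are mine, and I am assuming the robot resolves the host part the way any standard URL parser would.

```python
from urllib.parse import urlsplit

# The URL exactly as the robot reported it in the email above.
reported = "http://http/www.allaboutvision.com/conditions/astigmatism.htm"

parts = urlsplit(reported)
host = parts.hostname  # the name a client would try to look up in DNS
path = parts.path      # everything after the host

print("host:", host)
print("path:", path)
```

The host comes out as plain "http" and the real domain ends up in the path, so the name lookup fails before any page is ever requested.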
Apparently there is such a thing as the Robot Guidelines. Not that you could tell that from the SevenTwentyFour website, which does like to mention that they conform to the Robot Guidelines, but doesn’t give an HTTP link to them. Who would have thought that a company which makes some of its money by policing sites for broken links couldn’t provide a simple link to the set of guidelines they are operating under? It’s http://www.robotstxt.org/wc/guidelines.html
Here is a sample of some of the guidelines:
Be Accountable
If you do decide you want to write and/or run one, make sure that if your actions do cause problems, people can easily contact you and start a dialog. Specifically:
- Identify your Web Wanderer
- HTTP supports a User-agent field to identify a WWW browser. As your robot is a kind of WWW browser, use this field to name your robot e.g. “NottinghamRobot/1.0”. This will allow server maintainers to set your robot apart from human users using interactive browsers. It is also recommended to run it from a machine registered in the DNS, which will make it easier to recognise, and will indicate to people where you are.
- Identify yourself
- HTTP supports a From field to identify the user who runs the WWW browser. Use this to advertise your email address e.g. “j.smith@somewhere.edu”. This will allow server maintainers to contact you in case of problems, so that you can start a dialogue on better terms than if you were hard to track down.
- Announce It
- Post a message to comp.infosystems.www.providers before running your robots. If people know in advance they can keep an eye out. I maintain a list of active Web Wanderers, so that people who wonder about access from a certain site can quickly check if it is a known robot — please help me keep it up-to-date by informing me of any missing ones.
- Announce it to the target
- If you are only targeting a single site, or a few, contact its administrator and inform him/her.
- Be informative
- Server maintainers often wonder why their server is hit. If you use the HTTP Referer field you can tell them. This costs no effort on your part, and may be informative.
- Be there
- Don’t set your Web Wanderer going and then go on holiday for a couple of days. If in your absence it does things that upset people you are the only one who can fix it. It is best to remain logged in to the machine that is running your robot, so people can use “finger” and “talk” to contact you.
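As a rough illustration of the three header fields mentioned in the guidelines above (User-agent, From and Referer), here is a minimal sketch of how a robot might attach them to a request. The robot name, email address and URLs are placeholders I made up, not anything LinkWalker actually sends, and the request is only built, never sent.

```python
from urllib.request import Request

# All three values are hypothetical placeholders for illustration only.
ROBOT_NAME = "ExampleRobot/1.0"
OPERATOR_EMAIL = "j.smith@example.edu"
REASON_PAGE = "http://www.example.org/why-we-crawl.html"


def build_polite_request(url):
    """Build (but do not send) a request carrying the identifying headers."""
    return Request(url, headers={
        "User-Agent": ROBOT_NAME,   # names the robot
        "From": OPERATOR_EMAIL,     # tells the site owner who to contact
        "Referer": REASON_PAGE,     # hints at why the server was visited
    })


req = build_polite_request("http://www.example.org/")
for name, value in req.header_items():
    print(f"{name}: {value}")
```

Passing `req` to `urllib.request.urlopen` would send it with those headers; I have left that out so the sketch touches nobody's server.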
The SevenTwentyFour robot is called LinkWalker. And while it does appear to conform to the guidelines, the information being gathered is used to solicit business through email – which is a business practice and so not covered in the guidelines (except for the part which can be paraphrased as “be nice, share”) – and so I was probably right in my initial judgement that the email is spam.
Share Results
OK, so you are using the resources of a lot of people to do this. Do something back:
Keep results
This may sound obvious, but think about what you are going to do with the retrieved documents. Try and keep as much info as you can possibly store. This will make the results optimally useful.
Raw Result
Make your raw results available, from FTP, or the Web or whatever. This means other people can use it, and don’t need to run their own servers.
Polished Result
You are running a robot for a reason; probably to create a database, or gather statistics. If you make these results available on the Web people are more likely to think it worth it. And you might get in touch with people with similar interests.
Find out more about robots at the web robots pages and here.
- Category:
- Uncategorized
- Comments:
- No Comments »
September 28th, 2005

Where will rambling Ronan turn up next?
- Category:
- Uncategorized
- Comments:
- 3 Comments »
September 28th, 2005
That thing with Kate Moss is all just coke and mirrors.
UPDATE: In response to the comments I must point out that the above throw-away comment isn’t all my own work, and I would like to thank all of those who took time out of their busy schedules to come up with it.
- Category:
- Uncategorized
- Comments:
- 8 Comments »