# The quick way to prevent robots visiting your site is to put these two lines into the /robots.txt file on your server:
# This file controls the indexing of the presentation pages
# Other pages may contain the tag
#

User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /ARCHIVAGE/
Disallow: /ECORS/
Disallow: /GEO_MAR/
Disallow: /GEO_MAR/LABO/
Disallow: /GEO_MAR/LABO_RESERVE/ #deny
Disallow: /GEO_MAR/ECH_COLL/ #deny
Disallow: /IMAGES/
Disallow: /SEISCAN/
Disallow: /TRAITEMENT/
Disallow: /en_GB/
Disallow: /cgi-bin/
Disallow: /~
Disallow: /email-addresses/

# see http://evolt.org/article/rating/18/15126/

# To indicate to robots that certain parts of your server are off-limits to some or all robots:
# The first paragraph of the example below specifies that the robot called 'webcrawler' has nothing disallowed: it may go anywhere.
# The second paragraph indicates that the robot called 'lycra' has all relative URLs starting with '/' disallowed. Because all relative URLs on a server start with '/', this means the entire site is closed off.
# The third paragraph indicates that all other robots should not visit URLs starting with /tmp or /logs. Note that '*' is a special token; it is not a regular expression.
#
# Two common errors:
# - Regular expressions are _not_ supported: instead of 'Disallow: /tmp/*' just say 'Disallow: /tmp'.
# - You shouldn't put more than one path on a Disallow line (this may change in a future version of the spec).
#
# User-agent: webcrawler
# Disallow:
#
# User-agent: lycra
# Disallow: /
#
# User-agent: *
# Disallow: /tmp
# Disallow: /logs
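#
# A sketch of the two common errors described above, reusing the /tmp and /logs
# paths from the commented example (illustrative only, not rules for this site):
#
#   Wrong: Disallow: /tmp/*
#   Right: Disallow: /tmp
#
#   Wrong: Disallow: /tmp /logs
#   Right: Disallow: /tmp
#          Disallow: /logs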