Monday, July 14, 2014

How do I manage website traffic?

Search engine web crawlers continually read resources from our website in order to index its content. These requests add to the server load on top of regular user traffic, so the server sometimes carries extra load simply because search engines keep requesting information.
We can manage and balance this traffic with good use of robots.txt, sitemaps and bandwidth throttling.

Using these tools together, we can channel crawler traffic and keep search engines up to date about our site's resources, which leads to better search results and more accurate information.
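As a rough illustration of keeping search engines informed, here is a minimal Python sketch that builds a sitemap with the standard library. The function name `build_sitemap`, the example.com URLs and the dates are placeholders made up for this example; only the `urlset`/`url`/`loc`/`lastmod` layout and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Render (url, last-modified date) pairs as sitemap XML text."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; replace with the real URLs of your site.
pages = [
    ("http://example.com/", "2014-07-14"),
    ("http://example.com/about.html", "2014-07-01"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The result would typically be saved as sitemap.xml and announced to crawlers with a `Sitemap:` line in robots.txt.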

Content that bots usually do not need to read, and can be kept away from, includes:
  1. Images, JavaScript, CSS and layout files.
  2. Registration pages.
  3. Search results.
  4. Files such as .zip, .avi, .docx, .pptx and so on.
  5. Pages that need authentication and authorization.
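The list above can be turned into a robots.txt file with a small script. This is only a sketch: `build_robots_txt` is a helper invented for this example, and every path in `BLOCKED_PATHS` is a guess at where such content might live on a site. The `/*.zip$` style patterns rely on wildcard matching, which major crawlers such as Googlebot honor but simpler parsers may ignore.

```python
def build_robots_txt(rules):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    records = []
    for user_agent, paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        lines.extend(f"Disallow: {path}" for path in paths)
        records.append("\n".join(lines))
    # Records for different user-agents are separated by a blank line.
    return "\n\n".join(records) + "\n"


# Hypothetical paths; substitute the real locations on your site.
BLOCKED_PATHS = [
    "/images/", "/js/", "/css/",                   # static assets and layout files
    "/register",                                   # registration page
    "/search",                                     # search results
    "/*.zip$", "/*.avi$", "/*.docx$", "/*.pptx$",  # downloadable files (wildcards)
    "/account/",                                   # pages behind a login
]

robots_txt = build_robots_txt({"*": BLOCKED_PATHS})
print(robots_txt)
```

Keep in mind that robots.txt is advisory. Pages that need authentication still have to be protected by the application itself.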
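Some examples follow. A user-agent can be a browser or a search engine crawler; in robots.txt, the User-agent line names the crawler a rule applies to, and * matches all of them.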

Disallow access to Images

User-agent: *
Disallow: /images/

Exclude Google Image Search

User-agent: Googlebot-Image
Disallow: /
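To check how a crawler would read the two rules above, they can be fed to Python's standard `urllib.robotparser` module. This is a quick sanity check rather than a statement of how any particular search engine behaves. The helper `allowed`, the example.com host and the test paths are placeholders, and Python's parser uses simple prefix matching.

```python
from urllib.robotparser import RobotFileParser

# The two example rules from this post, combined into one file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path):
    """Return True if user_agent may fetch path (example.com is a placeholder)."""
    return parser.can_fetch(user_agent, "http://example.com" + path)


# A general crawler is kept out of /images/ but may read other pages.
print(allowed("Googlebot", "/images/logo.png"))   # False
print(allowed("Googlebot", "/about.html"))        # True
# Googlebot-Image is excluded from the whole site.
print(allowed("Googlebot-Image", "/about.html"))  # False
```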


