Setting Up Your Website Sitemap

Sitemaps provide a simple way for webmasters to tell search engine crawlers, or 'searchbots', which web pages are new or have been updated recently. A robots.txt file works in the opposite way: it tells bots which files and folders should be excluded from being crawled or cached.

The Role of a Sitemap

A sitemap is a list of URLs and is one of the first files search engines will automatically visit, which speeds up the crawling or 'caching' process. It is most commonly an XML file that contains details of each URL to be cached, when each file was last updated and how important each file is relative to the other files on the site. A sitemap is also useful for humans trying to locate a specific piece of information about a topic or theme. This is particularly useful when searching extremely comprehensive websites with thousands of pages to trawl through. A human sitemap is a basic web page with a series of links organised in a tree structure.

The concept of a sitemap is to provide a central list of pages for search engine bots to crawl. As new pages are added by webmasters, it is vital that the sitemap file is also updated to reflect the addition or change in content to cache. Search bots may visit several times a day, depending upon how frequently a website is updated. The more often a sitemap is updated, the greater the likelihood that a search bot will return to find out the locations of the new files to cache.

XML is used because it is a simple, flexible and convenient format for collecting and parsing data. This XML format (the Sitemap 0.9 protocol) has been agreed by the leading search engines. An XML sitemap includes not just the list of URLs on a website but also when each page was last modified, how often each page is likely to change and how important it is relative to the other pages on the site. These values are hints rather than commands: search bots may ignore the instructions in your sitemap and not return until it is convenient to do so. If you have never seen a sitemap you can view our XML sitemap for this site here. The following example shows how an XML sitemap should be laid out. It contains only one URL and, in addition to the required <loc> tag, uses the optional <lastmod>, <changefreq> and <priority> tags:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
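
If you would rather generate a file like the one above from a script than write it by hand, the following is a minimal Python sketch using only the standard library (Python 3.8 or later). The function name build_sitemap and the pages list are our own illustrative assumptions and are not part of the Sitemap protocol; a real site would fill the list from its own database or file system.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for a list of page dictionaries.

    Each dictionary needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional, matching the example above.
    (build_sitemap is a hypothetical helper, not a standard API.)
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Write the tags in the same order as the hand-written example.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    # ElementTree escapes characters such as "&" in URLs for us.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Illustrative data only: one page, using all three optional tags.
pages = [
    {
        "loc": "http://www.example.com/",
        "lastmod": "2005-01-01",
        "changefreq": "monthly",
        "priority": "0.8",
    },
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml.decode("utf-8"))
```

The output is written on a single line rather than indented, which makes no difference to search bots. Saving sitemap_xml to a file called sitemap.xml in the root folder of the site is the usual next step.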

There are many free and paid tools for automatically creating a sitemap. Some can also update your sitemap automatically each time content is posted. For example, we used xml-sitemaps.com to build the sitemap for this site. It is one of many simple software tools that can generate your sitemap in minutes.

There are different types of sitemap for different types of content, including video, mobile, news and images. So think carefully about which parts of your site you need search engines to index, and which files or folders you would prefer to exclude entirely from view. To exclude or 'disallow' some content from being indexed, you will need to update your robots.txt file. This is a fairly straightforward process and you can find out more here.
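
To see how 'disallow' rules behave before relying on them, you can test a robots.txt file with Python's standard urllib.robotparser module. The sketch below is only an illustration: the robots.txt content, the folder names and the is_crawlable helper are assumptions made up for this example. It also includes a Sitemap line, which is a common way of telling bots where your sitemap lives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines, so no download is needed.
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://www.example.com/index.html"))
print(is_crawlable("http://www.example.com/private/report.html"))
# site_maps() needs Python 3.8 or later.
print(parser.site_maps())
```

With these rules the home page is reported as crawlable and the page inside /private/ is not. Bear in mind that robots.txt is a request to well-behaved bots and not a security measure.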

How Does Fast Indexing Help Your Website?

Speed of crawling becomes more important as more content is added. The faster new web pages are cached in a search engine's index, the sooner visitors are likely to find them. You can 'ping' search engines after you have created and re-uploaded your latest sitemap, to say to the bot 'come back here and copy my latest web pages'. There is no guarantee that a search engine will include your current content if you do not upload a sitemap. Most bots will find the content eventually anyway, but there is no disadvantage in trying to get your content cached as soon as possible.

If you are publishing a hot news item or other time-sensitive content, it is essential that you have some kind of automatic update of your sitemap. This helps to ensure you are first past the post in reaching your target audience. Likewise, if you are running dynamic web pages which change through user updates (such as a forum, blog or advertising system), you need to make sure that these new pages are automatically added to your sitemap. A sitemap is also useful for identifying historical archived items (which may be difficult for bots to distinguish amongst thousands of other web pages).
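
As a rough idea of what such an automatic update might look like, the Python sketch below (standard library only, Python 3.8 or later) adds a new page to an existing XML sitemap, or refreshes the last-modified date if the page is already listed. The add_page function and the example URLs are hypothetical; in practice your blog or forum software would call something similar each time content is published.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}
# Write the sitemap namespace back out without a prefix.
ET.register_namespace("", SITEMAP_NS)


def add_page(sitemap_xml, loc, lastmod=None):
    """Return updated sitemap XML (bytes) with loc added or refreshed.

    add_page is a hypothetical helper; lastmod defaults to today's date.
    """
    root = ET.fromstring(sitemap_xml)
    lastmod = lastmod or date.today().isoformat()
    for url in root.findall("sm:url", NS):
        if url.findtext("sm:loc", namespaces=NS) == loc:
            mod = url.find("sm:lastmod", NS)
            if mod is None:
                # Assumes <loc> is the first child, so this lands after it.
                mod = ET.Element(f"{{{SITEMAP_NS}}}lastmod")
                url.insert(1, mod)
            mod.text = lastmod
            break
    else:
        # Page not listed yet: append a new <url> entry.
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Illustrative starting sitemap with a single page.
existing = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>http://www.example.com/</loc>"
    b"<lastmod>2005-01-01</lastmod></url></urlset>"
)
updated = add_page(existing, "http://www.example.com/news/item.html",
                   lastmod="2005-02-01")
print(updated.decode("utf-8"))
```

Checking for an existing entry first keeps the same URL from being listed twice when a page is edited and then republished.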

It is possible to submit your content directly through a search engine's own submission web page. However, this is unnecessary if you already have links from other websites pointing to your website. One of the primary functions of a bot is to follow every internal and external link on the websites it visits. So the next time a bot finds a link from someone else's website pointing back to yours, the likelihood is that it will follow that link to your website and identify any new content from your sitemap. With billions of web pages to index, how quickly this happens can vary depending upon how frequently your site is updated.