
Talk:Sitemaps


A previously unlabeled conversation


merlinvicki, I took the signature off; signatures are needed on Talk pages but not on article pages

Jwestbrook 22:51, 19 October 2005 (UTC)[reply]

past link to article on Merlinvicki

Jwestbrook 23:34, 24 October 2005 (UTC)[reply]

oops

J\/\/estbrook       18:16, 6 November 2005 (UTC)[reply]


Does anyone know the exact date that the links tab became active in Google sitemaps? Siralexf 17:15, 9 February 2007 (UTC)siralexf[reply]


I've noticed that in the last few months, some sites listed in Google results include sublinks. For example, this search for slashdot returns a link to the main slashdot site along with links to Games - Login - Apple - Science beneath the description. Is this one of the benefits of submitting a site map to Google? If so, it would be worth mentioning in the article. mennonot 09:39, 19 November 2005 (UTC)[reply]

Not related; Matt Cutts explained it was a Google search results improvement: [1] Ivan Bajlo 15:09, 27 December 2005 (UTC)[reply]

Wiki Sitemaps?!


Does MediaWiki have an extension that creates a Google Sitemap automatically? Or is it built with an integrated sitemap? F16

Yes there is a script to make sitemaps in your wiki's /maintenance directory. Jidanni (talk) 04:25, 15 March 2008 (UTC)[reply]

Sitemap generation tools


I'm removing a bunch of links to sites that claim to generate sitemaps... by spidering a web site. Can someone please explain how this is any different from allowing the search engines to spider your website? Seems pretty pointless and shady to me. --Imroy 20:53, 31 August 2006 (UTC)[reply]


True, but some tools (not exactly the formerly listed ones) do provide some added value, for example editing attributes like change frequency or priority. Crawling the site is just a way to create the initial URL list then. However, I don't think that listing every sitemaps tool is a good idea; providing links to lists of tools like those at code.google.com or sitemapstools.com is enough. That said, I do think that linking to a sitemap validator is a good thing. I provide such a free tool (along with tons of sitemaps info, FAQs, a Vanessa Fox interview ...) on my site at smart-it-consulting.com and somebody linked to it a couple of months ago. Unfortunately, this link is gone too. --Sebastian September/21/2006


Why is ROR listed? All major search engines support RSS, but none (!) of them states support for the added ROR fields. If you don't understand what I mean, check this article: http://www.micro-sys.dk/developer/articles/website-sitemap-kinds-comparison.php You can see that Google and Yahoo mention a lot of formats, but none of them is ROR.

--Tom November/10/2007


Spidering a website is the only reliable way to create a sitemap, particularly for larger, dynamic websites. When search engines crawl your site, they do not produce a sitemap for you. The entire point of "Google Sitemaps" as well as Yahoo's sitemap program is that webmasters are asked to submit a sitemap. The search engines want sitemaps, which is why this page exists here. Besides this, a sitemap service can share its findings with the webmaster... which the search engines do not do very well, if at all. Not all pages on the web are coded very well, and despite the myriad of articles which explain how to write good code, for many it's easier to get a list of coding and HTTP protocol errors that are specific to their website (pages, server responses, HTTP status errors, etc.). What is shady about it? --MaxPowers 08:23, 25 January 2007 (UTC)[reply]


Particularly for larger, dynamic websites, the sitemaps should be generated from the underlying database. If dynamic sites use spidering tools to create the sitemap, the URLs not visible to SE crawlers will most probably not get included. Makes sense? --Sebastian February/06/2007

Removed another link. Please see WP:EL for more information, specifically what is not accepted:
Links mainly intended to promote a website. —The preceding unsigned comment was added by Mporcheron (talkcontribs) 23:03, 25 January 2007 (UTC).[reply]


Spidering allows a realistic view of any size website and can be used to uncover errors on the page due to template or CMS errors. One typical example is on WordPress blogs where commenters do not leave a website address and the link is listed as href="http://" when the page is displayed to browsers and SE spiders. This is technically a broken link and is one example of how a spidering service can benefit a webmaster by sharing its findings. Which URLs would not normally be visible, but need to be included in a sitemap? It stands to reason that if a site wants the SEs to see a page, it should be visible and should have at least some pages linking to it if that page is to do anything within any search engine. All SEs, including Google, will filter out orphaned pages.

The sitemap programs (not software 'programs') offered by the search engines allow webmasters to share URLs that are not generally spidered, such as multi-level navigation through categories and sub-categories, but if 'normal' navigation is broken to the point that spidering is "impossible", then it is generally a poor navigational structure to begin with. Some spidering services offer a means to get around this anyway using scripted images, but this is probably irrelevant for this discussion.

The biggest problem with db-based systems is that they are very specific to a particular application and do not cover other areas of comprehensive websites (forum, blog, cart, general CMS, static pages, etc. all on one site). I would agree that db-based sitemap generators could be more efficient as they don't require a full page to load, but that efficiency comes at the price of sacrificing completeness in many cases and accuracy from a spider's point of view in all cases. MaxPowers 05:39, 8 February 2007 (UTC)[reply]
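For what it's worth, here is a minimal sketch of the database-driven approach discussed above. The "pages" table, its "slug" and "updated_at" columns, the site.db file and the example.com domain are all hypothetical placeholders; a real generator would also deal with priority, change frequency and the protocol's 50,000-URL-per-file limit.

    import sqlite3
    from xml.sax.saxutils import escape

    # Hypothetical schema: a "pages" table with "slug" and "updated_at" columns.
    conn = sqlite3.connect("site.db")
    rows = conn.execute("SELECT slug, updated_at FROM pages")

    entries = []
    for slug, updated_at in rows:
        entries.append(
            "  <url>\n"
            f"    <loc>https://example.com/{escape(slug)}</loc>\n"
            f"    <lastmod>{updated_at}</lastmod>\n"
            "  </url>"
        )

    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )

    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(sitemap)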

Robots.txt "Sitemap:" declaration.


The text both at sitemaps.org and here says:

"The <sitemap_location> should be the complete URL to the Sitemap, ..."

Note that "should" is not "must," and as other directives (namely, "Disallow:") use relative URLs, not absolute ones, the language used in the definition of the declaration implies that a relative URL for a site map declaration (only in the "/robots.txt" file) is valid and may be used. If the intent of the definition were to require only fully specified URLs, the language used to specify the declaration syntax needs to be changed. I have noted that some people think that only a fully specified URL can be used in "robots.txt" for a site map declaration; such a conclusion appears erroneous based on the diction used.

I assume that verbs such as "should, must and may" have their usual meanings as in the popular Internet "request for comments" document series.

- D. Stussy, Los Angeles, CA, USA - 08:30, 31 May 2007 (UTC)

Of course you can put it in there that way. It won't break robots.txt. However: I want sitemap-aware bots to figure out where my sitemap is, so I'll give them what they're expecting: a full URL. 198.49.180.40 19:17, 8 June 2007 (UTC)[reply]

There's no reason to believe that a robot can't compute an absolute URL from a relative URI, as it must do so with other relative URIs from other HTML resources it fetches along its indexing (or scanning) journey. In fact, somewhere I have a patch for HTDIG (3.1.6) to do exactly that - accept a relative URI for a sitemap from "/robots.txt". (It adds the sitemap to the stack and uses an external XML processor add-on to process the map or sitemapindex - as it may be either.) - D. Stussy, 01:13, 9 July 2010 (UTC)

Additional: Defining the sitemap as a relative URI is useful especially in virtual hosting, where multiple web sites under different domains/hostnames are served by the same physical host and may share a globally defined "/robots.txt" file. One entry (e.g. "Sitemap: /sitemap.xml") could point to a site map for every domain on that host. Of course, the individual site maps would be present in each domain's document root and would have different content (or not exist). Such a construct provides for separation of domains and avoids having to make multiple entries in one file that could tell malicious people what other domains are served via the same host, while allowing the host administrator to set a uniform robots policy. - D. Stussy, 17 June 2011.

I feel like the "should" directive might be outdated. Currently on sitemaps.org (https://www.sitemaps.org/protocol.html#submit_robots), it shows an example line "including the full URL to the sitemap" with no "should" directive. This makes it seem like a full URL is mandatory. Does anyone have a link to a resource that says otherwise? - Kodos84 (talk) 20:58, 11 January 2017 (UTC)[reply]

The page at sitemaps.org has been modified between your comment and mine. That is a change for the worse. In order to use the SAME robots.txt file for all virtually hosted domains, one either needs a relative URL for the sitemap or has to dynamically generate the robots.txt file to insert the proper domain. That's too complex for many webmasters. Should this ever proceed to an RFC, relative URLs for the sitemap directive should be permitted. -D. Stussy, 27 August 2017.
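To illustrate the relative-URL point discussed above, here is a short sketch (with example.com as a placeholder host) of how a sitemap-aware crawler could resolve whatever it finds on a Sitemap: line against the URL it fetched robots.txt from. This is what would make the shared-robots.txt setup workable, assuming a crawler chooses to accept relative values at all.

    from urllib.parse import urljoin

    def sitemaps_from_robots(robots_body, robots_url):
        """Collect Sitemap: values from a robots.txt body, resolving any
        relative value against the URL the robots.txt was fetched from."""
        found = []
        for line in robots_body.splitlines():
            field, _, value = line.partition(":")
            if field.strip().lower() == "sitemap" and value.strip():
                found.append(urljoin(robots_url, value.strip()))
        return found

    # A single shared robots.txt served on every virtual host:
    print(sitemaps_from_robots("Sitemap: /sitemap.xml",
                               "https://example.com/robots.txt"))
    # -> ['https://example.com/sitemap.xml']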

Submit site map URL


www.google.com/webmasters/tools/ping?sitemap= or http://google.com/webmasters/sitemaps/ping?sitemap= ?

See http://www.google.com/support/webmasters/bin/answer.py?answer=34609 —Preceding unsigned comment added by 87.119.120.23 (talk) 13:10, 22 February 2008 (UTC)[reply]
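For reference, submission (pinging) boils down to an HTTP GET with the sitemap URL percent-encoded as a query parameter. A rough sketch using the first endpoint form quoted above (with an https scheme added) and a placeholder sitemap URL; whether Google still honours this ping endpoint should be checked against its current documentation.

    from urllib.parse import quote
    from urllib.request import urlopen

    sitemap_url = "https://example.com/sitemap.xml"   # placeholder
    ping = ("https://www.google.com/webmasters/tools/ping?sitemap="
            + quote(sitemap_url, safe=""))

    # A 200 response means the ping was received; it says nothing about
    # whether the sitemap itself is valid or will be crawled.
    with urlopen(ping) as response:
        print(response.status)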

Submission Sitemap Externals


I forgot to log in when I added those external links; they contain the "official" method from each search engine for submitting a valid XML sitemap. Neither of the pages attempts to sell a product, and both keep a neutral point of view. Please comment here on any change suggestions.

This can help clean up the how-to information in the article and let a neutral, external source provide the official how-to method for each search engine that supports the Sitemaps feature.

SDSandecki (talk) 06:21, 25 February 2008 (UTC)[reply]

Plain text OK too


Mention that sitemaps can also be in plain text format: sitemap.txt, and sitemap.txt.gz. See the Google webmaster tips if you don't believe me. Jidanni (talk) 04:27, 15 March 2008 (UTC)[reply]
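For illustration, a plain-text sitemap is just one absolute URL per line (UTF-8), and the same file may be served gzip-compressed. A tiny sketch with placeholder URLs:

    import gzip

    urls = [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/news/archive",
    ]

    body = "\n".join(urls) + "\n"      # one absolute URL per line

    with open("sitemap.txt", "w", encoding="utf-8") as f:
        f.write(body)

    # The compressed variant referred to above as sitemap.txt.gz:
    with gzip.open("sitemap.txt.gz", "wt", encoding="utf-8") as f:
        f.write(body)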

Sitemap and sitemaps


The two articles apparently are speaking of the same thing. But there is some confusion about what a sitemap is. Quote from sitemaps.org (the protocol site):
"Sitemap protocol format consists of XML tags"
For me, sitemap = sitemaps. Maybe we should disambiguate "sitemap" as architecture and as protocol. Acaciz (talk) 14:29, 24 April 2009 (UTC)[reply]

Can we remove the "how-to" warning?


I read the page today and did not see any how-to content. I think it is time to remove the {howto|article} warning template. I am going to put a message on Ddxc's talk page (s/he was the originator of the how-to template's use). The template has been there since 23 December 2007.

BrotherE (talk) 19:11, 7 September 2009 (UTC)[reply]

I agree. Just read the article and was informed, not trained.

To be honest, there ** is ** how-to content: the article duplicates the tutorial on sitemaps.org. If this part is removed, as it must be, very little content will remain. I still think that it must be merged with Sitemap. Macaldo (talk) 11:44, 17 November 2009 (UTC)[reply]


Sitemap term has Two meanings


XML Sitemaps are used to direct search engine parsing, and site maps help with user navigation/architecture.

—Preceding unsigned comment added by DShantz (talkcontribs) 00:50, 30 March 2010 (UTC)[reply]

Sitemap location statement is wrong and spec is unclear


"As the Sitemap needs to be in the same directory as the URLs listed". This is wrong. It needs to be at the level of or above the URLs listed. http://www.sitemaps.org/protocol.html#location gives more detail and also explains how robots.txt can grant permission for foo.com's sitemap to be hosted on bar.com but it doesn't make clear how that affects the path part of the URL nor whether this permission passes down through sitemapindexes to other sitemaps. -- Ralph Corderoy (talk) 18:37, 3 January 2013 (UTC)[reply]

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on Sitemaps. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot II (Talk to my owner: Online) 09:27, 14 February 2016 (UTC)[reply]

Duplicate page


This page seems to be a duplicate and should be merged with https://en.wikipedia.org/wiki/Site_map — Preceding unsigned comment added by Ottokek (talkcontribs) 08:03, 15 July 2016 (UTC)[reply]

The two pages aren't really duplicates, but they could certainly do with merging and demerging — there's a bunch of content at Sitemap that should probably be here and there shouldn't really be more than a passing reference to user-facing sitemaps on this article, which should only be about the XML Sitemaps (sitemap.xml files) used by crawlers. Some cleanup of the two pages would definitely be beneficial. — OwenBlacker (Talk) 09:25, 27 July 2016 (UTC)[reply]
Complicating matters more, this article should be called "Sitemap" as that is the name of the file sitemap.xml, its specification, the generator, etc, etc.
The Google product was called "Google Sitemaps", plural because they would be collecting millions of them, and the website is called "http://sitemaps.org", and it is often called the "sitemaps.org protocol" as a result. But this article isn't primarily about the initial Google product or the website. John Vandenberg (chat) 04:13, 28 February 2018 (UTC)[reply]

This page is now under indefinite pending changes protection.


Per Wikipedia:Administrators' noticeboard#Spam / vandalism magnets this page is now under indefinite pending changes protection. See Wikipedia:Pending changes and Wikipedia:Reviewing pending changes. If any of the regulars who watch this page don't have the PC approval right, you can apply at Wikipedia:Requests for permissions or just ask me to do it for you. (PC approval is routinely granted to experienced editors). --Guy Macon (talk) 17:12, 21 May 2020 (UTC)[reply]