Enhance Search Engine Indexing with sitemap.xml
I had this article in progress for several years and sent parts of it to colleagues as needed. Martin Pešout pointed me to KontentKing and their great articles. Feel free to read their XML Sitemap article, or the summary below.
What not to forget
- Include only SEO-relevant URLs. Google treats the URLs listed in sitemap.xml as the essential ones and indexes them first.
- It’s also a good place to list URLs that should be removed from the index. Removed URLs listed here are processed quickly.
- The loc tag is required. It must contain an absolute, canonical URL (the same one as in the meta tag, of course). A so-called self-canonical URL also counts as a canonical URL.
- For multilingual sites, do not forget to declare the language alternates with hreflang. Don’t duplicate hreflang in meta tags. Use only one solution: meta or in a
- The lastmod tag is optional but significant, because it tells the robot about changes and therefore whether reindexing is worthwhile. Change the date only when the content has changed significantly, not when merely correcting typos :-) Google often penalizes frequently updated pages with minimal change.
- The priority tags are not necessary when using
- Check the specifications for including images or videos. For larger projects, however, it is better for the crawl budget to leave them out and use JSON-LD instead.
- Do not put URLs of articles (news) in the sitemap; use RSS / Atom feeds for them instead. Don’t forget to link the feeds in the corresponding meta tags.
- For larger projects, it is good to check the current specifications, such as the limit of 50 MB uncompressed (index.xml.gz can be used), max. 50,000 URLs, use
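To make the points above more concrete, here is a minimal sketch in Python of how a sitemap with loc, lastmod and hreflang alternates could be generated. It is only an illustration, not a definitive implementation: the function name build_sitemap, the shape of the page dictionaries and the example.com URLs are my assumptions, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
MAX_URLS = 50_000  # per-file URL limit mentioned in the list above

# Serialize the sitemap namespace as the default one and xhtml with a prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """Return sitemap XML as a string.

    `pages` is a hypothetical list of dicts such as
    {"loc": url, "lastmod": "YYYY-MM-DD", "alternates": {lang: url}}.
    """
    if len(pages) > MAX_URLS:
        raise ValueError("too many URLs for a single sitemap file")

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # loc is required: an absolute, canonical URL.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        # lastmod is optional: set it only after a significant change.
        if page.get("lastmod"):
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        # One xhtml:link per language version, including the page itself.
        for lang, href in page.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {
            "loc": "https://example.com/en/",
            "lastmod": "2024-01-15",
            "alternates": {
                "en": "https://example.com/en/",
                "cs": "https://example.com/cs/",
            },
        },
    ]))
```

The sketch deliberately leaves out priority and changefreq, and it only checks the URL count, not the file size.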
What should not get into XML
- Non-canonical pages
- Duplicate pages
- Paginated pages from page 2 onward
- URLs with parameters or session IDs
- Search results (internal)
- Various versions created for sharing (abbreviated for Twitter, e‑mail, etc.)
- URLs created using filtering that are not important for indexing (see SEO formulas and noindex)
- Archived pages
- Any 3xx redirects, missing 4xx pages, or 5xx errors
- Pages blocked in
- Pages marked as noindex
- Pages after submitting the form, etc.
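Part of the checklist above can be expressed as a simple filter that runs before the sitemap is generated. The following Python sketch is only an illustration under assumptions of mine: the function name belongs_in_sitemap and the page fields (url, canonical, status, noindex, page_number) are hypothetical, and it treats every URL with a query string as unwanted, which may be too strict for some sites.

```python
from urllib.parse import urlsplit


def belongs_in_sitemap(page):
    """Return True if a page record passes the basic sitemap checks.

    `page` is a hypothetical dict with a required "url" and optional
    "canonical", "status", "noindex" and "page_number" keys.
    """
    url = page["url"]

    # Redirects (3xx), missing pages (4xx) and errors (5xx) stay out.
    if page.get("status", 200) != 200:
        return False
    # Pages marked as noindex stay out.
    if page.get("noindex", False):
        return False
    # Non-canonical and duplicate pages point to a different canonical URL.
    if page.get("canonical", url) != url:
        return False
    # Parameters, session IDs, filters and internal search use a query string.
    if urlsplit(url).query:
        return False
    # Paginated pages from page 2 onward stay out.
    if page.get("page_number", 1) > 1:
        return False
    return True


if __name__ == "__main__":
    candidates = [
        {"url": "https://example.com/shoes/"},
        {"url": "https://example.com/shoes/?sessionid=abc"},
        {"url": "https://example.com/old/", "status": 301},
    ]
    print([p["url"] for p in candidates if belongs_in_sitemap(p)])
```

Checks such as pages blocked for robots or pages shown after submitting a form depend on how the site is built, so they are left out of the sketch.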
Hopefully this summary helped you get oriented in the topic of sitemaps :-)