Instruct search engines to ignore HTML export with noindex directive
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3481327. -->
Reported by: [jenna.tollerson](https://www.drupal.org/user/147099)
Related to !67
>>>
<h3 id="summary-problem-motivation">Problem/Motivation</h3>
<p>Google and other search engines can index the printer-friendly HTML export of book pages, at URLs like <code>/book/export/html/65</code>. My feeling is that the export is not really meant to be discoverable by search engines; rather, it's a service for visitors who'd like to access the content in that format.</p>
<h4 id="summary-steps-reproduce">Steps to reproduce</h4>
<p>In Google Search Console, you will often find that Google has discovered and indexed these <code>/book/export/html/*</code> pages.</p>
<p>Even if you add <code>Disallow: /book/export/html/</code> to your robots.txt, Google and other search engines will still discover the printer-friendly pages, because they are linked from their less printer-friendly counterparts. robots.txt only prevents crawling. It does not stop a URL from being discovered and indexed, <em>because</em> other pages link to it. In Google Search Console, these pages may be listed as "Indexed, though blocked by robots.txt".</p>
<p>Via <a href="https://developers.google.com/search/docs/crawling-indexing/robots/intro">Google Search Central</a>:</p>
<blockquote><p>Warning: Don't use a robots.txt file as a means to hide your web pages (including PDFs and other text-based formats supported by Google) from Google search results.</p>
<p>If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.</p></blockquote>
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<p>Add <code>&lt;meta name="robots" content="noindex"&gt;</code> to the <code>&lt;head&gt;&lt;/head&gt;</code> of <code>/templates/book-export-html.html.twig</code>.</p>
<p>Individual sites can, of course, make this improvement by overriding <code>book-export-html.html.twig</code> in their own theme, but it makes a lot of sense to me to provide this markup by default.</p>
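<p>As a rough sketch, the change to the template's <code>&lt;head&gt;</code> could look like the excerpt below. The <code>{{ title }}</code> variable and the surrounding markup are assumptions standing in for whatever the existing template already contains. Only the robots meta tag is the proposed addition.</p>

```twig
{# Excerpt of book-export-html.html.twig (abbreviated sketch).
   Only the robots meta tag is new; the other lines stand in for the
   template's existing <head> markup, which should be kept unchanged. #}
<head>
  {# Ask search engines not to index the printer-friendly export. #}
  <meta name="robots" content="noindex">
  <title>{{ title }}</title>
  {# ...remaining existing <head> markup... #}
</head>
```

<p>Crawlers can only see this tag if they are allowed to fetch the page. A site relying on <code>noindex</code> should therefore not also block <code>/book/export/html/</code> in robots.txt.</p>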
<h3 id="summary-remaining-tasks">Remaining tasks</h3>
<h3 id="summary-ui-changes">User interface changes</h3>
<h3 id="summary-api-changes">API changes</h3>
<h3 id="summary-data-model-changes">Data model changes</h3>