Issue #3453745: Fix filter false positives
These changes are tested against the HTML below. I think automated tests would be valuable.
A playground of the regex with this HTML is here.
In addition to matching class names that end in language-*, the regex matches class names that end in lang-* to be consistent with highlight.js. Matching lang-* accommodates content that doesn't come through CKEditor (content from copy-paste, from site migrations, from APIs, etc).
Note: matches include class names with leading text such as foo-language-javascript and foo-lang-javascript. This is intentional, because highlight.js matches these and will try to highlight them. That may be a bug in highlight.js, but even so there will be an error if the language file hasn't been loaded by this module.
<p>
language-bogus-1
</p>
<pre><code class="language-yaml"># YAML. `class="language-yaml"`
language-bogus-2: 'yaml code'</code></pre>
<pre><code class="lang-php">// PHP. `class="lang-php"`
$foo = 'bar';</code></pre>
<pre><code class="other-class lang-html"><!-- HTML. Class with additional class (`class="other-class lang-html"`) -->
<p>html code, extra class on <code>code</code> tag</p></code></pre>
<pre><code class="foo-lang-javascript">// Javascript. Class with leading text in class name (`class="foo-lang-javascript"`)
const foo = [
'bar',
'baz'
];</code></pre>
<code class="language-delphi">// Delphi. `class="language-delphi"`. This is only a <code>code</code> tag, so it should not match.</code>
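
As a starting point for the automated tests mentioned above, here is a minimal sketch of the matching rules in JavaScript. The pattern `CODE_LANGUAGE`, the helper `findLanguages`, and the shortened `sample` markup are all hypothetical and written only for illustration; this is not the regex from the merge request, and it assumes the `class` attribute directly follows `<code` and uses double quotes.

```javascript
// Hypothetical pattern, NOT the module's actual regex.
// Matches <pre><code class="..."> where one class name ends in
// language-<name> or lang-<name> (leading text such as "foo-" is allowed)
// and captures <name>.
const CODE_LANGUAGE =
  /<pre>\s*<code\s+class="(?:[^"]*\s)?[\w-]*?lang(?:uage)?-([\w-]+)(?:\s[^"]*)?"/g;

// Hypothetical helper: returns the captured language names in document order.
function findLanguages(html) {
  return Array.from(html.matchAll(CODE_LANGUAGE), (match) => match[1]);
}

// Shortened version of the test HTML above (assumed, for illustration).
const sample = [
  '<p>language-bogus-1</p>',
  '<pre><code class="language-yaml">language-bogus-2: \'yaml code\'</code></pre>',
  '<pre><code class="lang-php">$foo = \'bar\';</code></pre>',
  '<pre><code class="other-class lang-html">&lt;p&gt;html&lt;/p&gt;</code></pre>',
  '<pre><code class="foo-lang-javascript">const foo = [];</code></pre>',
  '<code class="language-delphi">no pre wrapper, should not match</code>',
].join('\n');

// Expected languages: yaml, php, html, javascript
console.log(findLanguages(sample));
```

The two bogus strings and the bare code tag are skipped because the pattern is anchored on the `<pre><code class="` prefix, which is the false-positive case this issue is about.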