
Issue #3453745: Fix filter false positives

These changes are tested against the HTML below. I think automated tests would be valuable.

A playground of the regex with this HTML is here.

In addition to matching class names that end in language-*, the regex matches class names that end in lang-*, for consistency with highlight.js. Matching lang-* accommodates content that doesn't come through CKEditor (content from copy-paste, site migrations, APIs, etc.).

Note: matches include class names with leading text, such as foo-language-javascript and foo-lang-javascript. This is intentional, because highlight.js matches these and will try to highlight them. That may be a bug in highlight.js, but even so, there will be an error if this module hasn't loaded the language file.

<p>
language-bogus-1
</p>

<pre><code class="language-yaml"># YAML. `class="language-yaml"`
language-bogus-2: 'yaml code'</code></pre>

<pre><code class="lang-php">// PHP. `class="lang-php"`
$foo = 'bar';</code></pre>

<pre><code class="other-class lang-html">&lt;!-- HTML. Class with additional class (`class="other-class lang-html"`) --&gt;
&lt;p&gt;html code, extra class on &lt;code&gt;code&lt;/code&gt; tag&lt;/p&gt;</code></pre>

<pre><code class="foo-lang-javascript">// Javascript. Class with leading text in class name (`class="foo-lang-javascript"`)
const foo = [
  'bar',
  'baz'
];</code></pre>

<code class="language-delphi">// Delphi. `class="language-delphi"`. This is only a &lt;code&gt;code&lt;/code&gt; tag, so it should not match.</code>
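
The matching rules described above can be sketched in Python as follows. This is an illustrative approximation, not the regex from this merge request: `LANGUAGE_CLASS_RE`, `detect_languages`, and `SAMPLE_HTML` are hypothetical names, and the pattern assumes double-quoted class attributes with the `<code>` tag directly inside `<pre>`.

```python
import re

# Hypothetical pattern (an assumption, not the module's actual regex).
# It requires <pre><code ...>, then looks inside the class attribute for
# "language-*" or "lang-*", allowing leading text or other class names.
LANGUAGE_CLASS_RE = re.compile(
    r'<pre\b[^>]*>\s*'              # opening <pre> tag
    r'<code\b[^>]*\bclass="[^"]*?'  # <code> tag, up to the language class
    r'(?:language|lang)-([\w-]+)'   # capture the language name
    r'[^"]*"',                      # rest of the class attribute
    re.IGNORECASE,
)


def detect_languages(html: str) -> list[str]:
    """Return the language names found on <pre><code> blocks, in order."""
    return [m.group(1).lower() for m in LANGUAGE_CLASS_RE.finditer(html)]


# Abbreviated version of the test HTML from this description.
SAMPLE_HTML = """
<p>
language-bogus-1
</p>

<pre><code class="language-yaml"># YAML
language-bogus-2: 'yaml code'</code></pre>

<pre><code class="lang-php">// PHP
$foo = 'bar';</code></pre>

<pre><code class="other-class lang-html">&lt;p&gt;html code&lt;/p&gt;</code></pre>

<pre><code class="foo-lang-javascript">const foo = ['bar', 'baz'];</code></pre>

<code class="language-delphi">// Delphi, no pre tag, should not match.</code>
"""

if __name__ == "__main__":
    print(detect_languages(SAMPLE_HTML))
```

In this sketch, the plain-text `language-bogus-*` strings and the bare `<code class="language-delphi">` tag are skipped because the pattern only looks at a class attribute on a `<code>` tag that follows `<pre>`.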
