Issue #3591045: Strip tags before decoding entities in Xls::formatValue()

Summary

When Strip HTML is enabled, Xls::formatValue() previously decoded HTML entities first and stripped tags afterwards. Content containing entity-encoded literal < or > characters (e.g. "low pressure, &lt;1 MPa" from a formatted text field) was decoded to a real <, and strip_tags() then consumed the rest of the string as if it were an unfinished HTML tag, silently truncating the cell to "low pressure, ".

Fix

Reverse the order of the two operations:

   protected function formatValue($value) {
     if ($this->stripTags) {
-      $value = Html::decodeEntities($value);
-      $value = strip_tags($value);
+      $value = strip_tags($value);
+      $value = Html::decodeEntities($value);
     }

After strip_tags(), entity-encoded brackets remain as &lt; / &gt; (so they do not trigger tag interpretation). Html::decodeEntities() then turns them back into literal characters for the final value.
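
The effect of the ordering can be reproduced outside Drupal with a minimal sketch. It assumes that html_entity_decode() with ENT_QUOTES | ENT_HTML5 is a fair stand-in for Html::decodeEntities(); the function names decode_entities(), format_decode_first() and format_strip_first() are hypothetical and exist only for this illustration.

```php
<?php
// Assumed stand-in for Drupal's Html::decodeEntities(); not the real class.
function decode_entities(string $text): string {
  return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Old order: decode entities, then strip tags.
// A decoded "<" directly followed by a non-space character looks like the
// start of a tag to strip_tags(), so the text after it is dropped.
function format_decode_first(string $value): string {
  return strip_tags(decode_entities($value));
}

// New order: strip tags, then decode entities.
// "&lt;" is plain text to strip_tags() and only becomes "<" afterwards.
function format_strip_first(string $value): string {
  return decode_entities(strip_tags($value));
}

$input = '<p>low pressure, &lt;1 MPa</p>';

// The old order loses "1 MPa"; the new order keeps "low pressure, <1 MPa".
var_dump(format_decode_first($input));
var_dump(format_strip_first($input));
```

Running both functions on the same input makes the difference visible without a Drupal bootstrap, which may help when checking other inputs against the new order.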

Scope

Single change in Xls::formatValue(). The fix transparently covers:

  • Xls directly.
  • Xlsx extends Xls and does not override formatValue().
  • OpenSpoutXlsxEncoder extends Xlsx and does not override it either.

No behaviour change for inputs without entity-encoded brackets (the most common case): real HTML tags are still stripped, and entities like &amp; still decode correctly. The existing testFormatValue() assertions all still pass without modification.

Test plan

Adds testFormatValueWithEncodedAngleBrackets() covering:

  • The exact reported case: <p>Buffer hydrogen gas holder: low pressure, &lt;1 MPa</p> → Buffer hydrogen gas holder: low pressure, <1 MPa.
  • &gt; equivalent: <p>5 &gt; 3 in absolute value</p> → 5 > 3 in absolute value.
  • Entities adjacent to real tags: <p>If <em>x &lt; y</em> and <em>y &gt; z</em> then x &lt; z.</p> → If x < y and y > z then x < z.
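
The three cases above can also be checked as a plain input/expected table. This is only a sketch and not the test added in the merge request: format_value_fixed() is a hypothetical stand-in for Xls::formatValue() with Strip HTML enabled, and html_entity_decode() is assumed to behave like Html::decodeEntities() for these inputs.

```php
<?php
// Hypothetical stand-in for the fixed formatValue(): strip tags, then decode.
function format_value_fixed(string $value): string {
  return html_entity_decode(strip_tags($value), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Input markup => expected cell value, taken from the cases listed above.
$cases = [
  '<p>Buffer hydrogen gas holder: low pressure, &lt;1 MPa</p>'
    => 'Buffer hydrogen gas holder: low pressure, <1 MPa',
  '<p>5 &gt; 3 in absolute value</p>'
    => '5 > 3 in absolute value',
  '<p>If <em>x &lt; y</em> and <em>y &gt; z</em> then x &lt; z.</p>'
    => 'If x < y and y > z then x < z.',
];

// Report each case on its own line so a mismatch is easy to spot.
foreach ($cases as $input => $expected) {
  $actual = format_value_fixed($input);
  echo ($actual === $expected ? 'ok  ' : 'FAIL') . ' ' . $expected . PHP_EOL;
}
```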

Locally: 6 tests, 29 assertions, OK. The pre-existing setAccessible() deprecation warning is unrelated.

phpcs --standard=Drupal,DrupalPractice clean.
