Skip to content
Snippets Groups Projects

Issue #571548: Prevent Views generating SQL aliases which are too-long or duplicates

Closes #571548

Use safe SQL aliases.

We limit the length of the original alias (to a default maximum of 60 characters), incorporating a hash of the original value for uniqueness if that value was too long to use verbatim.

This prevents subsequent truncation from creating duplicate aliases in cases where two or more aliases are identical up to the point of truncation. This happens particularly easily with relationships, where alias names may be built from several concatenated identifiers.

Edited by jweowu

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • jweowu added 1 commit

    added 1 commit

    • 7a1ef666 - Add type-hinting for the $maxlength argument

    Compare with previous version

  • jweowu resolved all threads

    resolved all threads

  • jweowu added 1 commit

    added 1 commit

    • 2e661f89 - fixup! Add type-hinting for the $maxlength argument

    Compare with previous version

  • jweowu marked this merge request as draft from issue/drupal-571548@2e661f89

    marked this merge request as draft from issue/drupal-571548@2e661f89

  • jweowu added 1 commit

    added 1 commit

    • 4a26e63e - Add type-hinting for the $maxlength argument

    Compare with previous version

  • jweowu marked this merge request as ready

    marked this merge request as ready

  • jweowu added 1 commit

    added 1 commit

    • c9ca7d27 - Add type-hinting for the $maxlength argument

    Compare with previous version

  • Stephen Mustgrave
  • Stephen Mustgrave
  • jweowu added 1 commit

    added 1 commit

    • efb734a5 - Add mixed type-hinting for the $alias argument

    Compare with previous version

  • quietone
  • quietone
  • quietone
  • 592 *
    593 * @param mixed $alias
    594 * The alias to sanitize.
    595 * @param int $maxlength
    596 * The maximum number of characters permitted in an alias. Defaults to 60.
    597 *
    598 * @return mixed
    599 * The sanitized alias string (if a string argument was supplied;
    600 * non-strings are returned unmodified).
    601 */
    602 public static function sanitizeAlias(mixed $alias, int $maxlength = 60): mixed {
    603 if (!is_string($alias)) {
    604 return $alias;
    605 }
    606 if (strlen($alias) > $maxlength) {
    607 $checksum = 'c' . crc32($alias);
    • Why use crc32 which returns an int? In core, it is more common to use md5.

    • Because it produces short hashes, meaning that we don't need to truncate the result as an additional step (which was going to be necessary with every other option I looked at).

    • I think my expectation was also that a function which was designed to return a shorter hash might well be less likely to produce a collision than truncation of longer hashes (which might turn out to have common prefixes). I don't know whether that's actually the case, though (and I imagine the chances are vanishingly small in either case, so I feel like this concern is probably only theoretical).

      I didn't see an obvious reason not to go with crc32 here, but I'll happily defer to someone who understands the maths better than myself.

    • Please register or sign in to reply
  • quietone
  • 596 * The maximum number of characters permitted in an alias. Defaults to 60.
    597 *
    598 * @return mixed
    599 * The sanitized alias string (if a string argument was supplied;
    600 * non-strings are returned unmodified).
    601 */
    602 public static function sanitizeAlias(mixed $alias, int $maxlength = 60): mixed {
    603 if (!is_string($alias)) {
    604 return $alias;
    605 }
    606 if (strlen($alias) > $maxlength) {
    607 $checksum = 'c' . crc32($alias);
    608 $pos = strlen($checksum) + strlen($alias) - $maxlength;
    609 $alias = $checksum . substr($alias, $pos);
    610 }
    611 return strtolower(substr($alias, 0, $maxlength));
    • Suggested change
      615 return strtolower(substr($alias, 0, $maxlength));
      615 return mb_strtolower($alias);

      But why does this need to be converted to lower case? In views I don't see other places where the case of an alias is modified.

    • Potentially nothing more than an aesthetic choice, but I no longer remember.

    • In fact the lower-case-ness was me being consistent with the old code, which did this:

          // PostgreSQL truncates aliases to 63 characters:
          // https://www.drupal.org/node/571548.
      
          // We limit the length of the original alias up to 60 characters
          // to get a unique alias later if its have duplicates
          $alias = strtolower(substr($alias, 0, 60));

      So as aliases are forced to lower-case in current releases, I think it's sensible to maintain that.

    • Please register or sign in to reply
  • quietone
  • 411 412 while (!empty($this->relationships[$alias])) {
    412 413 $alias = $alias_base . '_' . $count++;
    413 414 }
    • Comment on lines 408 to 413

      This looks like it is working to make the alias unique. Perhaps this can be removed?

    • I'd need to examine the code again, but from memory I think that was pre-existing code which protects against the generation of identical aliases which will be used for different purposes.

      The sanitizer doesn't help in that situation (its job is not to return a different alias every time, but rather to return a safe version of what it's given -- and always the same output for a given input), so we probably still need this code.

      (Of course if those 'uniquified' aliases are too long, the sanitizer will produce a unique shorter alias for each of them.)

      Edited by jweowu
    • Yes, that code is to ensure a NEW relationship alias is generated, so it needs to be different to anything already known.

      It does look like there's a bug here, though...

      The code does work if we call addRelationship() twice with the same short alias (edit: make that a short lower-case alias): the sanitizer won't modify that alias, and therefore it will exist as a key in $this->relationships the second time we call the function, and we'll notice that and add a unique numeric suffix to make it different.

      If we call addRelationship() twice with the same LONG alias (or one with upper-case letters) then we'll end up with a conflict, because when we assign $this->relationships[$alias] a few lines later we've sanitized $alias to a shorter value. So the next time we call addRelationship() with the original/long would-be-duplicate alias it won't find it in $this->relationships, and hence it will end up with the same sanitized value as the first call.

      Edited by jweowu
    • Note that we do still NEED to sanitize the alias AFTER the suffix has been added, because the addition of the suffix might be the thing which pushes the length of the alias over the limit.

      If I'm not mis-reading the situation, I suspect we want:

          // Make sure $alias isn't already used.
          $alias_base = $alias;
          $count = 1;
          $alias = $this->sanitizeAlias($alias_base);
          while (!empty($this->relationships[$alias])) {
            $alias = $this->sanitizeAlias($alias_base . '_' . $count++);
          }

      Could someone please review this bug/fix, in case I'm missing some reason why the code was actually ok? (E.g. if the un-sanitized alias ends up as a key in that array as well as the sanitized version.)

      Edited by jweowu
    • Please register or sign in to reply
  • quietone
  • quietone
  • 573 578 return $alias;
    574 579 }
    575 580
    581 /**
    582 * Produce a safe alias value.
    583 *
    584 * We limit the length of an alias string (to a default maximum of 60
    585 * characters), incorporating a hash of the original value for uniqueness
    586 * if that value was too long to use verbatim.
    587 *
    588 * This prevents subsequent truncation from creating duplicate aliases
    589 * in cases where two or more aliases are identical up to the point of
    590 * truncation. This happens particularly easily with relationships,
    591 * where alias names may be built from several concatenated identifiers.
    592 *
    593 * @param mixed $alias
    • Can this be more specific? Isn't this just string|null ?

    • Off the top of my head I'm not 100% certain at this point; I'd need to scour all the code again. The main thing is that it returns any non-string value unchanged.

    • I think this is good as-is; but if anyone feels motivated to establish with certainty that nothing could possibly cause the argument to be anything other than string|null (or perhaps some extended set of specific types), then that's fine by me.

    • Please register or sign in to reply
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading