Issue #3228544: Improves accuracy of matching analytics results to node aliases
Issues this patch fixes
-
Analytics URLs need to be cleaned. Because we are matching against Drupal paths, we should strip junk such as ?query parameters and #fragments (anchors) from analytics results before matching.
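A minimal, language-neutral sketch of the cleaning step, written in Python for illustration rather than taken from the module's PHP code. The helper name `clean_analytics_path` is hypothetical. It keeps only the path component of an analytics URL, dropping the query string and fragment.

```python
from urllib.parse import urlsplit


def clean_analytics_path(raw: str) -> str:
    """Return only the path part of an analytics URL (hypothetical helper).

    The ?query string and #fragment are discarded so the result can be
    compared with a Drupal path alias.
    """
    path = urlsplit(raw.strip()).path
    # An empty path (e.g. "?utm_source=x" or a bare host) means the front page.
    return path or "/"


print(clean_analytics_path("/about?referal=fb"))  # /about
print(clean_analytics_path("https://example.com/about#intro"))  # /about
```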
-
Multiple rows/results from analytics need to be merged before matching to a Drupal alias. For example, "/about", "/about#intro" and "/about?referal=fb" should all be cleaned, merged and associated with the Drupal alias "/about".
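One way to merge the rows, sketched in Python under the assumption (not stated in the issue) that each analytics row is a `(url, pageviews)` pair. `merge_analytics_rows` is a hypothetical name. Rows that clean to the same path have their counts summed.

```python
from collections import Counter
from urllib.parse import urlsplit


def merge_analytics_rows(rows):
    """Sum (url, pageviews) rows whose cleaned paths are identical (hypothetical)."""
    totals = Counter()
    for url, views in rows:
        # Drop ?query and #fragment; an empty path means the front page.
        path = urlsplit(url).path or "/"
        totals[path] += views
    return dict(totals)


rows = [("/about", 120), ("/about#intro", 15), ("/about?referal=fb", 7)]
print(merge_analytics_rows(rows))  # {'/about': 142}
```

Summing before matching means each Drupal alias is looked up once, with the combined total, instead of keeping whichever single row happened to match.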
-
Mixed case: analytics can store different case variants of the same path (whatever the user typed), and Drupal also allows mixed-case aliases to be saved. Converting both sides to lowercase before comparison gives the most accurate results.
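A small Python sketch of case-insensitive matching. The alias-to-node-ID mapping and both function names are hypothetical illustrations, not the module's actual data structures. Both the Drupal alias and the analytics path are lowercased before lookup.

```python
def build_alias_index(aliases):
    """Map lowercased Drupal alias -> node ID (hypothetical structure)."""
    return {alias.lower(): nid for alias, nid in aliases.items()}


def match_path(index, analytics_path):
    """Return the node ID for an analytics path, ignoring case, or None."""
    return index.get(analytics_path.lower())


index = build_alias_index({"/About-Us": 12})
print(match_path(index, "/about-us"))  # 12
print(match_path(index, "/ABOUT-US"))  # 12
```

Note the trade-off: if two Drupal aliases differ only by case, they collapse to one key here and the last one wins.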
-
Special characters in URLs (such as &, > or spaces) are stored encoded in analytics results, whereas the Drupal alias stores the raw character (e.g. a plain &). For accurate matching both forms must be equal, so the Drupal alias is escaped before comparison.
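A Python sketch of escaping the Drupal alias, assuming (this is not confirmed by the issue) that the analytics source HTML-encodes characters such as `&` and `>` as `&amp;` and `&gt;`. `escape_alias_for_matching` is a hypothetical name. The sketch does not handle spaces, whose encoding depends on the analytics source.

```python
import html


def escape_alias_for_matching(alias: str) -> str:
    """Escape a raw Drupal alias so it matches an HTML-encoded analytics path.

    Assumption: analytics encodes &, < and > as HTML entities. quote=False
    leaves quote characters untouched; spaces are not handled here.
    """
    return html.escape(alias, quote=False)


print(escape_alias_for_matching("/news&events"))  # /news&amp;events
print(escape_alias_for_matching("/a>b"))  # /a&gt;b
```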
-
Truncate the table before running batches. To combine results that may come from different cron queues, the table must be truncated first, and each queue's results merged with the previous results as that queue finishes. This also keeps the table from growing indefinitely on a long-running site where pages are removed over time.
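A minimal Python sketch of the truncate-then-merge flow. The in-memory `AnalyticsTable` class stands in for the module's database table and, like `run_batches`, is hypothetical; it only illustrates the ordering described above.

```python
from collections import Counter


class AnalyticsTable:
    """In-memory stand-in for the results table (hypothetical)."""

    def __init__(self):
        self.rows = Counter()

    def truncate(self):
        # Start each run empty so paths for removed pages do not linger.
        self.rows.clear()

    def merge_queue_results(self, results):
        # Add this queue's counts to what earlier queues already stored.
        self.rows.update(results)


def run_batches(table, queues):
    """Truncate once, then merge each queue's results as it finishes."""
    table.truncate()
    for results in queues:
        table.merge_queue_results(results)
    return dict(table.rows)


table = AnalyticsTable()
table.rows["/old-page"] = 50  # stale row left over from a previous run
print(run_batches(table, [{"/about": 100}, {"/about": 42, "/contact": 9}]))
# {'/about': 142, '/contact': 9}
```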