Drupal: clean a string for use in URLs

Posted on 19/08/2016

Every now and then you will need to clean a string so you can use it in a URL. Writing such a helper function is not hard, but you will soon run into edge cases, such as filtering out and converting non-alphanumeric characters, where maintaining your own code does not make much sense. That's where the Pathauto module comes in.
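To make those edge cases concrete, here is a minimal sketch of the kind of helper you might write yourself. The function name naive_clean_string() is hypothetical and is not part of Drupal or Pathauto; it only lowercases the string and collapses everything that is not an ASCII letter or digit into a separator.

```php
<?php

/**
 * Hypothetical helper for illustration only; not part of Drupal or Pathauto.
 */
function naive_clean_string(string $text, string $separator = '-'): string {
  // Lowercase first so the pattern only has to handle a-z and 0-9.
  $text = strtolower($text);
  // Collapse every run of other characters (spaces, punctuation, and the
  // bytes of any non-ASCII character) into a single separator.
  $text = preg_replace('/[^a-z0-9]+/', $separator, $text);
  // Drop separators left over at either end.
  return trim($text, $separator);
}

echo naive_clean_string('Your very dirty string, with many URL un-friendly parts!'), "\n";
// your-very-dirty-string-with-many-url-un-friendly-parts
echo naive_clean_string('Crème brûlée'), "\n";
// cr-me-br-l-e
```

The first result still contains filler words such as "with", and the second one loses the accented letters entirely. Handling both properly is where a home-grown function starts to grow.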

Pathauto is one of the most installed Drupal modules. It gives you a way to configure clean URLs for almost every part of your website (users, nodes, taxonomies). Even though I was the one who asked how to replace it with custom code, the only sites where I did not use the module were private websites where URLs served no purpose (not even for block visibility).

There are a few reasons why you would want to use Pathauto's function instead of doing this on your own:

  1. You won't have to maintain the function itself.
  2. Pathauto will automatically remove commonly used words such as "a", "an", "as", etc. The list can be configured at /admin/config/search/path/settings in both Drupal 7 and Drupal 8.
  3. Out of the box you get the additional filters and modules that build on top of Pathauto, such as the excellent Transliteration module for Drupal 7 (included in D8 core). This module is an absolute must-have for any website where you might need to upload files or deal with different encodings. In short, it transliterates all strings in URLs and filenames, so you get standardized, clean strings that work everywhere and links that do not need encoding. This works for accented characters as well as for non-Latin writing systems such as Cyrillic.
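The word removal from point 2 can be pictured with a small sketch. This is not Pathauto's implementation: remove_ignored_words() and the short word list are assumptions made for illustration, while the real list is whatever you configure on the settings page.

```php
<?php

/**
 * Hypothetical helper for illustration only; not Pathauto's actual code.
 */
function remove_ignored_words(string $text, array $ignored): string {
  // Split on whitespace so each word can be checked on its own.
  $words = preg_split('/\s+/', trim($text));
  // Keep only the words that are not in the ignore list (case-insensitive).
  $kept = array_filter($words, function (string $word) use ($ignored): bool {
    return !in_array(strtolower($word), $ignored, TRUE);
  });
  // If every word was ignored, fall back to the original text rather than
  // returning an empty string.
  return $kept ? implode(' ', $kept) : $text;
}

// Assumed sample list; the real one lives in the Pathauto settings.
$ignored = ['a', 'an', 'as', 'the', 'with'];

echo remove_ignored_words('A string with the common words', $ignored), "\n";
// string common words
```

A step like this would run before the string is lowercased and separated with hyphens, which is why the ignored words never reach the final alias.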

Finally, here's the code you would need to use for Drupal 7:

    // The pathauto_cleanstring() function is located in the pathauto.inc file, so
    // we have to load that file first.
    module_load_include('inc', 'pathauto', 'pathauto');
    $clean_string = pathauto_cleanstring('Your very dirty string, with many URL un-friendly parts!');

And Drupal 8:

    $clean_string = \Drupal::service('pathauto.alias_cleaner')->cleanString('Your very dirty string, with many URL un-friendly parts!');

The return value in both cases will be your-very-dirty-string-many-url-un-friendly-parts.

That's pretty much it! If you check the docs online, you will notice some inconsistencies: pathauto_cleanstring() is still listed as a valid function in Pathauto for D8, and it is still mentioned in 2 comments in the code. In reality the function no longer exists in D8 and has been replaced by the alias_cleaner service (pathauto.alias_cleaner).

I suggest just installing Devel and playing around with this to understand how it works and what kind of strings you will get back.
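As a starting point, a snippet along these lines could be pasted into Devel's Execute PHP page on a Drupal 8 site with Pathauto enabled. It is only a sketch: the sample strings are made up, dpm() is assumed to be available from Devel, and the results depend on your Pathauto settings, so no expected output is shown.

```php
// Sample strings are arbitrary; replace them with the ones you care about.
$samples = [
  'Your very dirty string, with many URL un-friendly parts!',
  'Crème brûlée & café',
  'Привет, мир',
];

// Same service as in the Drupal 8 example in this post.
$cleaner = \Drupal::service('pathauto.alias_cleaner');

// Map each original string to its cleaned version.
$results = [];
foreach ($samples as $sample) {
  $results[$sample] = $cleaner->cleanString($sample);
}

// dpm() is assumed to come from the Devel module; it prints the array
// in the message area.
dpm($results);
```

Changing the ignored words or the punctuation settings and re-running the snippet is a quick way to see how each option affects the result.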

See also:

  • Pathauto module
  • cleanString() method in the AliasCleaner.php file in Pathauto for D8. Unfortunately, the docs for this are not available online at the moment.
  • Transliteration module for D7. Do note that this module is included in Drupal 8, but file transliteration does not work there yet and will be included only in Drupal 8.3. You can follow this here.
  • pathauto_cleanstring()
  • module_load_include()
  • Question where I asked about replacing Pathauto with custom code