In Drupal, the ubiquitous "t" function translates strings into the page language or a given user's language. When writing modules, the "t" function should therefore be used extensively to wrap all user-readable text.
The "t" function works with special placeholders that signal "dynamic information" in a string that needs "extra" filtering or should not be filtered or translated at all, such as URLs. There are three different placeholders that offer three different exceptions to the normal operation of "t".
| Placeholder | Description |
|---|---|
| ! | Prevents all manipulation by "t"; the text is inserted as is. |
| @ | Runs the text through "check_plain" so that all special HTML characters are encoded as HTML entities. |
| % | Runs the text through "check_plain" and highlights the output with the placeholder theme function, which by default wraps it in `<em>`. |
Used properly, "t" serves two purposes: as a localization function it can vastly increase the accessibility of your site, and as a security function it can harden your site against XSS attacks. Since all of your site should be written with translation in mind anyway, the placeholders can filter out potential XSS at the same time. Within certain limits, all user input should be run through "t" with the @ placeholder at the very least, so that PHP concatenation of user-supplied text into output strings is non-existent in your site.
So instead of doing this in Drupal:
    $output = "Hello insecure world!";
    // Insecure: $output is concatenated into the string and is never filtered.
    drupal_set_message(t("My message for you is: " . $output));
You should be writing all output strings using the placeholders, like so:
    $output = "Hello world!";
    // The @ placeholder runs $output through check_plain before it is inserted.
    drupal_set_message(t("My message for you is: @output", array('@output' => $output)));
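To see what each placeholder does to hostile input, the substitution step can be sketched in plain PHP outside Drupal. This is only an approximation under stated assumptions: `sketch_check_plain` and `sketch_placeholders` are hypothetical names, not Drupal functions. The sketch performs no translation and skips the UTF-8 validation that "check_plain" does.

```php
<?php
// Hypothetical stand-in for check_plain(); it omits Drupal's UTF-8 validation.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical stand-in for the placeholder handling inside t().
// No translation lookup is performed here.
function sketch_placeholders($string, array $args = array()) {
  if (empty($args)) {
    return $string;
  }
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped as plain text.
        $args[$key] = sketch_check_plain($value);
        break;
      case '%':
        // Escaped, then wrapped in the default emphasis markup.
        $args[$key] = '<em>' . sketch_check_plain($value) . '</em>';
        break;
      case '!':
        // Inserted as is, with no filtering at all.
        break;
    }
  }
  return strtr($string, $args);
}

$input = '<b>Bob</b>';
echo sketch_placeholders('Raw: !name', array('!name' => $input)), "\n";
echo sketch_placeholders('Escaped: @name', array('@name' => $input)), "\n";
echo sketch_placeholders('Emphasized: %name', array('%name' => $input)), "\n";
// Prints:
// Raw: <b>Bob</b>
// Escaped: &lt;b&gt;Bob&lt;/b&gt;
// Emphasized: <em>&lt;b&gt;Bob&lt;/b&gt;</em>
```

Only the ! placeholder lets the markup through, which is why it should be reserved for values that are already known to be safe.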
Since "t" is primarily a localization function, there are some inherent problems with relying solely on "t" to protect against XSS. "t" is designed to handle literal strings written in code, not individual variables, which leads to some very common misuses of "t". The placeholders provide a security context for "t" to handle 'some' variables, but a rapidly changing variable that ends up in the translatable string itself, rather than in a placeholder value, will lead to orphaned translation data that in some situations can greatly increase database bloat.
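The bloat problem can be illustrated with a small simulation. It is a sketch only: `sketch_t_recording` is a hypothetical function, and the `$registry` array merely stands in for wherever translatable source strings are stored. The assumption being modelled is that every distinct string passed to "t" becomes its own piece of translation data.

```php
<?php
// Hypothetical stand-in for t(): it records each source string it sees in
// $registry (standing in for stored translation data) and substitutes
// escaped placeholder values. No real translation takes place.
function sketch_t_recording($string, array $args, array &$registry) {
  $registry[$string] = TRUE;
  foreach ($args as $key => $value) {
    $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
  }
  return empty($args) ? $string : strtr($string, $args);
}

$names = array('alice', 'bob', 'carol');

// Misuse: the variable becomes part of the source string.
$registry = array();
foreach ($names as $name) {
  sketch_t_recording('Hello ' . $name, array(), $registry);
}
$concatenated = count($registry);

// Intended use: one constant source string, the variable is a placeholder value.
$registry = array();
foreach ($names as $name) {
  sketch_t_recording('Hello @name', array('@name' => $name), $registry);
}
$with_placeholder = count($registry);

echo "$concatenated source strings with concatenation\n";
echo "$with_placeholder source string with a placeholder\n";
// Prints:
// 3 source strings with concatenation
// 1 source string with a placeholder
```

With concatenation the number of recorded strings grows with the number of distinct values, while the placeholder version stays at one.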
For security purposes "t" relies on "check_plain", which resides in the bootstrap file (includes/bootstrap.inc), on line 733 in the version examined here. An inspection of that code shows:
    function check_plain($text) {
      return drupal_validate_utf8($text) ? htmlspecialchars($text, ENT_QUOTES) : '';
    }
Here the Drupal function drupal_validate_utf8 guards against a class of UTF-8 misinterpretation errors common in IE6 and, to a lesser degree, in other browsers. It is important to note that this function relies upon the PHP function preg_match to detect invalid UTF-8, and that when validation fails "check_plain" fails silently: it returns an empty string and none of the input data.
The PHP function htmlspecialchars then does the heavy lifting: it takes the small set of special characters used to insert HTML markup and encodes them as printable HTML entities (for example, '&' becomes '&amp;').
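Both behaviours described above, entity encoding and the silent empty result on invalid UTF-8, can be reproduced in plain PHP. This is a minimal sketch, not Drupal's code: `sketch_validate_utf8` and `sketch_check_plain_utf8` are hypothetical names, the regular expression is an assumption about how a preg_match-based check could be written, and the explicit 'UTF-8' charset argument is added here so the result does not depend on the PHP version's default.

```php
<?php
// Hypothetical UTF-8 check: with the "u" modifier, preg_match() does not
// return 1 when the subject contains byte sequences that are not valid UTF-8.
function sketch_validate_utf8($text) {
  if (strlen($text) == 0) {
    return TRUE;
  }
  return preg_match('/^./us', $text) == 1;
}

// Hypothetical re-creation of the check_plain() logic shown above.
function sketch_check_plain_utf8($text) {
  return sketch_validate_utf8($text) ? htmlspecialchars($text, ENT_QUOTES, 'UTF-8') : '';
}

// Valid input: markup characters come back as entities.
echo sketch_check_plain_utf8('Tom & "Jerry" <script>'), "\n";
// Prints: Tom &amp; &quot;Jerry&quot; &lt;script&gt;

// Invalid input: a lone 0xE9 byte is not valid UTF-8, so the result is an
// empty string and no error is raised.
var_dump(sketch_check_plain_utf8("caf\xE9"));
// Prints: string(0) ""
```

Because the failure is silent, a caller that needs to distinguish "empty input" from "rejected input" has to validate the text itself before calling the function.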