DP Code Best Practices

From DPWiki
Jump to navigation Jump to search

This page contains the growing set of Guidelines and best practices to be used for developing the DP Code. Refer to DP Code Development for other more general information about the development process.

Coding conventions

Many many people have had their hands in the code over the past years, each with different coding conventions. Starting in April 2021 we're using PHP-CS-Fixer to enforce some common formatting options. It doesn't fix everything so you'll still see some variation.

See SETUP/CODE_STYLE.md for more information on running PHP-CS-Fixer on your code.

Indents

Use 4 spaces for an indent. Most code editors allow you to set them up such that hitting the Tab key will, rather than inserting a tab character, insert 4 spaces. This is the encouraged method. Please don't use tab characters.

PHP tags

Opening tags

To support sites that have short_open_tags disabled, start PHP code with the following:

<?php

not:

<?
<?=
<?PHP

(Actually, <?PHP isn't a problem for sites with short_open_tags=Off, we've just chosen to use the lower-case version for consistency.)

The opening tag should be on a line by itself unless it is embedded within other HTML, eg:

Check out this url: <?php echo $url; ?>

Closing tags

Do not include a PHP closing tag (?>) at the end of the file if the file ends in a PHP block. This is a good best practice to help prevent trailing whitespace from being sent after Header() redirects.

The exception is if it's inline with other HTML, eg:

Check out this url: <?php echo $url; ?>

Brackets

For functions, place brackets on their own line.

function test_function($arg)
{
    echo "this is a test";
} 

For all other blocks, place the bracket on the same line and indent the contents within the bracket, for example:

if ($testing) {
    echo "this is a test";
} 

Always use brackets:

if(true) {
    echo "it's true!";
} else {
    echo "it's false!";
}

Don't do this:

if(true)
    echo "it's true!";
else
    echo "it's false!";

Squashing errors/warnings

PHP supports the ability to suppress an error for a line of code by using the @, see the Error Control. Please don't do this. Instead, change the code to handle the various possible error conditions. Note that much of the code uses this but we're trying to get away from it.

Do this:

$list = [];
if(isset($list[0])) {
    $first_item = $list[0];
}

Don't do this:

$list = [];
$first_item = @$list[0];

Function, Variable, and Class names

Use the underscore method of spacing words instead of camelCase:

function this_is_a_longer_function_name()

and not:

function thisIsALongerFunctionName()

Be as precise and descriptive as possible when naming variables and functions. The following are examples of good variables names:

$proofer_user_id
$round_id
$stage

The following are not good variable names:

$i
$id
$stg

For classes, use PascalCase:

class MenuIcon

and not:

class menu_icon
class menuicon
class menuIcon

Typing

We've recently (circa 2024) started adding type declarations to function calls. This paired with PHPStan can help us find bugs via static analysis. Note that unlike Python, in PHP type definitions are enforced at runtime and PHP will throw an exception if a function is called with a type that doesn't match the declaration.

Until the end of 2024 we officially support PHP 7.4 which does not include the mixed type which somewhat limits our use of type declarations in places. PHP 7.4 does include "this type or null" with the ?type syntax, for instance a "string or null" is ?string.

When adding new functions or modifying existing ones please try to add types where possible, including in the return.

PHPStan

To determine a type, PHPStan relies on:

  1. PHP types
  2. PHPDoc annotations


While both work from PHPStan's perspective, PHPDoc annotations are not enforced by the PHP runtime. This means it's always better to set up PHP types, unless you're concerned about potential exception.


This sets the return type using a PHPDoc annotation:

/** @return int */
function returns_int() {
    return 42;
}


So prefer to set the PHP return type instead:

function returns_int(): int {
    return 42;
}


For more complex cases (see e.g. arrays below) or when you're unsure, it is fine to add them in a PHPDoc annotation. You can find some explantation about the PHPStan type syntax here.

Type enforcement

Any PHP type is automatically enforced by the PHP runtime and will trigger an exception if it can't convert one type to another. This means that this won't trigger an exception:

function take_a_string(string $a): void {
   // Do something with the string.
}

// Working.
take_a_string(12); // Implicit int-to-string conversion.
take_a_string(true); // Implicit bool-to-string conversion.

// Throwing.
take_a_string(new stdClass()); // Exception: cannot convert an object to a string.
take_a_string(['userid1', 'userid2']); // Exception: cannot convert an array to a string.


Any PHPStan annotation won't trigger any PHP exception so there is a few gotchas:

  • PHPStan doesn't enforce return types. This requires level 6 (which is a work-in-progress as of Feb 2025).
  • PHPStan requires a valid PHPDoc string for the annotation.


These are not valid PHPDoc annotations:


Case 1. Comments

/* @return string */
function write_output() {
}

// @return string
function also_write_output() {
}

Why? PHP docs are starts with /**, not /*, or //. PHP Stan ignores annotations in comments.

Fix: /** @return string */


Case 2. Invalid annotation

/** @param $s string */
function takes_a_string($s) {
}

Why? PHP docs put the parameter name last.

Fix: /** @param string $s */

Typing arrays

Arrays are particularly complex to type. This is because PHP only has one type (array) but PHPStan has a much richer syntax that allows us to describe the type of the keys and its value. This means that when typing an array, you should:

  1. Add a array as the PHP type.
  2. Add a PHPDoc annotation that describes the array in more detail for PHPStan.


Let's look at the some cases and their typing syntax.


Case 1: arbitrary string key -> value mapping (aka dictionary)

Use the syntax array<key_type, value_type>.

/** @return array<string, string> */
function returns_dict(): array {
    return $some_global_array;
}


Case 2: string key -> value mapping with known key/values

Use the syntax: array{key1:type1, key2: type2}.

/** @return array{name: string, weight: num} */
function returns_fixed_dict(): array {
    return {'name': 'some_string', 'weight': 100};
}


Case 3: arbitrary index keys to values (aka vectors or lists)

Use the syntax type[] (where type is the superset of the values' types)

/** @return int[] */
function returns_vector(): array {
    return $some_global_list;
}


Case 4: vectors with known indexes

Use the syntax array{0:type1,1:type2} (it's effectively case 2. but with integer keys)

/** @return array{0: string, 1: float} */
function returns_dict(): array {
    return ['id', 42.1];
}


Complex cases:

If you find yourself having a hard time typing something, there may be a few cases:

  1. It calls into function that have a complex type. Maybe those can be simplified?
  2. We are missing some intermediate type that is worth adding to avoid adding complexity to our codebase.


Let's assume some function calling into our DB. Due to the typing of msqli, we don't have a good way of typing those as-is:

/** @return array{string, mixed} */
function fetch_from_db(): array {
    $user_row = mysql_fetch(...);
    return $user_rows;
}

A better approach would be to wrap the row into a class so we can type it:

class MyTableRow {
    public string $user_id;
    public int $last_login_time_ms;

    function __constructor(array $row) {
        $this->user_id = $row['user_id'];
        $this->last_login_time_ms = $row['last_login_time'];
    }
}

/** @return MyTableRow */
function fetch_from_db(): object {
    $user_row = mysql_fetch(...);
    return new MyTableRow($user_rows);
}

Comments

PHP allows three kinds of comments:

// comments
/* comments */
# comments - avoid this one

When possible, use the first comment style (//). If a large block needs to be commented use the second style (/* */). Avoid the shell-style comment (#). In general it is not recommended to check in files with commented out code -- ask yourself why the code is commented and if it needs to remain in the file.

When commenting a function, do so using the DocBlock format:

/**
 * This is a test function
 *
 * This is an optional longer description of this function that tells
 * us more about what it does and maybe some guidance on how it should
 * be used.
 *
 * @param string $arg
 *   Details about this arg go here
 *
 * @return string
 */
function test_function($arg)
{
    echo "this is a test: $arg";
    return "success!";
}

Note: Styling for how a function is documented changed and you may encounter code using older formats.

include() / include_once()

Except for very rare occasions, you should always use include_once() over include().

Only include files in a script that are required for the functionality in that script (.php or .inc).

Sometimes, we add a comment after an include_once() to specify the particular identifiers that we're 'importing' from that file. E.g.

 include_once($relPath.'misc.inc'); // startswith(), attr_safe()

(The reasoning being, if someone isn't familiar with (say) attr_safe(), and thinks maybe it's defined elsewhere in the file, they'll do a search on the name, and land on the include line, which tells them where it comes from.)

Every PHP file must include base.inc

There has been a concerted attempt to improve the rat's nest of include()s and include_once()s in the DP code base. To help prevent include nightmare again, at the beginning of every PHP file include base.inc before including any other files or doing any work.

<?php
$relPath = "./pinc/";
include_once($relPath.'base.inc');

base.inc will take care of correctly setting up gettext() support, session, and creating a connection to the database. base.inc also provides require_login() which should be called after the set of includes and before any work is done (even evaluating $_POST or $_GET variables) if only a logged-in user should access the page.

The only exception to this rule is a PHP file that does only a HTTP header redirect. In those cases, include a comment where the include should be stating why base.inc was not included in the file. For an example, see faq/default.php

Order of Functions

If a file contains both the declaration of a function and calls to it, we generally put the declaration after all the calls. In particular, if a file contains both top-level code and function declarations, we usually put the top-level code first and follow it with the function declarations. (Sometimes we'll put a function decl in the midst of top-level code, if the function is short and the calls are very localized.)

Database design

Integer displays

MySQL supports an extension for specifying the width of integer types, eg:

CREATE TABLE `blah` (
  `code_name` int(4) NOT NULL default 0
);

compared to

CREATE TABLE `blah` (
  `code_name` int NOT NULL default 0
);

The DP code does not use this feature and it is being deprecated in future MySQL versions. When creating new columns or altering old ones, do not include the width.

Database calls

The DP code uses the mysqli interface for talking to the database. We abstract some of that functionality into a DPDatabase singleton class with static methods that handles the connection to the database and some helper functions.

Querying the database

To make queries to the database, use the DPDatabase::query() call with properly-escaped SQL (more on that below). If the SQL query generates any errors (like poorly structured SQL) it will log the errors to the PHP errors log and raise a DBQueryError exception. This ensures that errors are logged but does not expose any error details to the user which might reveal information we would rather not show.

eg:

$sql = "
   SELECT *
   FROM users
";
$result = DPDatabase::query($sql);

If you do not want DPDatabase::query() to throw an exception on error, you can pass 'FALSE' as the second parameter:

$sql = "
    SELECT *
    FROM nonexistent_table
";
$result = DPDatabase::query($sql, false);

Although it might simply be best to catch the exception instead:

try {
    $sql = "
        SELECT *
        FROM nonexistent_table
    ";
    $result = DPDatabase::query($sql, false);
} catch(DPQueryError $exception) {
    // maybe do something about the exception
}

Building SQL

When building SQL, build the SQL into a variable first before passing it into DPDatabase::query(). This ensures they are in a parallel structure to queries that have values that require escaping with sprintf() and makes it easier to debug if we need to output the contents of the final SQL.

eg:

sql = "SELECT * FROM users";
$result = DPDatabase::query($sql);

not:

$result = DPDatabase::query("SELECT * FROM users");

To make the SQL easier to read, make liberal use of whitespace in the code if it won't easily fit on one line, for instance:

$sql = sprintf(
    "
    SELECT username, email
    FROM users
    WHERE email like '%%gmail.com'
        OR email like '%%yahoo.com'
        OR email = '%s'
    ",
    DPDatabase::escape($user_email)
);

Escaping in SQL

To properly escape strings in an SQL statement, wrap them in DPDatabase::escape() calls.

eg:

$sql = sprintf(
    "SELECT * FROM users WHERE username = '%s'",
    DPDatabase::escape($username)
);

To ensure that numbers are escaped, using sprintf() to coerce the value into a number.

eg:

$sql = sprintf(
    "SELECT * FROM users WHERE u_id = %d",
    $user->uid
);

Validating project tables

You can't used escaped table (or column) names in an SQL string. The only dynamic table names that we use in the code are project tables. The proper way to ensure that the table names are correct is to validate them prior to the call.

eg:

validate_projectID($projectid);
$sql = "
    SELECT *
    FROM $projectid
";

Column names as variables

When building queries for project tables, columns are often themselves variables based on which round columns are wanted. To make these queries more readable, include the variable column names directly in the SQL rather than using sprintf(). These queries must take care to only source the column names from reputable sources, like a Round object.

eg:

validate_projectID($projectid);
$sql = "
    SELECT $round->text_column_name
    FROM $projectid
";

not:

validate_projectID($projectid);
$sql = sprintf(
    "
    SELECT %s
    FROM $projectid
    ",
    $round->text_column_name
);

Piecemeal-built queries

Often there are places in the code where an SQL query is built piecemeal and combined together. When doing so, you must take care to correctly validate or escape strings in each piece when being built. Then combine them together without an sprintf():

eg:

$where_clause = sprintf(" AND projectid='%s'", DPDatabase::escape($projectid));
$sql = "
    SELECT username
    FROM projects
    WHERE difficulty = 'average'
        $where_clause
";

not:

$where_clause = sprintf(" AND projectid='%s'", DPDatabase::escape($projectid));
$sql = sprintf(
    "
    SELECT username
    FROM projects
    WHERE difficulty = 'average' %s
    ",
    $where_clause
);

Quoting Metacharacters Properly

Quoting for HTML (outside tags)

To escape variables for output in HTML, use html_safe() from misc.inc. This is mainly for data coming from untrusted sources, like user input, where we don't expect or want HTML. Do not use htmlspecialchars(). html_safe() ensures that the proper DP character set is used.

eg:

echo html_safe($string);

Do not use this function on translated strings within HTML as we consider those coming from a "trusted source" and they may legitimately contain HTML entities or even HTML markup that we want rendered.

Quoting for HTML attributes

To escape variables for use in HTML attributes, use attr_safe() from misc.inc. attr_safe() will not re-escape valid HTML entities, which makes it safe to use on translated strings. This function should only be used for escaping data going into HTML attributes, not elsewhere in the code.

eg:

echo sprintf("<img src='image.jpg' alt='%s'>", attr_safe($image_title));

Quoting for javascript

To escape content for use in javascript, use javascript_safe() from misc.inc.

eg:

 // a string with variable content (either due to using a variable, or translation)
 $string = sprintf(_("You cannot frobnicate the %s."), $value);
 Good:
   echo '<script> ... alert("' . javascript_safe($string) . '"); </script>';
 Bad:
   // $string might contain a " or even </script> if based upon user input.
   echo '<script> ... alert("$string"); </script>';

Whitespace around operators

As mentioned in the Building SQL section, liberal use of white space can make the code easier to read. The code formatter should enforce spaces around most operators, but does not enforce spaces around the concatenation operator (.). Please do add spaces around the . when it is being used as a concatenation operator rather than part of a string.

String localization

DP uses GNU gettext for string localization. The function _() is used to wrap strings that translators will then translate into a different language. All strings in the DP codebase are written in English but that does not preclude someone from "translating" the English strings in the codebase to other English strings via gettext.

There are areas all throughout the code that contain non-gettext-wrapped strings. These should be fixed when identified.

We use a utility called xgettext to scan the source code for calls to _(), gathering up all the strings that need to be translated, so they can be presented to human translators.

Best practices

  • Use _() to wrap any string that would be displayed to the user. (But not 'content' strings such as project titles and project comments.)
  • Wrap large sets of strings together, don't break them apart by sentences for the sake of breaking them apart.
  • Use separate _() markup for separate paragraphs or similar HTML chunks (<li>, <h2>, etc.).
  • Don't include HTML tags in the string if the tags wrap the entire string.
Good:
  $string = "<p>" . _("This is a paragraph.") . "</p>";
Not-so-good:
  $string = _("<p>This is a paragraph.</p>");
  • Do include HTML tags inside the string if the tags go around just one part of the string.
Good:
  $string = _("This is an <b>important</b> paragraph.");
  • Don't include whitespace, such as spaces or newlines, at the beginning or end of the string, because this may not be apparent to someone translating the string.
Good:
  echo _("This is a line."), "\n";
Not-so-good:
  echo _("This is a line.\n");
  • You may break the string over multiple lines, so as to avoid too long lines. But if you do so, keep it a single string literal (otherwise xgettext will not pick up the correct string).
Good:
  $string = _("This is a long 
      long long string.");
  // Translators will not bother about these extra spaces, used merely for indentation.
  // Such code is more readable than when all the text is in a very long line
Bad:
  $string = _("This is a long"
     . " long long string.");
  // It appears that xgettext creates a separate entry for each string literal
  • Translate the string at the literal value in the code, so xgettext will see it.
Good:
  $title = _("WordCheck Page");
  echo $title;
Bad:
  $title = "WordCheck Page";
  echo _($title);
  • Don't use _() around various bits of HTML, CSS or javascript code that are not displayed to the user:
Good:
  $style = "text-align: right;";
Bad:
  $style = _("text-align: right;");
  • Do not forget to translate also values in HTML forms that are displayed to the user. When doing so, use function attr_safe() which will convert any quotes or apostrophe in the translated string for inclusion within the html attribute. attr_safe() is defined in pinc/misc.inc.
Good:
  echo "<input type='submit' value='" . attr_safe(_("Quit")) . "'>";
Also OK:
  echo '<input type="submit" value="' . attr_safe(_("Quit")) . '">';
Bad:
  echo "<input type='submit' value='" . _("Quit") . "'>";
  echo '<input type="submit" value="' . _("Quit") . '">';
  // this will cause a bug if the translated string contains an apostrophe
  // or a double quote character.
Also bad:
  echo "<input type='submit' value=" . _("'Quit'") . ">\n";
  echo '<input type="submit" value=' . _('"Quit"') . '>\n';
  // this is suboptimal because it potentially creates duplicate strings
  // to be translated (e.g. both "'Quit'" and "Quit");
  // it also relies on the translator to escape the apostrophe
  • For strings that include percent signs (%) that are not sprintf() strings (see next section), include a comment above the line to tell xgettext that this is a regular string and not a printf()-like string.
Good:
# xgettext:no-php-format
_('% of newly Joined Users who Proofread')

Leaving comments for the translators

In some cases the strings to be translated are not entirely clear to the translator, or it is possibly ambiguous what the text is doing. A good example is strftime() strings or other highly-replaced strings that when taken out of the code context is nonsense.

Fortunately there is a possibility to have some comments appear in the po files close to their associated string to be translated, where they can be read by the translator. Simply add a block comment in the line just above the string to be translated, with a special keyword TRANSLATORS. Examples:

 // TRANSLATORS: This one-line comment will be shown to the translator
 $string = _("Obscure %x %y %Z string");
 
 /* TRANSLATORS: This longer comment spread on several lines 
 will also be shown to the translator as an explanation for 
 the obscure strftime() string */
 $string2 = _("%A, %B %e, %Y at %X");

The "TRANSLATORS" string is a convention; when running xgettext over the source files the inner engine in the translation center will provide argument --add-comments=TRANSLATORS to tell the tool to extract these comments together with the strings to be translated.

See also: As a further illustration, the gettext manual gives other examples of comments for the translators, to report translation bugs or to handle proper names.

Using sprintf() with gettext

When PHP variables need to be inserted into a translated string, use sprintf().

  • Don't include PHP variables in the string. And don't break a string into fragments to avoid including the variables. If you need to include dynamic values in a string, use sprintf() in combination with _(). For instance:
Good:
  $string = sprintf( _("See also the <a href='%s'>WordCheck FAQ</a>"), 
      "$code_url/faq/wordcheck-faq.php"));
Bad:
  $string = _("See also the <a href='$code_url/faq/wordcheck-faq.php'>WordCheck FAQ</a>";
Also bad:
  $string = _("See also the") . " <a href='$code_url/faq/wordcheck-faq.php'>"
      . _("WordCheck FAQ") . "</a>";
  // translators cannot translate meaningfully fragments that are not complete sentence
  // because the grammar and word order can be very different in translated languages.
  • When using sprintf() containing HTML attribute values, either parameterize just the attribute value itself (without the quotes) or the entire string, but not part of it.
Good:
  $string = sprintf( _("<a href='%s' target='_top'>New Window</a>"),
      "$code_url/blah.php"));
  $string = sprintf( _("<a %s>New Window</a>"),
       "href='$code_url/blah.php' target='_top'"));
Bad:
  $string = sprintf( _("<a href=%s>New Window</a>"),
       "'$code_url/blah.php' target='_top'"));
  // a translator might change the string to
  //   "<a href='%s'>New Window</a>
  // thinking they were correcting bad markup
  • If a string has more than one sprintf() formatting variable, use sprintf() parameter numbering. This allows translators to swap the order of arguments if necessary. See also "argument swapping" for the sprintf function (PHP manual).
Good:
  // when using single quotes, the $ should not be escaped:
  $string = sprintf(_('There are %1$d monkeys in the %2$s'), $number, $location);

  // when using double quotes, the $ must be escaped:
  $string = sprintf(_("There are %1\$d monkeys in the %2\$s"), $number, $location);

  // both of these allow translators to translate the string to something
  // like 'The %2$s contains %1$d monkeys' (except obviously probably not in
  // English)
Non-ideal:
  $string = sprintf(_('There are %d monkeys in the %s'), $number, $location);
  // without the numbers the translator must know how to insert them
  // and we'd rather place the burden on the smaller group of developers
  // than the potentially larger set of non-technical translators
Bad:
  $string = sprintf(_("There are %1$d monkeys in the %2$s"), $number, $location);
  // when using double quotes you must escape the $ in the string
  // otherwise the parameter contents do not get inserted because the $s/$d
  // are treated like PHP variables

Localizing emails

When sending emails with localized strings it's important to send the text localized with the recipient's locale, not the person viewing the page. To do this we need to switch the gettext() locale before building the email, and change it back after the email is constructed.

// switch to recipient's locale
configure_gettext_for_user($username);  // also takes a User object

// build the email
$body = _("Hello fellow human!");
maybe_email(... $body ...);

// restore current user's locale for any future messages
configure_gettext_for_user();

Same strings with multiple translations

In English, the string "PM" can mean either Project Manager or Private Message. The two abbreviations are unlikely to be the same in other languages. gettext supports the idea of context translations, where a string has multiple occurrences, but the translation depends on context. Use the pgettext() function to specify a context for a string.

Good:
echo pgettext("project manager", "PM");
echo pgettext("private message", "PM");
Bad:
echo _("PM");

Another place to use it is when strings are single word strings like "All" or "None". In English, what's being referred to is immaterial, but in gendered languages it is relevant. "Genre" and "Language" are different genders, both in French and in German.

Plurals

Translating plurals can be a challenge. To allow correct translation of plurals, use ngettext():

echo sprintf(
    ngettext(
        // TRANSLATORS: %1$d is a number of projects; %2$d is a number of days
        'You have %1$d project that has not been visited in the last %2$d days.',
        // TRANSLATORS: %1$d is a number of projects; %2$d is a number of days
        'You have %1$d projects that have not been visited in the last %2$d days.',
        $num_alert_projects
    ),
    $num_alert_projects,
    $pp_alert_threshold_days

Coding Securely

Linking to the current page with $_SERVER["PHP_SELF"]

A common idiom in PHP scripting is to have a link back to the current page, but with a different set of query parameters, or a HTML form with the current URL as the form's action attribute.

PHP puts the path part of the URL to the current page in the variable $_SERVER["PHP_SELF"], so at first it would seem that code like the following should be safe, since the user has no control over the script's path, and cannot put any dangerous characters in it.

 echo "<form method='GET' action='$_SERVER['PHP_SELF']'>";

However, many webservers (including Apache), it is possible for the user to add characters to the end of the URL path, and still be redirected to the same PHP script. So, for example, the following URLs will all access the same script, and the final one contains an example of an XSS exploit against the code fragment above.

  http://example.com/script.php
  http://example.com/script.php/foo_bar
  http://example.com/script.php/'><script>alert(42)</script>

To avoid this, whenever $_SERVER['PHP_SELF'] is written to any HTML, it should be treated as untrusted user input and escaped. The more secure way to do this is

 echo "<form method='GET' action='" . attr_safe($_SERVER['PHP_SELF']) . "'>";

However, since the default value of the 'action' attribute is the current document's address, the much simpler way to achieve the same effect is:

 echo "<form method='GET'>";