Common Drupal module developer beginner errors

Lately, I have been working with a few big and small web companies on Drupal-based projects. Whether it's for project analysis, training, debugging or performance, I often have to jump into other people's code which was rushed into production on short deadlines, as I have also received a lot of feedback.

Here are a few simple tips for beginner Drupal module developers who are in a rush and will very likely need to cut corners by lack of budget or time, but who must at least make sure that the next person will not panic when reading their code.

Sidenote: while a good project manager is essential in facilitating the whole process, they usually focus on facilitating (or moderating) the communication between parties (and taking care of paperwork, such as finalizing the contract and billing). Developers must also bear the responsibility of "delivering a project on time, on budget and with acceptable quality". (or am I making up excuses to justify cut corners? ;-)

[toc]

Where is the custom code?

In your sites/all/modules directory, create 3 sub-directories:

  • contrib: modules downloaded from drupal.org
  • custom: home made modules
  • patched: contrib modules which have been modified for some reason (usually because you applied a patch which is waiting to be reviewed in the project's issue queue, and each patched module has a clear documentation of why/how it was patched).

I have seen some projects with modules all over the place, to the extent where I have gotten used to run a "find ./ -name '*.module'" to find them, or to look in the "system" table of the site database.

Since I use Aegir for development and hosting, I also recommend putting all the commonly used modules (views, devel, admin_menu, etc) in sites/all/modules (and using "drush_make" to create those platforms). Less commonly used (including site-specific custom modules) should go in sites/[example.org]/modules. This way, we can quickly see the site-specific modules. Do not place modules in sites/default/modules, you may later on have multiple sites sharing that same Drupal code installation.

Custom code and namespace prefixes

A namespace is a short word that provides information about the context (ok, it's more than that, but I'm trying to be brief). It helps making sure that two modules do not use the same function or object name.

For modules specific to the site, I usually prefix modules names with initials or an abbreviation referring to the site. For example, on "bidon.ca" I use the prefix "bdn" ("bidon" is French for "barrel", "container" or "useless").

Some sample modules names could be: "bdnforms" (with all the hook_form_alter fun), bdnsnippets (for those small PHP snippets which may be included from nodes/blocks/views), etc.

The only exception would be if creating a sub-module. For example: uc_bdn would have the code for use-cases specific to the store on my site (all Ubercart modules are prefixed with "uc").

For service providers who re-use modules with various clients, instead of using a site-specific prefix, use a company-specific prefix.

Drupal.org is the reference for module names. Unless you have published your module on drupal.org, you should not expect the module name to be unique.

PHP snippets should be in files, not nodes/blocs/views, and don't forget about the cron

There are times when it is necessary to enter PHP code directly in a node, block or view. Usually it's a last-resort hack.

For example, maybe you need to write a very specific block to display on some pages. Usually this is done with a custom module that generates a block. Sometimes it may be a block that admins need to be able to update its content easily, but needs to include a few lines of PHP to calculate something dynamic (ex: the number of people who have signed a petition).

If you don't have time to create a full custom module, create a function for that PHP snippet and store it in a general module or in your theme's template.php. This way, the code will be easy to find, re-use, maintain and there will be less risk that a site admin deletes the code by mistake. Don't forget to pass any variable to that function that would otherwise be available globally in the node/block/view.

(thanks to mvc for this one!)

On putting PHP snippets in nodes and how it will likely break the cron: it is common to put a PHP snippet in a node in order to do a quick hack (run a db_query to fix a table, print something, etc.). However, when cron will run, it will try to index each node for the search engine, therefore executing the PHP code.

This is very hard to debug once it happens (I usually end up doing a query in the database for nodes who have a PHP input filter, or inserting debug messages in the core cron indexing function).

Here are two incidents I have noticed recently:

  • a PHP snippet in a node was used to quickly export the contents of a table. It was generating a CSV file then calling the "die" function, hence halting the cron (and allowing anyone to download the CSV file just by running the cron.php).

  • a PHP snippet in a node was doing a "drupal_goto" to redirect the visitor to another page (shame on me, I think I was responsible for that one).

In both cases, the cron would simply stop running and would not display an error in the watchdog. We only noticed a week later in the status page that the cron had not run for a long time.

Correct use of the SQL functions to avoid injection attacks

The database functions in Drupal is a pretty big topic. For those in a very big rush, consider especially the following bad code:

db_query('INSERT INTO mytable (email, newsid)
    VALUES (' . $_POST['email'] . ', ' . $_POST['newsid'] . ')');

The correct way in Drupal 6 should be:

db_query('INSERT INTO {mytable} (email, newsid)
    VALUES ("%s", %d)', $_POST['email'], $_POST['email']);

The correct way in Drupal 7 should be:

db_query('INSERT INTO {mytable} (email, newsid)
    VALUES (:email, :newsid)', 
        array(
            ':email' => $_POST['email'],
            ':newsid' => $_POST['newsid'],
        ));

First, developers often forget to wrap table names in curly brackets, i.e. {mytable}. This is not for security, but for making it easier to prefix tables names if necessary (ex: if using simpletest).

More importantly, in the initial example, the input was not validated at all. Hence, someone could enter the following string in the e-mail field: '; UPDATE {users} SET name = 'admin', pass = MD5('test') WHERE uid = 1; [...] and change the username and password to your superadmin account.

If you are not familiar with the topic, search the web for "sql injection attacks".

Note that the %s placeholder is for strings and %d is for decimal numbers, %f for floats, %b for blobs. In Drupal 7, it is no longer required to wrap %s placeholders in quotes. For more information: Drupal 7 database abstraction layer, or db_query in Drupal 6.

Note that it is very rare to have to manipulate $_REQUEST variables directly. Usually in the form API you have access to the $form_state['values'] variables.

You may also want to read about the drupal_write_record() function.

Custom scripts should use the Drupal API (don't hardcode database credentials!)

Whether it's custom migration scripts (in most cases the migrate module is much more practical), a quick script to fix something in the database or a custom cron that doesn't play well with hook_cron, beginner Drupal developers (which doesn't mean that they are beginner developers) may try to save time by not using the Drupal API. After all, if it's a write once, use once, who cares.

Here's how Drupal, or more precisely drush, can save you time:

  • use "drush php-script" to execute the script. This way, drush does a full bootstrap of Drupal and everything is available, including db_query(). So there is no need to hardcode the database credentials in the script (those credentials will likely change often as you copy a site from various dev environments, to staging, to production).
  • drush has a helpful drush_log function to log notices, success or error messages. Notices can be printed out from the command line by adding the "--debug" option.
  • you can use the full Drupal API from drush, and drush itself also has a very useful API for creating new users and manipulating fields, plus many drush sub-modules which can save you a lot of time.

Sanitize output

This is straight from the excellent writing secure code page on drupal.org:

A common mistake is to use arbitrary data in a call to "drupal_set_title":

drupal_set_title(t("Something: %name", $node->title));

Should be:

drupal_set_title(t("Something: %name", check_plain($node->title)));

As with any other situation where you need to display data which had been entered by users. Data is not always filtered when it is stored in the database, so it needs to be filtered when it is displayed since this can affect other users (especially site administrators). Fore more information, search the web about cross-site-scripting (XSS).

Use git for change management

Help track down bugs and avoid really silly mistakes by using change management for your code, themes and Drupal configurations.

My favorite trick is to version control Views by using the drush_views module to regularly export all views into a directory, then to commit the changes. This makes it easy to "diff" two revisions to see exactly what has changed and when. Using the features module may be a cleaner solution, and since Features generates a bunch of code, you can do version control on that too.

Git is easy to adopt. While the idea of having a "distributed source revision control system" may sound scary, in short, all it means is that you can have a git repository locally, without a server, or you can share changes to other servers too.

If you don't have any experience with Git or with other source code revision tools (CVS, SVN), at the very least, create a local repository.

On the command line, go to the top-level directory of your module or theme, then type:

git init
git add *
git commit -m "initial repository creation"

Every time you do a change, you can review and commit your changes:

git diff [optionally with the name of the file]
git add [name of file]
git commit -m "a summary description of your changes here"

And then you can review changes with:

git log

All that information to track the changes in your code are stored locally in a ".git" directory

Git is easy to install on most Linux distributions ("apt-get install git" on Debian/Ubuntu or "yum install git-core" on CentOS/Fedora). There is also a lot of documentation for Mac OSX and Microsoft Windows. Windows users may also want to look at tortoisegit.

There is also great documentation to learn Git:

  • git-scm.com, more specifically see the [documentation](http://git-scm.com/documentation] section.
  • The Pro Git book and website has a lot of freely available resources (the whole book is available online)

Different philosophies: how many code repositories?

There is no consensus on how to best manage the code:

  • one repository per custom module, theme, configurations? (This is what I do, since if I want to version-control everything include code from drupal.org, I will create a "drush_make" makefile and put that into my version control. Has the inconvenience that if you are pushing your code to a remote server, you must have a repository for each of those modules, which may be a bit annoying to manage.)
  • using git sub-modules? (Some people find this hard to use and confusing, not obvious how it works out when pulling modules from drupal.org)
  • one repository for all custom modules? or all the sites/example.org/ directory? (This has the advantage of reflecting the fact that your site is "one project" and everything must be at a same given version, without the hassle of tracking core and contrib modules)
  • one repository for the whole Drupal, include Drupal core (I find this unpractical when using Aegir, and do not feel the need to version-control contrib modules).

In my opinion all of these options are valid. Use whatever suits you.

Use an issue tracking system and cross-reference with your code

An issue tracker, or bug tracker, helps track the feature requests and bug reports that help evolve your project.

Instead of managing a pile scrap paper, napkins and random e-mails as a documentation for your project, enter them as feature requests and bug reports in your issue tracker. This is the best short and long term documentation for your project. Get your clients and colleagues to use it too. Instead of receiving e-mails from clients, ask them to open a new issue or comment in an existing one.

Redmine, for example, is a fast and simple to use. It installs easily with Debian (apt-get install redmine).

A few recommendations:

  • Refer to certain bugs or requests with their issue number, ex: "in feature #1234 about some-fancy-idea, ...". Even if it sounds weird, at least everyone can quickly access the issue and be sure of exactly what you are talking about. It also helps to keep discussions on focus.
  • In meetings, get people to update the issues as you go, otherwise the comments get lost on random pieces of paper.
  • A good issue title is probably as valuable as a good newspaper title. It also makes it easier to find old issues after some time.
  • Cross-reference your issues (related to, child ticket of, ...). Makes it easier to find related issues and get a global idea. For example, if you have a issue about the theming of your website, and your site has 10 sections to theme, open one "meta" ticket for all the theming, then one child ticket for each section. If there have been 10 bugs reported about a specific functionality, mark them as related, eventually it will be handy (avoids regressions).
  • Open one issue per topic. For example, if you had only one issue for the theming, but then in the comments you start having discussions about the bugs in the various sections, it will be very hard to see what is the overall status of the theming, and if the bugs reported in the comments really are being resolved.

Ticket workflow: the status of your issue will evolve until it is closed. Make sure everyone has the same understanding of what each status means. For example:

  • New: the issue was created (either by a developer or a client)
  • Open: the issue was accepted by a developer and work has started on the issue
  • Fixed: the issue is considered to be fixed by the developer
  • Closed: the client has accepted the fix (the client should always be the one who closes the issue, especially if she created the issue initially)

Should I go on? There are probably a lot of good references on this topic (please leave a comment if you have any suggestions).

Simplify the requirements, don't complicate the code

Often the client does not know what they want, want everything, they will learn as the project starts going live. The first iteration is always a big step for the client. The new project will have a lot of impact on their staff and users/clients, and they may want to change the priority of some aspects once they start testing/using it. Developers should not be afraid of raising issues about the relevance of feature requests.

A few rules:

  • If the requirements are hard to understand, don't work on assumptions, simplify the requirements first and have them validated by the client (write a short spec).
  • If edge cases are hard to automate, the client probably may not mind not automating it (will it provide a good return on investment?)
    • or postpone it to a second phase, where those edge cases will become more clear ("we can automate it later, but first see if it's really worth it, how often this issues comes up").
  • Compare to the existing/previous process. While the new one may not be perfect, is it better? How much do they want to invest to make it perfect? Can those improvements be made later?
  • How many people are affected by the edge cases?
  • Do you really want to have all those exceptions in your code? Can you move them to a separate sub-module?

Important for developers: make sure you clearly document edge cases in the code, with a reference to the issue number.

Use Drupal coding standards

I initially found the Drupal coding standards a bit ridiculous and annoying at first, especially the "two spaces" indentation. Just get used to it. There's nothing worse than having to read through code that has been formatted in many various ways.

Use the coder modules to validate your code. It has a lot of other good advice too.

Conclusion

What started out as an idea to write a quick "10 tips" article could almost be expanded into a book. It has helped me a lot to reflect on my work, and I hope it has given you a few ideas too. I didn't try to make it a how-to for anyone starting to work for Drupal, but rather a few words of caution for anyone already writing code, hacking, patching or tweaking Drupal code, especially to those who, like me, often write that code in a rush and forget about it afterwards.

I tried to stick to the essentials, avoid the obvious comments. I assume that if a specific topic has raised an interest you will search for web for more information. Feel free to leave a comment or contact me if you have any feedback.

Many thanks to fellow colleagues, especially those from Koumbit, who have been the biggest influence of my current work habits (well, at least, the good ones!).

Archives