
In my job I’m currently dealing with writing coding guides for several languages. That prompted me to sit down to do something I’ve been thinking about for quite a while: Compiling my personally preferred way of writing C++ into my very own coding guide – a.k.a. this article.

As I see it a coding guide consists of two parts:

  • The style guide contains rules and guidelines about formatting and naming. This part comprises the bulk of the whole coding guide.
  • The language guide is about the big C++ dos and don’ts. Here belong decisions about the central few best practices as well as project specific restrictions. (For obvious reasons this article does not contain anything project specific.)

A coding guide should be short and to the point because hardly anyone reads these things for fun. Judging from my own tolerance level, about half an hour of reading time (just the rules without a rationale) seems like a reasonable compromise between conciseness and completeness. Automation is a great help because everything you automate does not need to be discussed in the guide in detail. Especially the basic formatting rules (brace placement, whitespace, etc.) are ideal to be handled by a tool like clang-format. After all, in 2019 there is no excuse for formatting your code by hand.

The language guide part is a bit tricky. Many coding guides let this part balloon into something akin to a general C++ best practices course. That’s counterproductive. For one thing it will get too long and people will skip it. And how are you supposed to recognize the few items important for the project among all the clutter? If you really want to have general best practices mentioned, link to the Core Guidelines, but no more than that.

I’m of two minds about a rationale. On the one hand a coding guide is a prescriptive document. It should be about the what and the how, not about the why. On the other hand, explaining the reasons for a rule can help to understand it better and make it more acceptable. In a real project I’d probably include the rationale as a separate document. For this article I decided to split rules and rationale and link back and forth between them.

I have general software development in mind where – in my opinion – a pragmatic approach is the way to go. I’m not talking about safety-critical systems or any field where you might encounter words like MISRA in earnest. Those worlds are completely different.

Rules and Guidelines

Formatting

General Formatting

Auto-format all source code files with clang-format. Disabling clang-format temporarily with // clang-format off and // clang-format on comments is fine if used sparingly and for a good reason.
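As an illustration of what a good reason might look like, here is a minimal sketch of a hand-aligned table. The name identity_matrix and its contents are made up for this example; the assumption is that the formatter would likely reflow the rows and destroy the visual 3x3 layout.

```cpp
#include <array>

// clang-format off
// Hand-aligned 3x3 identity matrix. The row/column layout is intentional,
// so the formatter is switched off for exactly these lines.
constexpr std::array<int, 9> identity_matrix = {{
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
}};
// clang-format on
```

Keeping the disabled region as small as possible ensures the rest of the file still benefits from automatic formatting.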

If you are interested in the details read the definitions in the _clang-format file. Here are the cornerstones:

  • The indent depth is 4 spaces. Tabs are not used.
  • The maximum line length is 100 characters.
  • Braces are based on the Stroustrup style.

A typical code snippet looks like this:

void some_function_with_a_longish_name(
        bool miau,
        int woof,
        const SomeClassOrOther& cow)
{
    if (not miau && woof < 1) {
        cow.moo();
    }
    else {
        miau = true;
        ++woof;
        throw BallOfTwine();
    }
}

Some general rules cannot be covered by clang-format.

  • Don’t be afraid of vertical whitespace. Having too little tends to damage readability quickly. Use blank lines to group logically connected code snippets. Additionally blank lines around control structures are almost always a good idea.

    // probably too few blank lines
    const auto found = find_the_thing();
    if (found != end) {
        return calculate(*found);
    }
    return std::nullopt;
    
    // usually better
    const auto found = find_the_thing();
    
    if (found != end) {
        return calculate(*found);
    }
    
    return std::nullopt;
    
  • Do not put multiple declarations on one line.

    // Do not declare several variables on one line.
    std::string miau, moo, woof;
    
    // Always use one line per declaration.
    std::string miau;
    std::string moo;
    std::string woof;
    
  • Don’t omit braces in conditionals and loops, even if you could.

    if (is_empty())  // not like this
        return;
    
    if (is_empty()) {  // always like this
        return;
    }
    
  • For negation prefer the not keyword over the exclamation mark because the single ! character is easy to miss when quickly scanning the code.

    // Avoid. The ! is easy to miss.
    if (!condition) {
        do_something();
    }
    
    // Better. Less error prone.
    if (not condition) {
        do_something();
    }
    

Including Headers

Use #pragma once to guard against multiple inclusions of the same header. Sometimes there is a solid reason for using an include guard macro instead. In those cases make the macro’s name as random as possible to emphasize its purpose as a unique identifier without any semantic meaning. A stringified UUID works nicely.
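A guard macro following that advice could look roughly like the sketch below. The UUID is invented for this example (generate a fresh one per header), PROJECT stands in for the real project name, and the function answer() only exists to give the header some content.

```cpp
// Sketch of a header that, for whatever reason, cannot use #pragma once.
// The macro name carries no meaning beyond being unique.
#ifndef PROJECT_GUARD_3F2504E0_4F89_41D3_9A0C_0305E82C3301
#define PROJECT_GUARD_3F2504E0_4F89_41D3_9A0C_0305E82C3301

namespace project {

// Placeholder content of the header.
constexpr int answer() { return 42; }

}  // namespace project

#endif  // PROJECT_GUARD_3F2504E0_4F89_41D3_9A0C_0305E82C3301
```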

Use of the #include directive is governed by the following rules:

  • Use #include with angle brackets for the project’s public headers and all 3rd party headers. For example #include <project/some_header.hpp>.
  • Public headers must always specify the full path starting with the project name.
  • Use #include with quotes only for private headers. For example #include "internal.hpp".
  • Do not access parent directories when including a header, i.e. .. must not appear in the path to a header. Only whitebox unit tests may break this rule.
  • In a cpp file include its main header first, then all other headers separated by a blank line. Clang-format takes care of sorting the includes.
// start of *miau.cpp*
#include <project/miau.hpp>

#include "internal_types.hpp"
#include <project/moo.hpp>
#include <vector>

Class Definitions

In general order the content of a class definition like this:

  • Put the public section first, then protected, then private. Unneeded sections can be omitted.
  • Preferably each class should have at most one of each section.
  • Within each section use this order:
    • For a class: types (both aliases and nested type declarations), member functions, static member variables, non-static member variables.
    • For a struct the order of member functions and member variables is reversed. See Classes vs. Structs below for more details about the reason.
  • Put friend declarations at the very end of the class definition.
/** A nicely formatted example class */
class LoremIpsum : public DolorSitAmet
{
public:
    using Iterator = Consectetur<LoremIpsum>;

    explicit LoremIpsum(std::string name);
    ~LoremIpsum();

    Iterator begin();
    Iterator end();

protected:
    void update_adipiscing();

private:
    struct Payload
    {
        Tempor header;
        std::vector<Incididunt> body;
    };

    std::error_code send(Payload&& lots_of_stuff) const;

    static std::size_t adipiscing;
    std::string m_name;

    friend class Aliqua;
};
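For comparison, a struct following the reversed order might look like the following sketch. Rectangle and its members are hypothetical names chosen for this example only.

```cpp
/* A plain data container: types, then member variables, then member functions. */
struct Rectangle
{
    using Length = int;

    Length width = 0;
    Length height = 0;

    constexpr bool is_empty() const { return width == 0 || height == 0; }
};
```

The data members are what a reader of a struct is looking for, so they come right after the types.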

Comments

Write comments in English.

  • Use documentation comments (a.k.a. Doxygen comments) to describe the project’s public API. In other words: Doxygen comments document the public and protected parts of the code in public header files.
  • Use implementation comments in cpp files, in private headers and for private/internal parts of public headers.

Doxygen comments become part of the generated project documentation. They can use Doxygen syntax; but be wary of unnecessary redundancy, especially when using formal syntax to describe something like a function signature. A well defined function should convey most of the information a caller needs through its signature alone. A good Doxygen comment only adds information that cannot easily be represented in the signature, like pre/post conditions or a list of thrown exceptions.

Primarily all comments are intended to be read in source files. So, if in doubt, prefer readability in the file over flawless formatting in the generated documentation.

Format comments as follows:

  • For Doxygen comments always use the slash-star-star style /** */ and start Doxygen commands with an @.
  • A single-line slash-star comment separates comment text and start/end token with one space each.
  • In a multi-line slash-star comment the start/end tokens stand on their own lines. The comment text is not indented and does not use any leading decoration characters.
  • As the markup language for formatting the comments’ text use reStructuredText and the Sphinx extensions for reST.
/**
A multiline Doxygen comment.

When using Doxygen syntax elements to describe something like
a function signature, be wary! Unnecessary redundancy creeps in
faster than you might think.

@throws ``BarFailed`` if things go sideways.
*/
bool bar(Key bar_id);

/** Even Doxygen one-liners do not use the ``//`` style. */
void foo();

/* An implementation comment; note the single opening star. */
std::optional<Cat> fetch_cat(const std::string& cat_name)
{
    // This is an implementation comment, too.
    // It even has a second line.
    if (herd_of_cats.contains(cat_name)) {
        return herd_of_cats.grab(cat_name);  // Good luck!
    }

    return std::nullopt;
}

Naming

Use English for all names unless you have a compelling reason not to.

Notation

The C++ code uses three notation styles, mainly to distinguish between types, non-types and macros.

  • snake_case: lowercase letters, underscores for separating words
  • PascalCase: leading uppercase letter, uppercase letters for separating words, no underscores
  • ALL_CAPS: uppercase letters only, underscores for separating words

Apply the styles as follows:

  • Use PascalCase for types and related things. That includes the names of classes, structs, unions, enums, usings, typedefs, type template parameters, concepts, and similar things.
  • Use ALL_CAPS for macros – and only for macros. Each macro name is prefixed with the project name, for example PROJECT_PUBLIC_API.
  • Use snake_case for all other names. That includes the names of functions, variables, constants, namespaces, enumerators, non-type template parameters, etc.
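Put together, the three notations could appear in code roughly like this. All names (audio_tools, ChannelLayout, FrameBuffer, PROJECT_VERSION_MAJOR and so on) are invented for the sketch, and PROJECT is a placeholder for the real project name.

```cpp
#include <array>
#include <cstddef>

// Macro: ALL_CAPS with the project name as prefix.
#define PROJECT_VERSION_MAJOR 1

namespace audio_tools {  // namespace: snake_case

// Type: PascalCase. Enumerators: snake_case.
enum class ChannelLayout { mono, stereo };

// Constant: snake_case.
constexpr std::size_t default_frame_count = 256;

// Type template parameter: PascalCase. Non-type template parameter: snake_case.
template<typename SampleT, std::size_t frame_count = default_frame_count>
struct FrameBuffer
{
    using Sample = SampleT;  // type alias: PascalCase

    std::array<SampleT, frame_count> samples;  // variable: snake_case
};

// Function and parameter: snake_case.
constexpr int channel_count(ChannelLayout layout)
{
    return layout == ChannelLayout::mono ? 1 : 2;
}

}  // namespace audio_tools
```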

Naming Patterns

General Naming Patterns
  • In general use nouns for types and verbs for functions.
  • If possible prefix boolean identifiers used as predicates with is, has, can or similar. For example: is_visible, has_children. If a clarifying noun is required, stick to natural English syntax and put the noun at the front. For example: file_is_empty, not is_file_empty.
  • For setters and getters use Qt-style prefixes, i.e. the set prefix for setters and no prefix for getters. For example: set_filepath() and filepath().
  • Prefix private or protected member variables with m_, for example m_child_nodes. In contrast, never use that prefix for public variables.
  • Do not capitalize acronyms in identifiers. For example, call a class for an HTTP connection HttpConnection, not HTTPConnection; call a local variable containing a JSON document json, not JSON. This rule only applies to acronyms, not camel-caseish names. For example, CmakeParser just looks wrong compared to CMakeParser.
  • Call the namespace for internal implementation details in headers detail. In cpp files prefer anonymous namespaces for the same purpose.
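
The following sketch shows how several of these patterns might combine in one small class. The class, the helper has_https_scheme and the namespace project are hypothetical and only serve as illustration.

```cpp
#include <string>
#include <utility>

namespace project {

namespace detail {  // internal implementation details in a header

inline bool has_https_scheme(const std::string& url)
{
    return url.compare(0, 8, "https://") == 0;
}

}  // namespace detail

// Acronym is not capitalized: HttpConnection, not HTTPConnection.
class HttpConnection
{
public:
    explicit HttpConnection(std::string url) : m_url(std::move(url)) {}

    const std::string& url() const { return m_url; }           // getter: no prefix
    void set_url(std::string url) { m_url = std::move(url); }  // setter: set prefix

    // Predicate: prefixed with is.
    bool is_encrypted() const { return detail::has_https_scheme(m_url); }

private:
    std::string m_url;  // private member variable: m_ prefix
};

}  // namespace project
```
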
Classes vs. Structs

Except for their default access level structs and classes are the same in C++. However, the two names can still be used to convey information. Use struct for simple data containers with all-public member variables. Inheriting from other structs is ok as long as no virtuals are involved. Simple member functions (e.g. is_empty()) are also perfectly acceptable. Ask yourself:

  • Does the type need to maintain any invariants? (Implies that not all members can be public.)
  • Does the type need a user-defined constructor or destructor?

If the answer to both questions is “no” make it a struct. Otherwise make it a class. In other words: Use class for types implementing complex stateful behaviour.
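
A minimal sketch of how the two questions play out, using two made-up types: Point has no invariant and needs no constructor, while Percentage must keep its value within a fixed range.

```cpp
// No invariants, no user-defined constructor or destructor: a struct.
struct Point
{
    int x = 0;
    int y = 0;
};

// Invariant: the value always lies within [0, 100]. That calls for a class.
class Percentage
{
public:
    // Clamps the argument so that the invariant holds from the start.
    constexpr explicit Percentage(int value)
        : m_value(value < 0 ? 0 : (value > 100 ? 100 : value))
    {
    }

    constexpr int value() const { return m_value; }

private:
    int m_value;
};
```

If m_value were public, any caller could set it to 150 and the invariant would be gone.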

Inheritance

In general no special naming for classes with a certain role in an inheritance hierarchy exists. In particular do not prefix the names of (abstract) base classes or classes with only pure virtual member functions (interfaces). Name them like you would any other class.

However, this rule does not apply to base classes that only act as bases and are not used for anything else. They are not instantiated directly, they are not used as function parameters, return types, etc. To use such classes properly they must be derived from. Prefix the names of such classes with Basic.

Class templates intended to be used primarily through aliases follow the same rule. std::basic_string<> with its aliases like std::string is a good example. In fact that’s where the Basic prefix comes from.
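
Modelled on std::basic_string, such a template might look like the sketch below. BasicTokenizer, its members and both aliases are invented for this example.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Class template meant to be used through its aliases, hence the Basic prefix.
template<typename CharT>
class BasicTokenizer
{
public:
    using Text = std::basic_string<CharT>;

    explicit BasicTokenizer(Text text) : m_text(std::move(text)) {}

    bool is_exhausted() const { return m_text.empty(); }

private:
    Text m_text;
};

// The names most code is expected to use.
using Tokenizer = BasicTokenizer<char>;
using WideTokenizer = BasicTokenizer<wchar_t>;

static_assert(std::is_same<Tokenizer::Text, std::string>::value,
              "Tokenizer operates on std::string");
```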

Templates

In general treat class templates, function templates, variable templates and alias templates like their non-templated counterparts. Only template parameters have rules of their own:

  • To denote type template parameters use typename. Avoid class.
  • Avoid single-letter names like T whenever possible. Prefer meaningful names, especially when you have more than one template parameter. However, for a single template parameter that can be just about any type, T can be an appropriate name.
  • End the names of type template parameters with a capital T, for example ValueT.
  • Make the names of template parameter packs plural. For example ValueTs for a type parameter pack and values for a non-type parameter pack.
  • When interacting with 3rd party code, especially the standard library, you might have to break naming conventions. For example, the STL might require a using value_type when the naming rules would demand a using Value. In these cases prefer to implement both versions.

To summarize:

template<std::size_t item_count, typename ValueT>
class SillyArrayWrapper
{
public:
    // Variadic function template with a type parameter pack.
    // Shows the common case of accepting an arbitrary number of
    // arguments. In such a case use the names ArgTs and args
    // as below.
    template<typename... ArgTs>
    SillyArrayWrapper(ArgTs... args);

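    // Variadic function template with a non-type parameter pack.
    // A member function instead of a constructor, because the template
    // arguments of a constructor cannot be specified explicitly.
    template<ValueT... values>
    void assign();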

private:
    std::array<ValueT, item_count> m_values;
};
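
The last rule, implementing both alias versions, can be sketched as follows. Journal and make_journal are hypothetical names; the example assumes std::back_inserter as the standard library facility that needs value_type.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

template<typename ValueT>
class Journal
{
public:
    using Value = ValueT;       // name demanded by the naming rules
    using value_type = ValueT;  // name required by std::back_inserter

    void push_back(const ValueT& value) { m_entries.push_back(value); }
    void push_back(ValueT&& value) { m_entries.push_back(std::move(value)); }

    std::size_t size() const { return m_entries.size(); }

private:
    std::vector<ValueT> m_entries;
};

// Usage sketch: this only compiles because Journal provides value_type.
inline Journal<int> make_journal(const std::vector<int>& source)
{
    Journal<int> journal;
    std::copy(source.begin(), source.end(), std::back_inserter(journal));

    return journal;
}
```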
Enums
  • In general prefer enum classes. Only use an unscoped enum for a specific reason.

  • With unscoped enums use the enum name as a suffix for each enumerator:

    // Suffix unscoped enumerators.
    enum Colour { red_colour, green_colour, blue_colour };
    
    // Do not suffix enum class enumerators.
    enum class Colour { red, green, blue };
    

    However, sometimes the enumerator names being part of the surrounding scope can be a desired effect. In that case you may omit the suffix.

Source Code Files

  • Use UTF-8 as the encoding for all source code files.
  • Use Unix-style line endings for all source code files.
  • Use snake_case for file and directory names. When naming a file like the main class it implements transform PascalCase to snake_case by inserting underscores before the inner capital letters: for example you would implement a class ComboBox in a file called combo_box.cpp.
  • As file extensions use .cpp for source files, .hpp for C++ headers and .h for C-compatible headers. The latter only applies to headers that might actually be included from C, not for headers that just happen to contain only C-compatible code. As a result .h is mostly useful to indicate headers being part of a C API.
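
The transformation from a class name to a file name is mechanical enough to be written down as code. The helper to_snake_case below is a made-up function and only a sketch of the rule; it assumes plain ASCII names and treats every capital letter as the start of a new word.

```cpp
#include <cctype>
#include <string>

// Turns a PascalCase name into snake_case by lowercasing every capital
// letter and inserting an underscore before each inner capital letter.
std::string to_snake_case(const std::string& pascal_case_name)
{
    std::string result;

    for (const char c : pascal_case_name) {
        const auto code = static_cast<unsigned char>(c);

        if (std::isupper(code) != 0) {
            if (not result.empty()) {
                result += '_';
            }

            result += static_cast<char>(std::tolower(code));
        }
        else {
            result += c;
        }
    }

    return result;
}
```

With this sketch to_snake_case("ComboBox") yields "combo_box". Names that deliberately keep adjacent capitals, like CMakeParser, would need manual treatment.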

C++ Usage

This section describes the most important points about writing safe C++ in the preferred style. The C++ language changed dramatically with C++11. A lot of things got better and safer, or started to become possible. Accordingly C++11 is the low bar. Compatibility with older C++ standards is neither necessary nor desirable. Delving into the question of how to write modern, idiomatic C++ would go vastly beyond the scope of this coding guide. Look into the Core Guidelines for a general reference for today’s best practices.

Safety

  • Owning raw pointers are banned except in these situations:

    • As the pointer to the managed resource in an RAII type. Still consider if that pointer can reasonably be a smart pointer.
    • When working with a 3rd party lifetime management system that requires them, for example the Qt parent mechanism.
    • To some degree when interfacing with C APIs. Owning raw pointers received from a C API have to be passed to an RAII wrapper as early as possible. When passing pointers to a C API always make sure that an RAII mechanism is involved on the C++ side.
  • C-style casts are banned. Use the appropriate C++ cast instead.

    double a = 1.2345;
    int b = (int)a;               // Wrong!
    int c = static_cast<int>(a);  // OK!
    
  • Avoid C-style arrays as much as possible. Use std::array instead.

  • When overriding a virtual member function mark it either override or final.

  • Do not expose the state of a class for direct mutation, i.e. do not use public non-const member variables or getter functions returning a pointer or reference to non-const. Protected member variables can be ok if they do not affect any of the class’s invariants. But keep their number as low as possible. Note: In contrast a struct has no invariants and should have all-public member variables.

    class Cat
    {
    public:
        // These are ok.
        Miau voice() const { return m_voice; }
        const Miau& voice2() const { return m_voice; }
        const Miau* voice3() const { return &m_voice; }
    
        // Don’t!
        Miau& voice4() { return m_voice; }
        Miau* voice5() { return &m_voice; }
    
        Mouse food;            // Don’t!
        const Bird more_food;  // Can be ok.
    
    protected:
        bool m_is_hungry;  // Can be ok.
    
    private:
        Miau m_voice;
    };
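
To illustrate the C API exception, here is a minimal sketch of wrapping an owning raw pointer right where it enters the C++ side. The C API (cbuf, cbuf_create, cbuf_destroy, cbuf_live_count) is entirely hypothetical and defined inline only to keep the example self-contained; the wrapper itself assumes nothing beyond std::unique_ptr with a custom deleter.

```cpp
#include <cstdlib>
#include <memory>

// --- Stand-in for a hypothetical C API -------------------------------
struct cbuf
{
    int size;
};

int cbuf_live_count = 0;  // bookkeeping for this sketch only

cbuf* cbuf_create(int size)
{
    auto* buffer = static_cast<cbuf*>(std::malloc(sizeof(cbuf)));

    if (buffer != nullptr) {
        buffer->size = size;
        ++cbuf_live_count;
    }

    return buffer;
}

void cbuf_destroy(cbuf* buffer)
{
    if (buffer != nullptr) {
        --cbuf_live_count;
        std::free(buffer);
    }
}

// --- C++ side: RAII takes over immediately ---------------------------
struct CbufDeleter
{
    void operator()(cbuf* buffer) const { cbuf_destroy(buffer); }
};

using CbufPtr = std::unique_ptr<cbuf, CbufDeleter>;

// The owning raw pointer never escapes this function.
CbufPtr make_cbuf(int size)
{
    return CbufPtr(cbuf_create(size));
}
```

Code calling make_cbuf never has to remember cbuf_destroy; the buffer is released when the CbufPtr goes out of scope.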
    

Style

  • Use auto with a bit of caution. Explicit type names are important for clarity. auto is most useful when the exact type is not important (iterator types are the canonical example) or when the type name is long enough to hurt readability.

  • Use west const.

    const int foo = 42;  // Like this.
    int const bar = 23;  // Not like this.
    
  • Use using to declare type aliases. Only fall back to typedef for C APIs.
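
A short sketch combining the three style rules. NamedValues and count_values are invented names; the alias template also shows something using can do and typedef cannot.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Alias template: only possible with using, not with typedef.
template<typename ValueT>
using NamedValues = std::map<std::string, std::vector<ValueT>>;

std::size_t count_values(const NamedValues<int>& named_values)
{
    std::size_t count = 0;  // explicit type: it matters here

    // auto for the iterator: the exact type is long and not interesting.
    for (auto it = named_values.begin(); it != named_values.end(); ++it) {
        const std::vector<int>& values = it->second;  // west const
        count += values.size();
    }

    return count;
}
```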

Rationale

This section explains the rationale for the rules above. It is not exhaustive. Explanations are given where the reason for a rule isn’t immediately obvious and when there’s a bit more to it than subjective stylistic preference.

Formatting

General Formatting

The formatting rules in this style guide are unavoidably subjective. But they do follow the guiding principle of producing clearly structured, readable code while not sacrificing compactness too much. Line length and layout of parameter lists illustrate that principle nicely.

Lines are long enough to not force too many breaks in constructs that belong together, which addresses clarity, but short enough to fit a header and a cpp file on the screen side by side, which addresses compactness (assuming a run-of-the-mill Full HD monitor and a common font size).

If a parameter list can fit on one line in its entirety, it stays as a single compact line. For longer lists some compactness is sacrificed in favour of clarity. On the one hand putting each parameter on its own line makes the list longer overall. On the other hand all parameters start at the same column and form an easily recognizable block.

Following these and many more rules manually would be extremely tedious, not to mention impossible to do consistently. That’s where auto-formatting comes in. Clang-format became the tool of choice because it is one of the most widespread formatters in the C++ world.

Including Headers

Include guard macros are discouraged because pragma once, all things considered, is the simpler mechanism. Guard macros are both more hassle and more error prone because the programmer is responsible for ensuring unique names not only across the whole project, but also across all dependencies. Pragma once is not standard C++, which is a disadvantage. But compiler support is excellent. Overall the impact on portability is minor at worst.

The rules about the #include directive are a subjective stylistic choice to some degree. More importantly they aim to enforce a clear, hierarchical project structure and a clear distinction between public and private headers. Also they ensure that public header includes do not break when the project is installed. The angle bracket rules lead to working paths during development as well as in the installed location.

Class Definitions

The order of sections in a class definition is based on the assumption that the class is used more often than changed. The user of a class is primarily interested in its interface to the outside world. That’s why the public section comes first, followed by the protected section. Both of them are aimed at the outside world in contrast to the inner workings of the class.

The order within a section is determined by two ideas:

  • Behaviour versus data:
    • A class emphasizes behaviour. Accordingly member functions precede member variables.
    • A struct emphasizes data. Accordingly member variables precede member functions.
  • Types come before everything else because they tend to be prerequisites for some member function/variable declarations, and C++ language rules require declaration before use.

Naming

Notation

Using different notation styles for semantically different things helps to easily recognize what kind of thing a name belongs to. Yes, the C++ standard library misses this opportunity with its snake_case everywhere approach and deserves to be criticized for it. However, too many notation styles can be hard to remember and can lead to confusion rather than being helpful. The three notations used in the style guide are a compromise to keep the rules simple and to make important things easily recognizable.

  • The default is snake_case to keep in line with the familiar style used by the standard library as well as a lot of big and important projects like Boost.
  • Types are an essential concept in C++ and deserve their own notation – PascalCase – to make them instantly recognizable even without any further context.
  • Macros are dangerous because as textual replacements they don’t obey the C++ language rules and can dramatically change the meaning of innocent looking code. They need to stand out to grab the programmer’s attention. Accordingly the ALL_CAPS notation is reserved for them. For the same reason enumerators are not written in all caps, although that is a relatively common style. There is nothing special or alarming about an enumerator that justifies drawing the programmer’s attention to it.

Naming Patterns

General Naming Patterns

The general naming patterns contain quite a bit of stylistic preference. That is not addressed here. The following points explain the functional reasons behind the rules.

  • Prefixing predicates addresses the problem demonstrated by the ambiguous empty() function. Is this the verb empty and the function removes all elements from the container? Or is it the adjective empty and answers the question whether the container is empty or not? Prefixes avoid the ambiguity.
  • The m_ prefix for member variables is meant to make them stand out a bit and to decrease the likelihood of unintentional shadowing by function parameters or local variables.
  • Not capitalizing acronyms is mostly stylistic, but also prevents the problem arising when a name consists solely of the acronym. That would lead to an all caps name, clashing with the notation for macros.
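
The shadowing problem is easiest to see in a setter. The class Label below is a made-up example.

```cpp
#include <string>
#include <utility>

class Label
{
public:
    // If the member variable were simply called text, the statement
    // "text = std::move(text);" would compile, but it would only assign the
    // parameter to itself. With the m_ prefix that mistake cannot happen.
    void set_text(std::string text) { m_text = std::move(text); }

    const std::string& text() const { return m_text; }

private:
    std::string m_text;
};
```
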
Inheritance

Avoiding name fragments related to the technical function of a class (like the Abstract or Base prefixes) keeps clutter out of the names and focuses more tightly on good names from the application domain. Also, such fragments offer a too convenient way out of naming conflicts. When finding distinct names becomes challenging, that’s a likely sign for a design problem. Hiding these signs is counterproductive.

To give a concrete example: If you have a base class for a network connection, calling it Connection should be sufficient. At the point of use it should not matter what exact role it plays in an inheritance hierarchy. The key point is that it represents a connection. Also, derived classes should add some kind of context that makes finding a different name easy. Something like a Connection inheriting from AbstractConnection is suspicious. Either it’s bad naming or the distinction between base and derived class is not meaningful. In contrast a base class Connection inherited by HttpConnection and SshConnection expresses clearly what’s going on.
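
In code, that hierarchy might be sketched as follows. The member functions scheme() and default_port() as well as the free function describe() are assumptions made up for this illustration.

```cpp
#include <string>

// The base class is named after what it represents, not after its role.
class Connection
{
public:
    virtual ~Connection() = default;

    virtual std::string scheme() const = 0;
    virtual int default_port() const = 0;
};

// Derived classes add context, which makes distinct names easy to find.
class HttpConnection final : public Connection
{
public:
    std::string scheme() const override { return "http"; }
    int default_port() const override { return 80; }
};

class SshConnection final : public Connection
{
public:
    std::string scheme() const override { return "ssh"; }
    int default_port() const override { return 22; }
};

// At the point of use only the fact that it is a connection matters.
inline std::string describe(const Connection& connection)
{
    return connection.scheme() + ":" + std::to_string(connection.default_port());
}
```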

Templates

Regarding naming, templates aren’t that special. Just template parameters have a few idiosyncrasies that need to be addressed.

  • Unlike with other names single-letter template parameters are quite common – above all the ubiquitous T. For that reason it’s explicitly stated when such short names help clarity and when they tend to be detrimental.
  • The trailing T for parameter names introduces a standard way to distinguish them from an associated using. Those are quite common because template parameters cannot be accessed directly from outside the class template.

Source Code Files

UTF-8 was chosen because it is arguably the established default text encoding. Similar reasoning applies to the .cpp file extension. For headers using both .hpp and .h makes it possible to distinguish between C++ headers and C headers. They are two different languages, after all. The other rules are mostly a matter of picking some standard to ensure consistency.

C++ Usage

Safety

C++ has a lot of features and quirks that make it possible and even easy to write unsafe code. The language hands you the rope, and you’ll most likely hang yourself with it if you’re not careful. This section of the guide is all about not tying the rope into a noose. It focuses on constructs that shift the burden of detecting programming mistakes from the compiler to the programmer.

  • Owning raw pointers are synonymous with manual memory management, which – experience has shown – is impossible for humans to get right in non-trivial programs. C++11 introduced greatly improved mechanisms to get rid of owning raw pointers. Smart pointers automate resource management and move semantics make it easier to efficiently use values instead of pointers.

    A few areas are left where owning raw pointers are still reasonable and allowed. The obvious one is RAII types. By their very nature they are designed to own resources and manage their lifetimes.

    Another area covers 3rd party lifetime management systems designed before the release of C++11. The Qt parent mechanism is a well known example. Such systems have to work with owning raw pointers to some degree, but essentially they serve the same purpose as smart pointers.

    C APIs are the third area. Since C does not have any automatic resource management, owning raw pointers cannot be avoided entirely. But the earlier RAII gets involved on the C++ side the better.

  • C-style casts are dangerous because they implicitly try different types of cast until one succeeds, and as a last resort fall back on reinterpreting the bits. This behaviour robs the compiler of every possibility to report misuse and opens the door wide into the land of undefined behaviour. C-style casts are banned entirely because either an appropriate C++ cast exists, or casting is the wrong tool for the job in the first place.

  • C-style arrays are dangerous because they do not track their own size and decay to a pointer at the first opportunity. Like C-style casts that reduces the compiler’s possibilities to detect misuse, and those mistakes tend to result in undefined behaviour.

  • Traditionally overriding virtual functions has pitfalls. A simple typo can be enough to introduce a new function instead of overriding the one from the base class. Consistent use of the override and final keywords in combination with changing the “missing override” compiler warning into an error can mostly eliminate this kind of mistake.

  • Avoiding public/protected member variables aims at keeping object state encapsulated and invariants enforceable. Functions providing direct mutable access to a member variable fall into the same category because their effect is the same. They make internal state publicly available. Invariants are the real problem here. If an object exposes a mutable piece of state it has no control whatsoever over how that state is changed. Changes aren’t even easily detectable anymore. The rule is relaxed for protected state because base and derived class are relatively tightly coupled anyway.
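
The point about C-style arrays decaying can be checked by the compiler itself. The following sketch uses standard type traits; the aliases CStyleArray and StdArray and the function item_count_of are made-up names for the illustration.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

using CStyleArray = int[4];
using StdArray = std::array<int, 4>;

// Passed by value, a C-style array silently turns into a plain pointer ...
static_assert(std::is_same<std::decay<CStyleArray>::type, int*>::value,
              "a C-style array decays to a pointer and loses its size");

// ... whereas std::array stays what it is and keeps the size in its type.
static_assert(std::is_same<std::decay<StdArray>::type, StdArray>::value,
              "std::array does not decay");

// Because the size is part of the type, a function can recover it
// at compile time instead of relying on a separate size argument.
template<std::size_t item_count>
constexpr std::size_t item_count_of(const std::array<int, item_count>&)
{
    return item_count;
}
```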

Comments