CMake/Language Syntax

Revision as of 18:42, 18 August 2009 by Shdwjk (talk | contribs) (→‎Join a list with semicolons.: Added missing word 'have' to last sentence.)

CMake has its own basic scripting language. A new user of CMake, learning this language, mostly needs to know about commands like SET, IF and MESSAGE. Sometimes, the details become lost. How does one manipulate variables? How do quotes and escapes work? Why do FOREACH and MACRO parameters behave strangely? What is the syntax?

CMake bug 6295 asked for documentation of the core syntax. Thus the CMake web site now has a quick introduction to CMake syntax.

This wiki page (formerly [1]) contains an overview of the CMake language, its quirks and its limitations. Anyone may contribute to this page: you may edit, rewrite, refactor, correct or otherwise improve it. To help other CMake users understand the language and avoid mistakes, contribute to this wiki.

There is another wiki page, CMake:VariablesListsStrings, that has more details about variables and regular expressions.

cmake -P

The script mode cmake -P <script> allows you to run arbitrary CMake scripts (except that all commands related to Makefile generation or the CMake cache will fail). You may use this tool to experiment with the language, or to help create examples for this page. Write your script in a text editor and test your script with cmake -P something.cmake.

If your system is Unix, then cmake -P /dev/stdin is a good way to play with the language: you type your script, type Control-D for end-of-file, and the script runs. The Windows equivalent is to run cmake -P con, and to use Control-Z (on a new line, followed by return) for end-of-file.
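As a minimal sketch of this workflow, assume a file named hello.cmake (a hypothetical name) with the contents below. Running cmake -P hello.cmake should display the greeting; the variable name greeting is only for illustration.

```cmake
# hello.cmake (hypothetical file name); run it with: cmake -P hello.cmake
SET( greeting "hello" )
MESSAGE( "${greeting} from script mode" ) # displays "hello from script mode"
```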

Listfiles contain comments and commands.

We refer to each CMake script or CMakeLists.txt file as a listfile. The basic syntax of a listfile is extremely simple: a listfile may contain only commands and comments. Comments start with a # character and stop at the end of a line. You may freely mix comments with commands; comments may appear between or amid commands:

# This is a comment.
COMMAND( arguments go here )
ANOTHER_COMMAND() # this command has no arguments
YET_ANOTHER_COMMAND( these
  arguments are spread         # another comment
  over several lines )

The names of commands are case insensitive; the usual convention is to type the command names in uppercase. However, the arguments are case sensitive. Thus MESSAGE and message and Message and mESsAgE are the same command:

MESSAGE( hi ) # displays "hi"
message( hi ) # displays "hi" again
message( HI ) # displays "HI"

The whitespace around the ( and ) parens is optional; all of these commands are the same:

MESSAGE(hi)
MESSAGE (hi)
MESSAGE( hi )
MESSAGE (hi )

Commands are procedure calls and cannot return values. However, you may pass the name of a global variable to some commands, and those commands will store the result there. For example, the MATH( EXPR ... ) command takes three arguments; the third argument is the expression, and the second argument is the variable to store the result:

MATH( EXPR x "3 + 3" ) # stores the result of 3 + 3 in x
MESSAGE( "x is ${x}" ) # displays "x is 6"
                       # using quotes so MESSAGE receives only one argument
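The same convention works for your own commands. The sketch below assumes a hypothetical macro named DOUBLE_IT; the caller passes the name of the result variable as the first argument, just as with MATH.

```cmake
# DOUBLE_IT is a hypothetical helper: it stores twice "value"
# in the variable whose name the caller passes as "result_var".
MACRO( DOUBLE_IT result_var value )
  MATH( EXPR ${result_var} "${value} * 2" )
ENDMACRO( DOUBLE_IT )

DOUBLE_IT( y 21 )
MESSAGE( "y is ${y}" ) # displays "y is 42"
```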

CMake variables

In CMake 2.4 and earlier, almost all variables have global scope within a directory (and in that directory's subdirectories); a notable exception is the FOREACH command, the loop variable of which is local only to the FOREACH body. Also, the parameters (including ARGV0, ARGV1, ARGV2, ARGC, ARGN, and ARGV) of user defined macros are not variables, per se; their values must always be used directly: IF(${some_parameter}) instead of IF(some_parameter).

CMake 2.6 introduces functions, which are essentially macros that have local scope and use real variables (even ARGV0, ARGV1, ARGC, ARGN, and ARGV are treated as real variables); IF(some_parameter) is perfectly acceptable. Most importantly, global variables are read-only within functions. Whenever a variable is set within a function body, that variable has only local scope, masking any similarly named global variable; outside of the function body, the global variables remain unchanged. (The PARENT_SCOPE option of SET lets a function set a variable in the scope of its caller instead.)
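The difference in scope can be sketched as follows. The command names SET_IN_FUNCTION, SET_IN_MACRO and SET_IN_PARENT are hypothetical, and the script needs CMake 2.6 or later because it uses FUNCTION. The last function uses the PARENT_SCOPE option of SET, which sets a variable in the caller's scope.

```cmake
SET( counter 1 )

FUNCTION( SET_IN_FUNCTION )
  SET( counter 2 )               # local to the function body
ENDFUNCTION( SET_IN_FUNCTION )

MACRO( SET_IN_MACRO )
  SET( counter 3 )               # a macro runs in the caller's scope
ENDMACRO( SET_IN_MACRO )

FUNCTION( SET_IN_PARENT )
  SET( counter 4 PARENT_SCOPE )  # sets the variable in the caller's scope
ENDFUNCTION( SET_IN_PARENT )

SET_IN_FUNCTION()
MESSAGE( "after function: ${counter}" )     # displays "after function: 1"
SET_IN_MACRO()
MESSAGE( "after macro: ${counter}" )        # displays "after macro: 3"
SET_IN_PARENT()
MESSAGE( "after PARENT_SCOPE: ${counter}" ) # displays "after PARENT_SCOPE: 4"
```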

See CMake:VariablesListsStrings about how the ADD_SUBDIRECTORY or SUBDIRS command affects variables in a CMake project.

Values

Variables always contain strings. Sometimes, we use the string to store a boolean, a path to a file, an integer, or a list.
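A small sketch of this idea: the same value can be handled as text by one command and as an integer by another. The variable names are only for illustration.

```cmake
SET( n 42 )
STRING( LENGTH "${n}" len )              # treats the value as text
MATH( EXPR next "${n} + 1" )             # treats the same value as an integer
MESSAGE( "length ${len}, next ${next}" ) # displays "length 2, next 43"
```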

SET and substitution

Use the SET command to set the value of a variable to some string. The SET command takes two arguments: the name of the variable and its new value, SET(variable value). To access the value of a variable, you perform a substitution. To perform a substitution, use the syntax ${variablename}:

SET( x 3 )
SET( y 1 )
MESSAGE( ${x}${y} ) # displays "31"

If more than one value is specified, each value is concatenated into one string in which semicolons serve to separate the individual values. This kind of string is called a list in CMake and can be used with the LIST command. You may use quoting to avoid this:

SET( x a b c   ) # stores "a;b;c" in x      (without quotes)
SET( y "a b c" ) # stores "a b c" in y      (without quotes)
MESSAGE( a b c ) # prints "abc"   to stdout (without quotes)
MESSAGE( ${x} )  # prints "abc"   to stdout (without quotes)
MESSAGE("${x}")  # prints "a;b;c" to stdout (without quotes)
MESSAGE( ${y} )  # prints "a b c" to stdout (without quotes)
MESSAGE("${y}")  # prints "a b c" to stdout (without quotes)

In fact, the following are equivalent:

SET( x a b c )
SET( x a;b;c )
SET( x "a;b;c" )
SET( x;a;b;c )

However, this does not work:

SET( "x;a;b;c" )

Of course, the arguments to SET itself can come from substitutions:

SET( x y A B C )              # stores "y;A;B;C" in x (without quotes)
SET( ${x} )                   # => SET( y;A;B;C ) => SET( y A B C)
MESSAGE( ${y} )               # prints "ABC" to stdout (without quotes)
SET( y x )                    # stores "x" in y (without quotes)
SET( ${y} y = x )             # => SET( x y = x ) => stores "y;=;x" in x (without quotes)
MESSAGE( "\${x} = '${x}'" )   # prints "${x} = 'y;=;x'" to stdout (without quotes)
SET( y ${x} )                 # => SET( y y = x ) => stores "y;=;x" in y (without quotes)
MESSAGE( ${y} )               # prints "y=x" to stdout (without quotes)

The value can be the empty string, "":

SET( x "" )

If there is no value specified, then the variable is UNSET, so that it is like the variable never existed:

SET( x )
IF(DEFINED x)
  MESSAGE("This will never be printed")
ENDIF(DEFINED x)

However, this is more obviously performed with the UNSET command:

UNSET( x )
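The difference between an empty value and an unset variable can be checked with IF( DEFINED ... ). This is a minimal sketch, assuming a CMake version that provides the UNSET command:

```cmake
SET( x "" )
IF( DEFINED x )
  MESSAGE( "x is defined, although empty" ) # this is displayed
ENDIF( DEFINED x )

UNSET( x )
IF( NOT DEFINED x )
  MESSAGE( "x is no longer defined" )       # this is displayed
ENDIF( NOT DEFINED x )
```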

Substitution of command names

Substitution only works within arguments to commands. In particular, the name of a command cannot include a substitution; so you cannot store the name of a command in a variable, then run that command. For example, this does not work:

SET( command MESSAGE )
${command}( hi )
# syntax error

If you insist, the available workaround is to write the command to a temporary file, then INCLUDE the file to run the command.

SET( command MESSAGE )

# ${command}( hi )
FILE( WRITE temp "${command}( hi )" ) # writes "MESSAGE( hi )" to ./temp
INCLUDE( temp )                       # ./temp is an unsafe temporary file...
FILE( REMOVE temp )
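Much newer CMake releases (3.18 and later) offer a direct alternative, the cmake_language command, so the temporary file is only needed with older versions. A minimal sketch, assuming such a release is available:

```cmake
# Requires CMake 3.18 or later.
SET( command MESSAGE )
CMAKE_LANGUAGE( CALL ${command} "hi" ) # runs MESSAGE( "hi" )
```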

CMake splits arguments unless you use quotation marks or escapes.

Each command in a listfile takes zero or more arguments. CMake separates arguments by whitespace, performs substitutions, then separates arguments by semicolons. That is, if you are just typing literal arguments, then you may use either whitespace or semicolons to separate them. However, if you want a variable substitution to produce multiple arguments, then you need to use semicolons in the variable.

To shell script hackers, this is not the same as setting IFS=\; in a Bourne shell that follows POSIX. (Because ; is also a shell operator, we escape it with a backslash.) The difference is that the Bourne shell will only use the ; delimiter within substitutions, while CMake will use it everywhere:

# shell script
countargs() {
  echo $#    # $# is the number of arguments
}
IFS=\;
countargs a\;b\;c    # displays "1"
ARGS=a\;b\;c
countargs $ARGS      # displays "3"

# CMake
MACRO( COUNTARGS )
  MESSAGE ( ${ARGC} )
ENDMACRO( COUNTARGS )
COUNTARGS( a;b;c )   # displays "3"
SET( ARGS a;b;c )
COUNTARGS( ${ARGS} ) # displays "3"

Importantly, the MESSAGE command has the habit of concatenating all of its arguments! (This is why many examples on this page use quoting so that MESSAGE takes only one argument.) The following three commands are effectively the same:

MESSAGE(This is practice.)             # prints "Thisispractice."
MESSAGE(  This   is    practice.     ) # prints "Thisispractice."
MESSAGE( This;is;practice. )           # prints "Thisispractice."

Quoting

You can preserve whitespace, semicolons and parens by quoting entire arguments, as if they were strings in C:

MESSAGE( "This is practice." )  # prints "This is practice."
MESSAGE( "This;is;practice." )  # prints "This;is;practice."
MESSAGE( "Hi. ) MESSAGE( x )" ) # prints "Hi. ) MESSAGE( x )"

You can also quote parts of arguments. CMake has to decide whether the quotes are quoting syntax (as in shell script) or literal quotes. The rules in CMake 2.4 are:

  1. If the argument begins with a quote, then the quotes are quoting syntax.
  2. Otherwise, if the quotes surround whitespace, semicolons or parens to preserve, then the quotes are quoting syntax.
  3. Otherwise, the quotes are literal quotes.

Input                               Output
MESSAGE( "Welc"ome )  # rule 1      Welcome
MESSAGE( Welc"ome" )  # rule 3      Welc"ome"
MESSAGE( Welc"ome)" ) # rule 2      Welcome)
MESSAGE( ""Thanks )   # rule 1      Thanks
MESSAGE( Thanks"" )   # rule 3      Thanks""

Quoting does not prevent substitutions. It does, however, prevent CMake from splitting arguments at the semicolons, and allows you to pass empty strings as arguments:

SET( SOURCES back.c io.c main.c )
MESSAGE( ${SOURCES}   )      # three arguments, prints "back.cio.cmain.c"
MESSAGE( "${SOURCES}" )      # one argument,    prints "back.c;io.c;main.c"
MESSAGE( "" )                # one argument,    prints "" an empty line
MESSAGE( "${EMPTY_STRING}" ) # one argument,    prints "" an empty line
MESSAGE( ${EMPTY_STRING} )   # zero arguments,  causes CMake Error
                             # "MESSAGE called with incorrect number of arguments"

Escapes

A backslash \ in the arguments to a command will start an escape sequence. To use any of the characters "()#$^ literally, escape the character with a backslash. You may also escape a space (instead of quoting it) and you may use \\ for a literal backslash.

MESSAGE( \\\"\ \(\)\#\$\^ ) # this message contains literal characters
MESSAGE( \# not a comment )
MESSAGE( \${NotAnExpansion} )
SET( rightparen \) )

There are a few other escape sequences, including \n for newline, \t for tab and \r for carriage return. Unrecognized escape sequences are invalid and cause a CMake error.
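A short sketch of the escapes that produce control characters. It uses \n and \t inside quoted arguments; the message texts are only for illustration.

```cmake
MESSAGE( "first line\nsecond line" )  # \n starts a new line
MESSAGE( "name\tvalue" )              # \t inserts a tab
MESSAGE( "a literal backslash: \\" )  # \\ gives one backslash
```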

CMake supports boolean variables.

CMake considers an empty string, "0", "FALSE", "OFF", "NO", "N", "IGNORE", "NOTFOUND", or any string ending in "-NOTFOUND" to be false. (This happens to be case-insensitive, so "False", "off", "no", and "something-NotFound" are all false.) Other values are true. Thus it matters not whether you use TRUE and FALSE, ON and OFF, or YES and NO for your booleans.

Many scripts expect a string to either be false or contain a useful value, often a path to a directory or a file. You have a potential but rare problem if one of the useful values coincides with falseness. Avoid giving nonsensical names like /tmp/ME-NOTFOUND to your files, executables or libraries.
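You can check how CMake classifies a value with a small loop like the sketch below. The variable names and sample values are only for illustration. The first five values should be reported as false and the last five as true.

```cmake
FOREACH( value FALSE off No 0 zlib-NOTFOUND TRUE on yes 1 /usr/lib/libz.so )
  SET( candidate "${value}" )
  IF( candidate )
    MESSAGE( "${value} counts as true" )
  ELSE( candidate )
    MESSAGE( "${value} counts as false" )
  ENDIF( candidate )
ENDFOREACH( value )
```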

IF and WHILE control the flow.

Any scripting language provides conditionals and loops. In CMake, the IF command provides conditionals, while the WHILE command provides a loop. These two commands also provide the way to evaluate boolean expressions.

When CMake sees an IF command, it checks the arguments to the IF command. If the condition is true, then it runs the commands between the IF and the matching ENDIF command; otherwise it skips them.

The WHILE command works like the IF command except that it is a loop: after CMake runs the commands between the WHILE and ENDWHILE commands, CMake jumps back to the WHILE command to check whether to run the loop again. Here is an example:

SET( number 4 )
# if ${number} is greater than 10
IF( number GREATER 10 )
  MESSAGE( "The number ${number} is too large." )
ENDIF( number GREATER 10 )
# while ${number} is between 0 and 11
WHILE( number GREATER 0 AND number LESS 11 )
  MESSAGE( "hi ${number}")
  MATH( EXPR number "${number} - 1" ) # decrement number
ENDWHILE( number GREATER 0 AND number LESS 11 )

If number is too large, this listfile complains. Otherwise, the loop counts down toward zero and says hi to each number. If number starts at 4, then the messages are "hi 4", "hi 3", "hi 2", "hi 1".

Note that CMake is very strict, and requires the matching IF and ENDIF (or WHILE and ENDWHILE) commands to have the same arguments. This strictness is not necessary to the language (because CMake may count IF and ENDIF commands to determine the nesting of conditions, similar to how shell script counts if and then and fi commands) but it catches programmer errors. Setting the variable CMAKE_ALLOW_LOOSE_LOOP_CONSTRUCTS to true disables this strictness, so that a simple ENDIF() works.

The arguments to IF and WHILE commands may contain substitutions, as any other command:

SET( number 4 )
SET( operation GREATER )
SET( limit 10 )
IF( number ${operation} ${limit} )
  MESSAGE( "Oops, ${number} is ${operation} than ${limit}." )
ENDIF( number ${operation} ${limit} )

The syntax for boolean expressions is given in the CMake manual page, in the section for the IF command. This includes the order of operations (for example, GREATER and LESS have higher precedence than AND). Note that the operations are case sensitive, because they are arguments: GREATER is an operation, greater is not.

Join a list with semicolons.

A SET command with more than two arguments will join the second and subsequent arguments with semicolons:

SET( letters a b c d ) # sets letters to "a;b;c;d"

A string that contains semicolons is a list. An empty string "" is a list of zero elements, while a nonempty string that contains zero semicolons is a list of one element. For example, you might have a list of source files for a particular target. CMake does not have arrays like Perl or shell script, or linked lists like Lisp; the expectation is that your CMake scripts will use semicolon delimiters in lists.

CMake provides a LIST command that performs basic operations on lists.
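A few of the LIST subcommands, as a minimal sketch. Each one takes the name of the list variable, not its value; the variable names are only for illustration.

```cmake
SET( letters a b c d )
LIST( LENGTH letters count )               # count is "4"
LIST( GET letters 0 first )                # first is "a" (indices start at 0)
LIST( APPEND letters e )                   # letters is now "a;b;c;d;e"
MESSAGE( "${count} ${first} ${letters}" )  # displays "4 a a;b;c;d;e"
```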

CMake splits arguments at semicolons, thus a list will split into multiple arguments. You may store arguments in a list and use them with a later command:

SET( expression 4 LESS 10 ) # ${expression} is now "4;LESS;10"
IF( ${expression} )         # expands to IF( 4;LESS;10 )
  MESSAGE( "CMake believes that 4 is less than 10." )
ENDIF( ${expression} )

In a CMake project, you might have a variable sources for a list of source files. When you call ADD_EXECUTABLE or ADD_LIBRARY to create a target, you may pass ${sources}.
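For example, a CMakeLists.txt fragment might look like the sketch below. The target name demo and the file names are hypothetical, and the fragment belongs in a project, so it will not run under cmake -P.

```cmake
# CMakeLists.txt fragment (not for script mode)
SET( sources back.c io.c main.c )
ADD_EXECUTABLE( demo ${sources} ) # same as ADD_EXECUTABLE( demo back.c io.c main.c )
```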

When CMake wants a variable name, when it wants a substitution.

Does IF( number GREATER 10 ) mean the same as IF( ${number} GREATER 10 )? Actually it does, if number is a variable. Some operations in boolean expressions will accept either a variable name or a value. If the name matches a variable name, then CMake uses that variable's value; but if the name does not match a variable name, then CMake uses the direct value.

Suppose the variable number contains 4. Then IF( number GREATER 10 ) will notice that number is the name of a variable, and CMake tests if 4 is greater than 10. (The test is false.) But IF( ${number} GREATER 10 ) expands to IF( 4 GREATER 10 ), and we hope that 4 is not the name of a variable, so CMake tests if 4 is greater than 10.
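The hope that a value is not also the name of a variable can fail. In the sketch below (variable names chosen only for illustration), the substituted value cat is itself a variable name, so CMake compares the value of cat rather than the text "cat":

```cmake
SET( animal cat )
SET( cat dog )                # a variable that happens to be named "cat"
IF( ${animal} STREQUAL dog )  # expands to IF( cat STREQUAL dog )
  MESSAGE( "CMake compared the value of cat, not the text cat" ) # this is displayed
ENDIF( ${animal} STREQUAL dog )
```

Without the second SET, the same IF would compare the text "cat" with "dog" and skip the message.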

The first argument to the SET command is a variable name, not a substitution. SET wants to know the name of the variable to set, not the variable's current value. (This is different from Perl, which does not perform substitution.) It remains legitimate to use a substitution if the name of the variable that you want to set is the value of another variable:

SET( varname number ) # sets varname to "number"
SET( ${varname} 4 )   # sets number to "4"

CMake may access environment variables.

You can also substitute the value of an environment variable. The syntax is $ENV{name}.

MESSAGE( "Your Unix home directory is $ENV{HOME}." )

Note that "ENV" is the only permitted hash (or map or dictionary, or your name for a set of key-value pairs). If you attempt to access another hash, then CMake will give an error:

MESSAGE( $hi{there} )
# causes error "Key hi is not used yet. For now only $ENV{..} is allowed"

Yes, it is possible to set an environment variable:

SET( ENV{PATH} /bin:/usr/bin ) # use a minimal PATH
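A value set this way can be read back with the same $ENV{name} syntax. The change only affects the running CMake process and the processes it starts, not the shell that launched CMake. In this sketch, DEMO_GREETING is a hypothetical variable name:

```cmake
SET( ENV{DEMO_GREETING} "hello" )
MESSAGE( "DEMO_GREETING is $ENV{DEMO_GREETING}" ) # displays "DEMO_GREETING is hello"
```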

Recursive substitutions permit indirect variables.

CMake can handle recursive substitutions, so that you may let one variable contain the name (or part of the name) of another variable:

SET( varname x )
SET( x 6 )
MESSAGE( "${varname} is ${${varname}}" ) # displays "x is 6"

It is not necessary to employ a workaround like "STRING( CONFIGURE ... )".
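The inner substitution may also supply only part of a variable name. A minimal sketch, with hypothetical variable names:

```cmake
SET( flags_debug "-g" )
SET( flags_release "-O2" )
SET( config release )
MESSAGE( "flags: ${flags_${config}}" ) # displays "flags: -O2"
```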

TODO

  • Describe ELSE and ELSEIF.
  • Describe FOREACH and MACRO on this page. These commands provide constant parameters, not variables; demonstrate that you cannot SET them. Explain the important rule, constant expansion happens before variable expansion. This rule has consequences when your parameter contains "${" somewhere.

History

Kernigh began to write about the syntax of the language of CMake. At first, the information appeared in a wiki page "The Scripting Language of CMake", http://kernigh.pbwiki.com/CMake. Then Kernigh moved the syntax information to this page at the Kitware Public Wiki.