Reposted: Rust: Raw string literals - rahul thakoor
# r#"What is this?"#
While working with Rust, you will often come across r#"something like this"#, especially when working with JSON and TOML files. This is a raw string literal. When would you use one, and what makes a raw string literal valid?
# When would you use a raw string literal?
First, let's understand what a string literal is. According to The Rust Reference [1], a string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself [2]. Escape sequences in the string literal body are processed. The string body cannot contain an unescaped double-quote; if you need to insert one, you have to escape it like this: \".
Escaping double-quotes can be cumbersome in some cases such as writing regular expressions or defining a JSON object as a string literal. In these situations, raw string literals are helpful since they allow you to write the literal without requiring escapes.
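To make the difference concrete, here is a minimal sketch comparing escaped literals with their raw equivalents. The function names and sample strings are illustrative assumptions, not taken from any particular crate.

```rust
// Each pair of functions returns identical text; only the spelling in the
// source code differs.

// A regex-like pattern: every backslash must be doubled in a normal literal.
fn date_pattern_escaped() -> &'static str {
    "\\d{4}-\\d{2}-\\d{2}"
}

// The same pattern as a raw string literal: backslashes stand for themselves.
fn date_pattern_raw() -> &'static str {
    r"\d{4}-\d{2}-\d{2}"
}

// A small JSON object: every inner double-quote must be escaped.
fn json_escaped() -> &'static str {
    "{\"name\": \"Ferris\"}"
}

// With r#"..."# the inner double-quotes need no escaping.
fn json_raw() -> &'static str {
    r#"{"name": "Ferris"}"#
}

fn main() {
    assert_eq!(date_pattern_escaped(), date_pattern_raw());
    assert_eq!(json_escaped(), json_raw());
    println!("{}", date_pattern_raw());
    println!("{}", json_raw());
}
```

The raw versions are shorter and can be compared character by character with the text they are meant to produce.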
Here is a snippet from the toml crate [3]:
// source: https://github.com/alexcrichton/toml-rs/blob/master/examples/decode.rs
Or another from serde-rs [4]:
// source: https://github.com/serde-rs/json
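As a rough illustration of what such snippets tend to look like, the sketch below embeds a small TOML document and a small JSON document as raw string literals, using only the standard library. The keys, values, and function names are made up for this example and are not copied from either crate.

```rust
// Hypothetical sample data; actual parsing would be left to crates such as
// toml or serde_json, which are not used here.
fn toml_text() -> &'static str {
    r#"
title = "example"

[server]
ip = "127.0.0.1"
port = 80
"#
}

fn json_text() -> &'static str {
    r#"{ "name": "Ferris", "legs": 10 }"#
}

fn main() {
    // Count the lines of the TOML text that contain a double-quote.
    let quoted_lines = toml_text().lines().filter(|l| l.contains('"')).count();
    println!("TOML lines with quotes: {}", quoted_lines);
    println!("{}", json_text());
}
```

Note that the TOML text spans several lines; the line breaks inside the raw string are part of its value.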
So, raw string literals are helpful, but what makes a valid one?
# What makes a raw string literal?
The Rust Reference defines a raw string literal as starting with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character [5].
Escape characters in the raw string body are not processed.
Therefore the following raw string literals are all valid:
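A few literals that fit this definition are sketched below; the sample strings and the function name are assumptions chosen for illustration.

```rust
fn valid_raw_strings() -> [&'static str; 4] {
    [
        r"foo",       // zero hashes
        r#"foo"#,     // one hash on each side
        r###"foo"###, // three hashes on each side
        r"foo\nbar",  // escapes are not processed: a backslash, then 'n'
    ]
}

fn main() {
    for s in valid_raw_strings().iter() {
        println!("{} ({} bytes)", s, s.len());
    }
}
```

The first three literals produce the same text, since the hashes are part of the delimiter and not of the body.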
If you need to include a double-quote character in a raw string, you must mark the start and end of the raw string with hash/pound signs (#).
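A minimal sketch of this rule follows; the sample text and function names are illustrative assumptions.

```rust
// r"He said "hi" to Ferris" would not compile: without hashes, the first
// inner double-quote ends the literal.

fn with_inner_quotes() -> &'static str {
    r#"He said "hi" to Ferris"#
}

// The body may even end with a double-quote, because only a double-quote
// followed by `#` closes this literal.
fn ends_with_quote() -> &'static str {
    r#"the last character is a quote: ""#
}

fn main() {
    println!("{}", with_inner_quotes());
    println!("{}", ends_with_quote());
}
```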
When a single # is used on each side, the raw string body can contain any sequence of Unicode characters except "#, since that sequence would terminate the literal. If you want to include that particular sequence, you have to increase the number of # characters that precede the opening double-quote (and follow the closing one). For instance:
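One possible example, with an assumed sample string, uses two hashes on each side so that a double-quote followed by a single hash is treated as ordinary text:

```rust
// With a single `#` on each side, the `"#` inside the body would end the
// literal early, so two hashes are used instead.
fn contains_quote_hash() -> &'static str {
    r##"the sequence "# is plain text here"##
}

fn main() {
    println!("{}", contains_quote_hash());
}
```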
Likewise, if "## is to be included, you can add another # to the starting and ending delimiters.
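The pattern generalises: a body needs one more hash than the longest run of hashes that directly follows a double-quote inside it. The helper below is a sketch of that rule; `hashes_needed` and `to_raw_literal` are hypothetical names introduced here, not part of the standard library, and the sketch ignores any compiler limit on the number of hashes.

```rust
/// Smallest number of hashes with which `body` can be written as a raw string.
fn hashes_needed(body: &str) -> usize {
    let mut needed = 0;
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            // Count the hashes that directly follow this double-quote.
            let mut run = 0;
            while chars.peek() == Some(&'#') {
                chars.next();
                run += 1;
            }
            // The delimiter must be longer than this run.
            needed = needed.max(run + 1);
        }
    }
    needed
}

/// Builds the source text of a raw string literal whose value is `body`.
fn to_raw_literal(body: &str) -> String {
    let hashes = "#".repeat(hashes_needed(body));
    format!("r{h}\"{b}\"{h}", h = hashes, b = body)
}

fn main() {
    for body in ["foo", "say \"hi\"", "a \"# b", "\"##"].iter() {
        println!("{}", to_raw_literal(body));
    }
}
```

A body without any double-quote needs no hashes at all, which matches the zero-hash form shown earlier in the definition.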
# Wrap Up
Raw string literals are helpful when you need to avoid escaping characters within a literal. The characters in a raw string represent themselves. Informally, a raw string literal is an r, followed by N hashes (where N can be zero), a quote, any characters, then a quote followed by N hashes [6].
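The informal definition can be turned into a small checker. The function below is a simplified sketch, not the compiler's lexer: `parse_raw` is a hypothetical name, and it only applies the "r, N hashes, quote, body, quote, N hashes" rule to a piece of source text, ignoring any further restrictions the language imposes.

```rust
/// If `src` is exactly one raw string literal, returns its body.
fn parse_raw(src: &str) -> Option<&str> {
    // The literal starts with `r` ...
    let rest = src.strip_prefix('r')?;
    // ... followed by N hashes (N may be zero) ...
    let n = rest.chars().take_while(|&c| c == '#').count();
    // ... and an opening double-quote.
    let rest = rest[n..].strip_prefix('"')?;
    // The body ends at the first double-quote followed by N hashes.
    let closer = format!("\"{}", "#".repeat(n));
    let end = rest.find(closer.as_str())?;
    // Nothing may follow the closing delimiter.
    if end + closer.len() == rest.len() {
        Some(&rest[..end])
    } else {
        None
    }
}

fn main() {
    // The inputs are themselves written as raw strings with two hashes.
    println!("{:?}", parse_raw(r##"r"foo""##));
    println!("{:?}", parse_raw(r##"r#"a"b"#"##));
    println!("{:?}", parse_raw(r##"r#"unterminated""##));
}
```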
Here's how visualising [7] raw string literals works for me:
Image generated using Railroad-Diagram-Generator.
That's it for now!
1. https://doc.rust-lang.org/stable/reference/
2. https://doc.rust-lang.org/stable/reference/tokens.html#string-literals
3. https://github.com/alexcrichton/toml-rs/blob/master/examples/decode.rs
4. https://github.com/serde-rs/json
5. https://doc.rust-lang.org/stable/reference/tokens.html#raw-string-literals
6. https://github.com/rust-lang/rust/blob/master/src/grammar/raw-string-literal-ambiguity.md
7. http://www.bottlecaps.de/rr/ui