str: String manipulation

Introduction

The str: module provides string manipulation functions.

Function usages are given in the same format as in the reference doc for the builtin module.

The builtin module also contains some string utilities, such as string comparison commands like <s and printf.

Functions

str:compare

str:compare $a $b

Compares two strings and outputs an integer: 0 if a == b, -1 if a < b, and +1 if a > b.

~> str:compare a a
▶ (num 0)
~> str:compare a b
▶ (num -1)
~> str:compare b a
▶ (num 1)

str:contains

str:contains $str $substr

Outputs whether $str contains $substr as a substring.

~> str:contains abcd x
▶ $false
~> str:contains abcd bc
▶ $true

str:contains-any

str:contains-any $str $chars

Outputs whether $str contains any Unicode code points in $chars.

~> str:contains-any abcd x
▶ $false
~> str:contains-any abcd xby
▶ $true

str:count

str:count $str $substr

Outputs the number of non-overlapping instances of $substr in $str. If $substr is an empty string, outputs 1 plus the number of Unicode code points in $str.

~> str:count abcdefabcdef bc
▶ (num 2)
~> str:count abcdef ''
▶ (num 7)

str:equal-fold

str:equal-fold $str1 $str2

Outputs whether $str1 and $str2, interpreted as UTF-8 strings, are equal under Unicode case-folding.

~> str:equal-fold ABC abc
▶ $true
~> str:equal-fold abc ab
▶ $false

str:fields

str:fields $str

Splits $str around each instance of one or more consecutive white space characters.

~> str:fields "lorem ipsum   dolor"
▶ lorem
▶ ipsum
▶ dolor
~> str:fields "   "

See also str:split.
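
The difference from str:split shows up on runs of white space. The transcript below is a small illustration that reuses the input from the example above: splitting on a single space yields an empty string for each extra space, whereas str:fields collapses the run.

```elvish-transcript
~> str:split ' ' "lorem ipsum   dolor"
▶ lorem
▶ ipsum
▶ ''
▶ ''
▶ dolor
```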

str:from-codepoints

str:from-codepoints $number...

Outputs a string consisting of the given Unicode codepoints. Examples:

~> str:from-codepoints 0x61
▶ a
~> str:from-codepoints 0x4f60 0x597d
▶ 你好

See also str:to-codepoints.

str:from-utf8-bytes

str:from-utf8-bytes $number...

Outputs a string consisting of the given UTF-8 bytes. Examples:

~> str:from-utf8-bytes 0x61
▶ a
~> str:from-utf8-bytes 0xe4 0xbd 0xa0 0xe5 0xa5 0xbd
▶ 你好

See also str:to-utf8-bytes.

str:has-prefix

str:has-prefix $str $prefix

Outputs whether $str begins with $prefix.

~> str:has-prefix abc ab
▶ $true
~> str:has-prefix abc bc
▶ $false

str:has-suffix

str:has-suffix $str $suffix

Outputs whether $str ends with $suffix.

~> str:has-suffix abc ab
▶ $false
~> str:has-suffix abc bc
▶ $true

str:index

str:index $str $substr

Outputs the index of the first instance of $substr in $str, or -1 if $substr is not present in $str.

~> str:index abcd cd
▶ (num 2)
~> str:index abcd xyz
▶ (num -1)
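
The description above does not say what unit the index is counted in. The sketch below assumes the index is a byte offset into the UTF-8 encoding, in line with how Elvish indexes strings; under that assumption a match after a multi-byte character reports an offset larger than its character position.

```elvish-transcript
~> str:index 你好 好
▶ (num 3)
```

Here 你 occupies three bytes (0xe4 0xbd 0xa0), so 好 starts at offset 3 rather than 1.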

str:index-any

str:index-any $str $chars

Outputs the index of the first instance of any Unicode code point from $chars in $str, or -1 if no Unicode code point from $chars is present in $str.

~> str:index-any "chicken" "aeiouy"
▶ (num 2)
~> str:index-any l33t aeiouy
▶ (num -1)

str:join

str:join $sep $input-list?

Joins inputs with $sep. Examples:

~> put lorem ipsum | str:join ,
▶ 'lorem,ipsum'
~> str:join , [lorem ipsum]
▶ 'lorem,ipsum'
~> str:join '' [lorem ipsum]
▶ loremipsum
~> str:join '...' [lorem ipsum]
▶ lorem...ipsum

Etymology: Various languages, in particular Python.

See also str:split.

str:last-index

str:last-index $str $substr

Outputs the index of the last instance of $substr in $str, or -1 if $substr is not present in $str.

~> str:last-index "elven speak elvish" elv
▶ (num 12)
~> str:last-index "elven speak elvish" romulan
▶ (num -1)

str:replace

str:replace &max=-1 $old $repl $source

Replaces all occurrences of $old with $repl in $source. If $max is non-negative, it determines the max number of substitutions.

Note: This command does not support searching by regular expressions; $old is always interpreted as a plain string. Use re:replace if you need to search by regex.
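
As a brief illustration of the usage above, the following transcript replaces every occurrence by default and only the first one when &max=1 is given. The input strings are made up for the example.

```elvish-transcript
~> str:replace l L hello
▶ heLLo
~> str:replace &max=1 l L hello
▶ heLlo
```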

str:split

str:split &max=-1 $sep $string

Splits $string by $sep. If $sep is an empty string, splits $string into codepoints.

If the &max option is non-negative, stops after producing the maximum number of results.

~> str:split , lorem,ipsum
▶ lorem
▶ ipsum
~> str:split '' 你好
▶ 你
▶ 好
~> str:split &max=2 ' ' 'a b c d'
▶ a
▶ 'b c d'

Note: This command does not support splitting by regular expressions; $sep is always interpreted as a plain string. Use re:split if you need to split by regex.

Etymology: Various languages, in particular Python.

See also str:join and str:fields.

str:title

str:title $str

Outputs $str with all Unicode letters that begin words mapped to their Unicode title case.

~> str:title "her royal highness"
▶ 'Her Royal Highness'

str:to-codepoints

str:to-codepoints $string

Outputs the value of each codepoint in $string, in hexadecimal. Examples:

~> str:to-codepoints a
▶ 0x61
~> str:to-codepoints 你好
▶ 0x4f60
▶ 0x597d

The output format is subject to change.

See also str:from-codepoints.

str:to-lower

str:to-lower $str

Outputs $str with all Unicode letters mapped to their lower-case equivalent.

~> str:to-lower 'ABC!123'
▶ abc!123

str:to-title

str:to-title $str

Outputs $str with all Unicode letters mapped to their Unicode title case.

~> str:to-title "her royal highness"
▶ 'HER ROYAL HIGHNESS'
~> str:to-title "хлеб"
▶ ХЛЕБ

str:to-upper

str:to-upper $str

Outputs $str with all Unicode letters mapped to their upper-case equivalent.

~> str:to-upper 'abc!123'
▶ ABC!123

str:to-utf8-bytes

str:to-utf8-bytes $string

Outputs the value of each byte in $string, in hexadecimal. Examples:

~> str:to-utf8-bytes a
▶ 0x61
~> str:to-utf8-bytes 你好
▶ 0xe4
▶ 0xbd
▶ 0xa0
▶ 0xe5
▶ 0xa5
▶ 0xbd

The output format is subject to change.

See also str:from-utf8-bytes.

str:trim

str:trim $str $cutset

Outputs $str with all leading and trailing Unicode code points contained in $cutset removed.

~> str:trim "¡¡¡Hello, Elven!!!" "!¡"
▶ 'Hello, Elven'

str:trim-left

str:trim-left $str $cutset

Outputs $str with all leading Unicode code points contained in $cutset removed. To remove a prefix string use str:trim-prefix.

~> str:trim-left "¡¡¡Hello, Elven!!!" "!¡"
▶ 'Hello, Elven!!!'

str:trim-prefix

str:trim-prefix $str $prefix

Outputs $str minus the leading $prefix string. If $str doesn’t begin with $prefix, $str is output unchanged.

~> str:trim-prefix "¡¡¡Hello, Elven!!!" "¡¡¡Hello, "
▶ Elven!!!
~> str:trim-prefix "¡¡¡Hello, Elven!!!" "¡¡¡Hola, "
▶ '¡¡¡Hello, Elven!!!'

str:trim-right

str:trim-right $str $cutset

Outputs $str with all trailing Unicode code points contained in $cutset removed. To remove a suffix string use str:trim-suffix.

~> str:trim-right "¡¡¡Hello, Elven!!!" "!¡"
▶ '¡¡¡Hello, Elven'

str:trim-space

str:trim-space $str

Outputs $str with all leading and trailing white space, as defined by Unicode, removed.

~> str:trim-space " \t\n Hello, Elven \n\t\r\n"
▶ 'Hello, Elven'

str:trim-suffix

str:trim-suffix $str $suffix

Outputs $str minus the trailing $suffix string. If $str doesn’t end with $suffix, $str is output unchanged.

~> str:trim-suffix "¡¡¡Hello, Elven!!!" ", Elven!!!"
▶ ¡¡¡Hello
~> str:trim-suffix "¡¡¡Hello, Elven!!!" ", Klingons!!!"
▶ '¡¡¡Hello, Elven!!!'