Table of Content:
The semantics of Elvish is unique in many aspects when compared to other shells. This can be surprising if you are used to other shells, and it is a result of the design choice of making Elvish a full-fledged programming language.
Structureful IO
Elvish offers the ability to build elaborate data structures, “return” them from functions, and pass them through pipelines.
Motivation
Traditional shells use strings for all kinds of data. They can be stored in variables, used as function arguments, written to output and read from input. Strings are very simple to use, but they fall short if your data has an inherent structure. A common solution is using pseudo-structures of “each line representing a record, and each (whitespace-separated) field represents a property”, which is fine as long as your data do not contain whitespaces. If they do, you will quickly run into problems with escaping and quotation and find yourself doing black magics with strings.
Some shells provide data structures like lists and maps, but they are usually not first-class values. You can store them in variables, but you might not to be able to nest them, pass them to functions, or returned them from functions.
Data Structures and “Returning” Them
Elvish offers first-class support for data structures such as lists and maps. Here is an example that uses a list:
~> var li = [foo bar 'lorem ipsum']
~> kind-of $li # "kind" is like type
▶ list
~> count $li # count the number of elements in a list
▶ 3
(See the language reference for a more complete description of the builtin data structures.)
As you can see, you can store lists in variables and use them as
command arguments. But they would be much less useful if you cannot
return them from a function. A naive way to do this is
by echo
ing the list and use output capture to recover
it:
~> fn f {
echo [foo bar 'lorem ipsum']
}
~> var li = (f) # (...) is output capture, like $(...) in other shells
~> kind-of $li
▶ string
~> count $li # count the number of bytes, since $li is now a string
▶ 23
As we have seen, our attempt to output a list has turned it into a
string. This is because the echo
command in Elvish, like in
other shells, is string-oriented. To echo a list, it has to be first
converted to a string.
Elvish provides a put
command to output structured
values as they are:
~> fn f {
put [foo bar 'lorem ipsum']
}
~> var li = (f)
~> kind-of $li
▶ list
~> count $li
▶ 3
So how does put
work differently from echo
under the hood?
In Elvish, the standard output is made up of two parts: one
traditional byte-oriented file, and one internal
value-oriented channel. The echo
command
writes to the file, so it has to serialize its arguments into strings;
the put
command writes to the value-oriented channel,
preserving all the internal structures of the values.
If you invoke put
directly from the command prompt, the
values it output have a leading ▶
:
~> put [foo bar]
▶ [foo bar]
The leading arrow is a way to visualize that a command has written something onto the value channel, and not part of the value itself.
In retrospect, you may discover that the kind-of
and
count
builtin commands also write their output to the value
channel.
Passing Data Structures Through the Pipeline
When I said that standard output in Elvish comprises two parts, I was
not telling the full story: pipelines in Elvish also have these two
parts, in a very similar way. Data structures can flow in the
value-oriented part of the pipeline as well. For instance, the
each
command takes input from the
value-oriented channel, and apply a function to each value:
~> put lorem ipsum | each {|x| echo "Got "$x }
Got lorem
Got ipsum
There are many builtin commands that inputs or outputs values. As
another example, the take
commands retains a fixed number
of items:
~> put [lorem ipsum] "foo\nbar" [&key=value] | take 2
▶ [lorem ipsum]
▶ "foo\nbar"
Interoperability with External Commands
Unfortunately, the ability of passing structured values is not available to external commands. However, Elvish comes with a pair of commands for JSON serialization/deserialization. The following snippet illustrates how to interoperate with a Python script:
~> cat sort-list.py
import json, sys
li = json.load(sys.stdin)
li.sort()
json.dump(li, sys.stdout)
~> put [lorem ipsum foo bar] | to-json | python sort-list.py | from-json
▶ [bar foo ipsum lorem]
It is easy to write a wrapper for such external commands:
~> fn sort-list { to-json | python sort-list.py | from-json }
~> put [lorem ipsum foo bar] | sort-list
▶ [bar foo ipsum lorem]
More serialization/deserialization commands may be added to the language in the future.
Exit Status and Exceptions
Unix commands exit with a non-zero value to signal errors. This is
available traditionally as a $?
variable in other
shells:
true
echo $? # prints "0"
false
echo $? # prints "1"
Builtin commands and user-defined functions also do this to signal errors, although they are not Unix commands:
bad() {
return 2
}
bad
echo $? # prints "2"
This model is fine, only if most errors are non-fatal (so that errors
from a previous command normally do not affect the execution of
subsequence ones) and the script author remembers to check
$?
for the rare fatal errors.
Elvish has no concept of exit status. Instead, it has exceptions
that, when thrown, interrupt the flow of execution. The equivalency of
the bad
function in elvish is as follows:
fn bad {
fail "bad things have happened" # throw an exception
}
bad # will print a stack trace and stop execution
echo "after bad" # not executed
(If you run this interactively, you need to enter a literal newline
after bad
by pressing Alt-Enter to make sure
that it is executed in the same chunk as
echo "after bad"
.)
And, non-zero exit status from external commands are turned into exceptions:
false # will print a stack trace and stop execution
echo "after false"
An alternative way to describe this is that Elvish
does have exit statuses, but non-zero exit statuses
terminates execution by default. You can handle non-zero exit statuses
by wrapping the command in a try
block.
Compare with POSIX shells, the behavior of Elvish is similar to
set -e
or set -o errexit
, or having implicit
&&
operators joining all the commands. Defaulting
to stopping execution when bad things happen makes Elvish safer and code
behavior more predictable.
Predicates and if
The use of exit status is not limited to errors, however. In the Unix toolbox, quite a few commands exit with 0 to signal “true” and 1 to signal “false”. Notably ones are:
test
aka[
: testing file types, comparing numbers and strings;grep
: exits with 0 when there are matches, with 1 otherwise;diff
: exits with 0 when files are the same, with 1 otherwise;true
andfalse
, always exit with 0 and 1 respectively.
The if
control structure in POSIX shell is designed to
work with such predicate commands: it takes a pipeline, and executes the
body if the last command in the pipeline exits with 0. Examples:
# command 1
if true; then
echo 'always executes'
fi
# command 2
n=10
if test $n -gt 2; then
echo 'executes when $n > 2'
fi
# command 3
if diff a.txt b.txt; then
echo 'a.txt and b.txt are the same'
fi
Since Elvish treats non-zero exit status as a kind of exception, the
way that predicate commands and if
work in POSIX shell does
not work well for Elvish. Instead, Elvish’s if
is like most
non-shell programming languages: it takes a value, and executes the body
if the value is booleanly true. The first command above is written in
Elvish as:
if $true {
echo 'always executes'
}
The way to write the second command in Elvish warrants an explanation
of how predicates work in Elvish first. Predicates in Elvish simply
write a boolean output, either $true
or
$false
:
~> > 10 2
▶ $true
~> > 1 2
▶ $false
To use predicates in if
, you simply capture its output
with ()
. So the second command is written in Elvish as:
var n = 10
if (> $n 2) {
echo 'executes when $n > 2'
}
The parentheses after if
are different to those in C: In
C it is a syntactical requirement to put them around the condition; in
Elvish, it functions as output capture operator.
Sometimes it can be useful to have a condition on whether an external
commands exits with 0. In this case, you can use the exception capture
operator ?()
:
if ?(diff a.txt b.txt) {
echo 'a.txt and b.txt are the same'
}
In Elvish, all exceptions are booleanly false, while the special
$ok
value is booleanly true. If the diff
exits
with 0, the ?(...)
construct evaluates to $ok
,
which is booleanly true. Otherwise, it evaluates to an exception, which
is booleanly false. Overall, this leads to a similar semantics with the
POSIX if
command.
Note that the following code does have a severe downside:
?()
will prevent any kind of exceptions from throwing. In
this case, we only want to turn one sort of exception into a boolean:
diff
exits with 1. If diff
exits with 2, it
usually means that there was a genuine error (e.g. a.txt
does not exist). Swallowing this error defeats Elvish’s philosophy of
erring on the side of caution; a more sophisticated system of handling
exit status is still being considered.
Phases of Code Execution
A piece of code that gets evaluated as a whole is called a
chunk (a loanword from Lua). If you run
elvish some-script.elv
from the command line, the entire
script is one chunk; in interactive mode, each time you hit Enter, the
code you have written is one chunk.
Elvish interprets a code chunk in 3 phases: it first parses the code into a syntax tree, then compiles the syntax tree code to an internal representation, and finally evaluates the just-generated internal representation.
If any error happens during the first two phases, Elvish rejects the chunk without executing any of it. For instance, in Elvish unclosed parenthesis is an error during the parsing phase. The following code, when executed as a chunk, does nothing other than printing the parse error:
echo before
echo (
The same code, interpreted as bash, also contains a syntax error.
However, if you save this file to bad.bash
and run
bash bad.bash
, bash will execute the first line before
complaining about the syntax error on the second line.
Likewise, in Elvish using an unassigned variable is a compilation error, so the following code does nothing either:
# assuming $nonexistent was not assigned
echo before
echo $nonexistent
There seems to be no equivalency of compilation errors in other shells, but this extra compilation phase makes the language safer. In future, optional type checking may be introduced, which will fit into the compilation phase.
Assignment Semantics
In Python, JavaScript and many other languages, if you assign a container (e.g. a map) to multiple variables, modifications via those variables mutate the same container. This is best illustrated with an example:
= {'foo': 'bar', 'lorem': 'ipsum'}
m = m
m2 'foo'] = 'quux'
m2[print(m['foo']) # prints "quux"
This is because in such languages, variables do not hold the “actual”
map, but a reference to it. After the assignment m2 = m
,
both variables refer to the same map. The subsequent element assignment
m2['foo'] = 'quux'
mutates the underlying map, so
m['foo']
is also changed.
This is not the case for Elvish:
~> var m = [&foo=bar &lorem=ipsum]
~> var m2 = $m
~> set m2[foo] = quux
~> put $m[foo]
▶ bar
It seems that when you assign m2 = $m
, the entire map is
copied from $m
into $m2
, so any subsequent
changes to $m2
does not affect the original map in
$m
. You can entirely think of it this way: thinking
assignment as copying correctly models the behavior of
Elvish.
But wouldn’t it be expensive to copy an entire list or map every time
assignment happens? No, the “copying” is actually very cheap. Is it
implemented as copy-on-write –
i.e. the copying is delayed until $m2
gets modified? No,
subsequent modifications to the new $m2
is also very cheap.
Read on if you are interested in how it is possible.
Implementation Detail: Persistent Data Structures
Like in Python and JavaScript, Elvish variables like $m
and $m2
also only hold a reference to the underlying map.
However, that map is immutable, meaning that they never
change after creation. That explains why $m
did not change:
because the map $m
refers to never changes. But how is it
possible to do m2[foo] = quux
if the map is immutable?
The map implementation of Elvish has another property: although the map is immutable, it is easy to create a slight variation of one map. Given a map, it is easy to create another map that is almost the same, either 1) with one more key/value pair, or 2) with the value for one key changed, or 3) with one fewer key/value pair. This operation is fast, even if the original map is very large.
This low-level functionality is exposed by the assoc
(associate) and dissoc
(dissociate) builtins:
~> assoc [&] foo quux # "add" one pair
▶ [&foo=quux]
~> assoc [&foo=bar &lorem=ipsum] foo quux # "modify" one pair
▶ [&lorem=ipsum &foo=quux]
~> dissoc [&foo=bar &lorem=ipsum] foo # "remove" one pair
▶ [&lorem=ipsum]
Now, although maps are immutable, variables are mutable. So when you
try to assign an element of $m2
, Elvish turns that into an
assignment of $m2
itself:
set m2[foo] = quux
# is just syntax sugar for:
set m2 = (assoc $m2 foo quux)
The sort of immutable data structures that support cheap creation of
“slight variations” are called persistent
data structures and is used in functional programming languages.
However, the way Elvish turns assignment to $m2[foo]
into
an assignment to $m2
itself seems to be a new approach.