\input texinfo @c -*-texinfo-*- @c %**start of header @setfilename fibers.info @settitle Fibers @c %**end of header @set VERSION 1.1.0 @set UPDATED 31 January 2022 @copying This manual is for Fibers (version @value{VERSION}, updated @value{UPDATED}) Copyright 2016-2022 Andy Wingo @quotation @c For more information, see COPYING.docs in the fibers @c distribution. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. @end quotation @end copying @dircategory The Algorithmic Language Scheme @direntry * Fibers: (fibers.info). Lightweight concurrency for Guile. @end direntry @titlepage @title Fibers @subtitle version @value{VERSION}, updated @value{UPDATED} @author Andy Wingo @page @vskip 0pt plus 1filll @insertcopying @end titlepage @ifnottex @node Top @top Fibers @insertcopying @menu * Introduction:: What's this all about? * Reference:: API reference. * Pitfalls:: Stay on the happy path. * Examples:: Starting points for a hack. * Status:: Fibers is a work in progress. @end menu @end ifnottex @iftex @shortcontents @end iftex @node Introduction @chapter Introduction Fibers is a facility for lightweight concurrency in Guile. @menu * Context:: How do other systems handle concurrency? * Design:: Fibers' point in the design space. * Parallelism:: Faster throughput via more cores. @end menu @node Context @section A brief history of language facilities for concurrency Modern machines have the raw capability to serve hundreds of thousands of simultaneous long-lived connections, but it's often hard to manage this at the software level. Fibers tries to solve this problem in a nice way. Before discussing the approach taken in Fibers, it's worth spending some time on history to see how we got here. One of the most dominant patterns for concurrency these days is ``callbacks'', notably in the Twisted library for Python and the Node.js run-time for JavaScript. The basic observation in the callback approach to concurrency is that the efficient way to handle tens of thousands of connections at once is with low-level operating system facilities like @code{poll} or @code{epoll}. You add all of the file descriptors that you are interested in to a ``poll set'' and then ask the operating system which ones are readable or writable, as appropriate. Once the operating system says ``yes, file descriptor 7145 is readable'', you can do something with that socket; but what? With callbacks, the answer is ``call a user-supplied closure'': a callback, representing the continuation of the computation on that socket. Building a network service with a callback-oriented concurrency system means breaking the program into little chunks that can run without blocking. Wherever a program could block, instead of just continuing the program, you register a callback. Unfortunately this requirement permeates the program, from top to bottom: you always pay the mental cost of inverting your program's control flow by turning it into callbacks, and you always incur run-time cost of closure creation, even when the particular I/O could proceed without blocking. It's a somewhat galling requirement, given that this contortion is required of the programmer, but could be done by the compiler. We Schemers demand better abstractions than manual, obligatory continuation-passing-style conversion. Callback-based systems also encourage unstructured concurrency, as in practice callbacks are not the only path for data and control flow in a system: usually there is mutable global state as well. Without strong patterns and conventions, callback-based systems often exhibit bugs caused by concurrent reads and writes to global state. Some of the problems of callbacks can be mitigated by using ``promises'' or other library-level abstractions; if you're a Haskell person, you can think of this as lifting all possibly-blocking operations into a monad. If you're not a Haskeller, that's cool, neither am I! But if your typey spidey senses are tingling, it's for good reason: with promises, your whole program has to be transformed to return promises-for-values instead of values anywhere it would block. An obvious solution to the control-flow problem of callbacks is to use threads. In the most generic sense, a thread is a language feature which denotes an independent computation. Threads are created by other threads, but fork off and run independently instead of returning to their caller. In a system with threads, there is implicitly a scheduler somewhere that multiplexes the threads so that when one suspends, another can run. In practice, the concept of threads is often conflated with a particular implementation, @dfn{kernel threads}. Kernel threads are very low-level abstractions that are provided by the operating system. The nice thing about kernel threads is that they can use any CPU that is the kernel knows about. That's an important factor in today's computing landscape, where Moore's law seems to be giving us more cores instead of more gigahertz. However, as a building block for a highly concurrent system, kernel threads have a few important problems. One is that kernel threads simply aren't designed to be allocated in huge numbers, and instead are more optimized to run in a one-per-CPU-core fashion. Their memory usage is relatively high for what should be a lightweight abstraction: some 10 kilobytes at least and often some megabytes, in the form of the thread's stack. There are ongoing efforts to reduce this for some systems but we cannot expect wide deployment in the next 5 years, if ever. Even in the best case, a hundred thousand kernel threads will take at least a gigabyte of memory, which seems a bit excessive for book-keeping overhead. Kernel threads can be a bit irritating to schedule, too: when one thread suspends, it's for a reason, and it can be that user-space knows a good next thread that should run. However because kernel threads are scheduled in the kernel, it's rarely possible for the kernel to make informed decisions. There are some ``user-mode scheduling'' facilities that are in development for some systems, but again only for some systems. The other significant problem is that building non-crashy systems on top of kernel threads is hard to do, not to mention ``correct'' systems. It's an embarrassing situation. For one thing, the low-level synchronization primitives that are typically provided with kernel threads, mutexes and condition variables, are not composable. Also, as with callback-oriented concurrency, one thread can silently corrupt another via unstructured mutation of shared state. It's worse with kernel threads, though: a kernel thread can be interrupted at any point, not just at I/O. And though callback-oriented systems can theoretically operate on multiple CPUs at once, in practice they don't. This restriction is sometimes touted as a benefit by proponents of callback-oriented systems, because in such a system, the callback invocations have a single, sequential order. With multiple CPUs, this is not the case, as multiple threads can run at the same time, in parallel. Kernel threads can work. The Java virtual machine does at least manage to prevent low-level memory corruption and to do so with high performance, but still, even Java-based systems that aim for maximum concurrency avoid using a thread per connection because threads use too much memory. In this context it's no wonder that there's a third strain of concurrency: shared-nothing message-passing systems like Erlang. Erlang isolates each thread (called @dfn{processes} in the Erlang world), giving each it its own heap and ``mailbox''. Processes can spawn other processes, and the concurrency primitive is message-passing. A process that tries receive a message from an empty mailbox will ``block'', from its perspective. In the meantime the system will run other processes. Message sends never block, oddly; instead, sending to a process with many messages pending makes it more likely that Erlang will pre-empt the sending process. It's a strange trade off, but it makes sense when you realize that Erlang was designed for network transparency: the same message send/receive interface can be used to send messages to processes on remote machines as well. No network is truly transparent, however. At the most basic level, the performance of network sends should be much slower than local sends. Whereas a message sent to a remote process has to be written out byte-by-byte over the network, there is no need to copy immutable data within the same address space. The complexity of a remote message send is O(n) in the size of the message, whereas a local immutable send is O(1). This suggests that hiding the different complexities behind one operator is the wrong thing to do. And indeed, given byte read and write operators over sockets, it's possible to implement remote message send and receive as a process that serializes and parses messages between a channel and a byte sink or source. In this way we get cheap local channels, and network shims are under the programmer's control. This is the approach that the Go language takes, and is the one we use in Fibers. Structuring a concurrent program as separate threads that communicate over channels is an old idea that goes back to Tony Hoare's work on ``Communicating Sequential Processes'' (CSP). CSP is an elegant tower of mathematical abstraction whose layers form a pattern language for building concurrent systems that you can still reason about. Interestingly, it does so without any concept of time at all, instead representing a thread's behavior as a @dfn{trace} of instantaneous events. Threads themselves are like functions that unfold over the possible events to produce the actual event trace seen at run-time. This view of events as instantaneous happenings extends to communication as well. In CSP, one communication between two threads is modelled as an instantaneous event, partitioning the traces of the two threads into ``before'' and ``after'' segments. Practically speaking, this has ramifications in the Go language, which was heavily inspired by CSP. You might think that a channel is just a an asynchronous queue that blocks when writing to a full queue, or when reading from an empty queue. That's a bit closer to the Erlang conception of how things should work, though as we mentioned, Erlang simply slows down writes to full mailboxes rather than blocking them entirely. However, that's not what Go and other systems in the CSP family do; sending a message on a channel will block until there is a receiver available, and vice versa. The threads are said to ``rendezvous'' at the event. Unbuffered channels have the interesting property that you can @code{select} between sending a message on channel @var{a} or channel @var{b}, and in the end only one message will be sent; nothing happens until there is a receiver ready to take the message. In this way messages are really owned by threads and never by the channels themselves. You can of course add buffering if you like, simply by making a thread that waits on either sends or receives on a channel, and which buffers sends and makes them available to receives. It's also possible to add explicit support for buffered channels, as Go does, which can reduce the number of context switches as there is no explicit buffer thread. Whether to buffer or not to buffer is a tricky choice. It's possible to implement singly-buffered channels in a system like Erlang via an explicit send/acknowledge protocol, though it seems difficult to implement completely unbuffered channels. As we mentioned, it's possible to add buffering to an unbuffered system by the introduction of explicit buffer threads. In the end though in Fibers we follow CSP's lead so that we can implement the nice @code{select} behavior that we mentioned above. As a final point, @code{select} is OK but is not a great language abstraction. Say you call a function and it returns some kind of asynchronous result which you then have to @code{select} on. It could return this result as a channel, and that would be fine: you can add that channel to the other channels in your @code{select} set and you are good. However, what if what the function does is receive a message on a channel, then do something with the message? In that case the function should return a channel, plus a continuation (as a closure or something). If @code{select} results in a message being received over that channel, then we call the continuation on the message. Fine. But, what if the function itself wanted to @code{select} over some channels? It could return multiple channels and continuations, but that becomes unwieldy. What we need is an abstraction over asynchronous operations, and that is the main idea of a CSP-derived system called ``Concurrent ML'' (CML). Originally implemented as a library on top of Standard ML of New Jersey by John Reppy, CML provides this abstraction, which in Fibers is called an @dfn{operation}@footnote{CML uses the term @dfn{event}, but we find this to be a confusing name.}. Calling @code{send-operation} on a channel returns an operation, which is just a value. Operations are like closures in a way; a closure wraps up code in its environment, which can be later called many times or not at all. Operations likewise can be @dfn{performed}@footnote{In CML, @dfn{synchronized}.} many times or not at all; performing an operation is like calling a function. The interesting part is that you can compose operations via the @code{wrap-operation} and @code{choice-operation} combinators. The former lets you bundle up an operation and a continuation. The latter lets you construct an operation that chooses over a number of operations. Calling @code{perform-operation} on a choice operation will perform one and only one of the choices. Performing an operation will call its @code{wrap-operation} continuation on the resulting values. While it's possible to implement Concurrent ML in terms of Go's channels and baked-in @code{select} statement, it's more expressive to do it the other way around, as that also lets us implement other operations types besides channel send and receive, for example timeouts and condition variables. @node Design @section Fibers design In Fibers, the unit of computation is the @dfn{fiber}, a lightweight thread managed by Guile. A fiber communicates with the outside world via normal Guile ports: @code{get-bytevector}, @code{put-string}, and all that. Within a single Guile process fibers communicate by sending and receiving Scheme values over @dfn{channels}. Whenever a fiber tries to read but no data is available, or tries to write but no data can be written, Guile will suspend the fiber and arrange for it to be resumed when the port or channel operation can proceed. In the meantime, Guile will run other fibers. When no fiber is runnable, Guile will use efficient system facilities to sleep until input or output can proceed. When a fiber would block, it suspends to the scheduler from the current thread. The scheduler will arrange to re-start the fiber when the port or channel becomes readable or writable, as appropriate. For ports, the scheduler adds the file descriptor associated with the port to an @code{epoll} set. In either case, the scheduler remembers which fibers are waiting and for what, so that the user can inspect the state of their system. Currently in Fibers there is no ambient scheduler running; an error is signalled if a user calls @code{spawn-fiber} while not inside a @code{run-fibers} invocation. However it is possible to communicate with fibers via channels or other Concurrent ML-like operations, even outside of a @code{run-fibers} invocation. If an operation would block, it suspends the entire kernel thread until the operation can proceed. On the Scheme level, a fiber is a delimited continuation. When a scheduler runs a fiber, it does so within a prompt; when the fiber suspends, it suspends to the prompt. The scheduler saves the resulting continuation as part of the fiber's state. In this way the per-fiber computational state overhead is just the size of the pending stack frames of the fiber, which can be just a handful of words. By default, Fibers takes advantage of all available cores on your system. @xref{Parallelism}, for full details. Ports are how fibers communicate with the world; channels are how fibers communicate with each other. Channels are meeting places between fibers, or between threads. A fiber or thread that goes to send a message over a channel will block until there is a fiber or thread ready to receive the message, and vice versa. Once both parties are ready, the message is exchanged and both parties resume. There can be multiple fibers and threads waiting to read and write on a channel, allowing channels to express not only pipelines but also common concurrency patterns such as fan-in and fan-out. Unlike Erlang channels, channels in Fibers are purely local and do not attempt to provide the illusion of network transparency. This does have the positive advantage that we are able to provide better backpressure support than Erlang, blocking when no receiver is available to handle a message instead of letting the sender keep sending many messages. To avoid starvation, a fiber can only run once within a ``turn''. Each turn starts with a poll on file descriptors of interest and marks the associated fibers as runnable. If no fiber is runnable at the start of the poll, the poll call will ask the kernel to wait for a runnable descriptor. Otherwise the poll call will still check for runnable file descriptors, but also ask the kernel to return immediately. There is an additional FD added to the poll set that is used to interrupt a blocking poll, for example if a fiber becomes runnable due to I/O on a channel from a separate kernel thread while the first scheduler was still polling. If a fiber runs for too long (by default, 10 milliseconds), it will be @dfn{preempted}: interrupted and rescheduled for the next turn. The preemption frequency can be tuned by the user or turned off for a fully cooperative scheduling model. To enable expressive cross-kernel-thread communications, channel sends and receives are atomic and thread-safe. @node Parallelism @section Parallelism By default, Fibers will take advantage of all CPU cores available to it. The degree of parallelism is controlled by the @code{#:parallelism} keyword argument to @code{run-fibers}, which defaults to @code{(current-processor-count)}. @xref{Threads,,,guile.info,Guile Reference Manual}, for more information on @code{current-processor-count}. Pass a different argument to @code{#:parallelism} to choose a different degree of parallelism, for example @code{1} for single-threaded operation. To allocate specific cores to a Guile process, use the @code{taskset} command-line utility. A newly spawned fiber will be scheduled on the kernel thread in which it was created, unless @code{#:parallel? #t} was passed to the @code{spawn-fiber} invocation, in which case its initial kernel thread will be selected at random. In this way the default is to preserve locality of memory access and minimize cross-thread coordination. Additionally, after a scheduler has exhausted its run queue for the current turn, if it has nothing scheduled for the next turn it will try to steal work from other schedulers. This @dfn{work stealing} allows a set of parallel schedulers to automatically rebalance and burn through the current global run queue as fast as possible. After processing its current run queue, possibly including stolen work if its next run queue was empty, a scheduler will then ask the operating system for any file descriptors that have pending activity. The scheduler puts a time limit on this sleep phase if there are pending timeouts, but otherwise the sleep will only wake up when a file descriptor becomes readable or writable, or if another thread wakes up the scheduler. Schedulers that are sleeping do not participate in work stealing. For this reason there is another source of work rebalancing in Fibers, @dfn{work sharing}. As mentioned above, to schedule a fiber on a random remote scheduler, use @code{spawn-fiber} with the @code{#:parallel? #t} keyword argument. The specifics of the scheduling algorithm may change, and it may be that there is no global ``best scheduler''. We look forward to experimenting and finding not only a good default algorithm, but also a library that you can use to find your own local maximum in the scheduling space. As far as performance goes, we have found that computationally intensive tasks parallelize rather well. Expect near-linear speedup as you make more cores available to fibers. On the other hand, although allocation rate improves with additional cores, it currently does not scale linearly, and works best when all cores are on the same NUMA node. This is due to details about how Guile manages its memory. In general there may be many bottlenecks that originate in Guile, Fibers, and in your application, and these bottlenecks constrain the ability of an application to scale linearly. Probably the best way to know if Fibers scales appropriately for your use case is to make some experiments. To restrict the set of cores available to Guile, run Guile from within @code{taskset -c}. See @code{taskset}'s manual page. For machines with multiple sockets you will probably want to use @code{numactl --membind} as well. Then to test scalability on your machine, run @code{./env guile tests/speedup.scm} from within your Fibers build directory, or benchmark your application directly. In time we should be able to develop some diagnostic facilities to help the Fibers user determine where a scaling bottleneck is in their application. @node Reference @chapter API reference Fibers is a library built on Guile. It consists of a public interface, base support for asynchronous operations, implementations of operations for channels and timers, and an internals interface. @menu * Using Fibers:: User-facing interface to fibers * Operations:: Composable abstractions for concurrency. * Channels:: Share memory by communicating. * Timers:: Operations on time. * Conditions:: Waiting for simple state changes. * REPL Commands:: Experimenting with Fibers at the console. * Schedulers and Tasks:: Fibers are built from lower-level primitives. @end menu @node Using Fibers @section Using Fibers The public interface of fibers right now is quite minimal. To use it, import the @code{(fibers)} module: @example (use-modules (fibers)) @end example To create a new fibers scheduler and run it in the current Guile thread, use @code{run-fibers}. @defun run-fibers [init-thunk=@code{#f}] @ [#:install-suspendable-ports?=@code{#t}] @ [#:scheduler=@code{#f}] @ [#:parallelism=@code{(current-processor-count)}] @ [#:cpus=@code{(getaffinity 0)}] @ [#:hz=@code{100}] [#:drain?=@code{#f}] Run @var{init-thunk} within a fiber in a fresh scheduler, blocking until @var{init-thunk} returns. Return the value(s) returned by the call to @var{init-thunk}. For example: @example (run-fibers (lambda () 1)) @result{} 1 (run-fibers (lambda () (spawn-fiber (lambda () (display "hey!\n")))) #:drain? #t) @print{} hey! @end example Calling @code{run-fibers} will ensure that Guile's port implementation allows fibers to suspend if a read or a write on a port would block. @xref{Non-Blocking I/O,,,guile.info,Guile Reference Manual}, for more details on suspendable ports. If for some reason you want port reads or writes to prevent other fibers from running, pass @code{#f} as the @code{#:install-suspendable-ports?} keyword argument. By default, @code{run-fibers} will create a fresh scheduler, and destroy it after @code{run-fibers} finishes. If you happen to have a pre-existing scheduler (because you used the low-level scheduler interface to create one), you can pass it to @code{run-fibers} using the @code{#:scheduler} keyword argument. In that case the scheduler will not be destroyed when @code{run-fibers} finishes. @code{run-fibers} will return when the @var{init-thunk} call returns. To make it additionally wait until there are no more runnable fibers or pending timeouts, specify the @code{#:drain? #t} keyword argument. If @code{run-fibers} creates a scheduler on your behalf, it will arrange for a number of ``peer'' schedulers to also be created, up to a total scheduler count controlled by the @var{parallelism} keyword argument. These peer schedulers will be run in separate threads and will participate in work rebalancing. The fibers will be run on the CPUs specified by @var{cpus}. @xref{Parallelism}. By default @var{hz} is 100, indicating that running fibers should be preempted 100 times per every second of CPU time (not wall-clock time). Note that preemption will only occur if the fiber can actually be suspended; @xref{Barriers}, for more information. Pass @code{0} for @var{hz} to disable preemption, effectively making scheduling fully cooperative. @end defun @defun spawn-fiber thunk [scheduler=@code{(require-current-scheduler)}] @ [#:parallel?=@code{#f}] Spawn a new fiber that will run @var{thunk}. Return the new fiber. The new fiber will run concurrently with other fibers. The fiber will be added to the current scheduler, which is usually what you want. It's also possible to spawn the fiber on a specific scheduler, which is useful to ensure that the fiber runs on a different kernel thread. In that case, pass the optional @code{scheduler} argument. If @var{parallel?} is true, the fiber will be started not (necessarily) on @var{scheduler}, but on a random member of the peer set of @var{scheduler}. @xref{Parallelism}. Note that every scheduler is a member of its own peer set. The fiber will inherit the fluid--value associations (the dynamic state) in place when @code{spawn-fiber} is called. Any @code{fluid-set!} or parameter set within the fiber will not affect fluid or parameter bindings outside the fiber. @end defun @defun sleep seconds Wake up the current fiber after @var{seconds} of wall-clock time have elapsed. This definition will replace the binding for @code{sleep} in the importing module, effectively overriding Guile's ``core'' definition. @end defun @node Operations @section Operations Operations are first-class abstractions for asynchronous events. There are primitive operation types, such as waiting for a timer (@pxref{Timers}) or waiting for a message on a channel (@pxref{Channels}). Operations can also be combined and transformed using the @code{choice-operation} and @code{wrap-operation} from this module: @example (use-modules (fibers operations)) @end example @defun wrap-operation op f Given the operation @var{op}, return a new operation that, if and when it succeeds, will apply @var{f} to the values yielded by performing @var{op}, and yield the result as the values of the wrapped operation. @end defun @defun choice-operation . ops Given the operations @var{ops}, return a new operation that if it succeeds, will succeed with one and only one of the sub-operations @var{ops}. @end defun Finally, once you have an operation, you can perform it using @code{perform-operation}. @defun perform-operation op Perform the operation @var{op} and return the resulting values. If the operation cannot complete directly, block until it can complete. @end defun @xref{Introduction}, for more on the ``Concurrent ML'' system that introduced the concept of the operation abstraction. In the context of Fibers, ``blocking'' means to suspend the current fiber, or to suspend the current kernel thread if the operation is performed outside of a fiber. There is also a low-level constructor for other modules that implement primitive operation types: @defun make-base-operation wrap-fn try-fn block-fn Make a fresh base operation. @end defun This is a low-level constructor, though; if you ever feel the need to call @code{make-base-operation}, make sure you're familiar with the Concurrent ML literature. Godspeed! @node Channels @section Channels Channels are the way to communicate between fibers. To use them, load the channels module: @example (use-modules (fibers channels)) @end example @defun make-channel Make a fresh channel. @end defun @defun channel? obj Return @code{#t} if @var{obj} is a channel, or @code{#f} otherwise. @end defun @defun put-operation channel message Make an operation that if and when it completes will rendezvous with a receiving operation to send @var{message} over @var{channel}. @end defun @defun get-operation channel Make an operation that if and when it completes will rendezvous with a sending operation to receive one value from @var{channel}. @end defun @defun put-message channel message Send @var{message} on @var{channel}, and return zero values. If there is already a receiver waiting to receive a message on this channel, give it our message and continue. Otherwise, block until a receiver becomes available. Equivalent to: @example (perform-operation (put-operation channel message)) @end example @end defun @defun get-message channel Receive a message from @var{channel} and return it. If there is already a sender waiting to send a message on this channel, take its message directly. Otherwise, block until a sender becomes available. Equivalent to: @example (perform-operation (get-operation channel)) @end example @end defun Channels are thread-safe; you can use them to send and receive values between fibers on different kernel threads. @node Timers @section Timers Timers are a kind of operation that, you guessed it, let you sleep until a certain time. @example (use-modules (fibers timers)) @end example @defun sleep-operation seconds Make an operation that will succeed with no values when @var{seconds} have elapsed. @end defun @defun timer-operation expiry Make an operation that will succeed when the current time is greater than or equal to @var{expiry}, expressed in internal time units. The operation will succeed with no values. @end defun @defun sleep seconds Block the calling fiber or kernel thread until @var{seconds} have elapsed. @end defun @node Conditions @section Conditions Condition variables are a simple one-bit form of concurrent communication. A condition variable has two states: it starts in the @dfn{unsignalled} state and later may transition to the @dfn{signalled} state. When a condition becomes signalled, any associated waiting operations complete. @example (use-modules (fibers conditions)) @end example @defun make-condition Make a new condition variable. @end defun @defun condition? obj Return @code{#t} if @var{obj} is a condition variable, or @code{#f} otherwise. @end defun @defun signal-condition! cvar Signal @var{cvar}, notifying all waiting fibers and preventing blocking of future fibers waiting on this condition. @end defun @defun wait-operation cvar Make an operation that will succeed with no values when @var{cvar} becomes signalled. @end defun @defun wait cvar Block the calling fiber or kernel thread until @var{cvar} becomes signalled. Equivalent to @code{(perform-operation (wait-operation cvar))}. @end defun @node REPL Commands @section REPL Commands Fibers implements some basic extensions to the Guile command-line interface (its Read-Eval-Print Loop, or the REPL). Prefix these commands with a comma (@code{,}) to run them at the REPL; see @code{,help fibers} for full details, once you have loaded the @code{(fibers)} module of course. @deffn {REPL Command} scheds Show a list of all schedulers. @end deffn @deffn {REPL Command} spawn-sched Create a new scheduler for fibers, and run it on a new kernel thread. @end deffn @deffn {REPL Command} kill-sched name Shut down the scheduler named @var{name}. Use @code{,scheds} to list scheduler names. @end deffn @deffn {REPL Command} spawn-fiber exp [sched] Spawn a new fiber that runs @var{exp}. If @var{sched} is given, the fiber will be spawned on the given scheduler. @end deffn @node Schedulers and Tasks @section Schedulers and Tasks Internally, fibers are built on top of schedulers and tasks. A scheduler runs tasks. A task is just a thunk (a function of no arguments) whose return value is ignored. A scheduler runs tasks in batches, once per turn. Each turn, a scheduler takes all tasks from its ``next'' run-queue and adds them to its ``current'' run-queue, and then runs the tasks on the current run-queue in order. The scheduler then goes to the next turn, unless its ``finished?'' function returns true. Both the ``next'' and the ``current'' run-queues are public atomic data structures. Scheduling a task adds it to the ``next'' run-queue. Scheduling a task is a thread-safe operation; it can be done by any thread. Scheduling a task on a scheduler running on a remote thread will ensure that the thread wakes up from any sleeping operation it might be currently engaged in. There is normally just one scheduler for each kernel thread that runs fibers. Several schedulers can be made aware of each other so that they can one can spread out the load when spawning tasks that should run in parallel. Also, before moving to the next turn, a scheduler will try to steal work from other schedulers that it knows about, popping off an item from the remote scheduler's ``current'' run-queue. There are two additional sources of tasks for a scheduler: file descriptor events and timers. When gathering tasks to schedule for the next turn, a scheduler will call @code{epoll} to be notified of activity on file descriptors of interest. If there are no pending tasks on the next run-queue, the call to @code{epoll} will sleep until the scheduler is woken up, or until a timer expires. The capability that allows fibers to be built on schedulers is that tasks can suspend. Suspending a task calls a user-supplied after-suspend handler that is passed the continuation of the task. The user can then schedule that continuation at some later time. In this way a fiber starts as a single task run by a scheduler, but each time it suspends and is resumed, a fresh task consisting of the fiber's is scheduled. The fibers layer also uses other Guile mechanisms to isolate fibers from each other, such as dynamic states. All interfaces in this module are thread-safe except where marked otherwise. @example (use-modules (fibers scheduler)) @end example @defun make-scheduler [#:parallelism=@code{#f}] @ [#:prompt-tag=@code{(make-prompt-tag "fibers")}] Make a new scheduler in which to run fibers. If @var{parallelism} is true, it should be an integer indicating the number of schedulers to make. The resulting schedulers will all share the same prompt tag and will steal and share out work from among themselves. @end defun @defun run-scheduler sched finished? Run @var{sched} until calling the supplied @var{finished?} thunk returns true. Return zero values. Signal an error if @var{scheduler} is already running in some other kernel thread. @end defun @defun current-scheduler Return the current scheduler, or @code{#f} if no scheduler is current. @end defun @defun scheduler-kernel-thread sched Return the kernel thread that @var{sched} is running on, or @code{#f} if it is not currently running. @end defun @defun scheduler-runcount sched Return the number of tasks that have been run on @var{sched}, modulo @math{2^{32}}. This interface is useful as a lightweight check to see if a remote scheduler is making progress. @end defun @defun scheduler-remote-peers sched Return a list of peer schedulers of @var{sched}, not including @var{sched} itself. @end defun @defun scheduler-work-pending? sched Return @code{#t} if @var{sched} has any work pending: any runnable tasks or any pending timeouts. @end defun @defun choose-parallel-scheduler sched Return a random scheduler from @var{sched}'s peer set. Note that @var{sched}'s peer set includes @var{sched} itself. @end defun @defun destroy-scheduler sched Release any resources associated with @var{sched}. @end defun @defun schedule-task sched task Arrange to run @var{task}, a procedure of no arguments, on the next turn of @var{sched}. If @var{sched} is remote and sleeping, it will be woken up. @end defun @defun schedule-task-when-fd-readable sched fd task Arrange to schedule @var{task} when the file descriptor @var{fd} becomes readable. @emph{Not thread-safe.} @end defun @defun schedule-task-when-fd-writable sched fd task Arrange to schedule @var{task} on @var{sched} when the file descriptor @var{fd} becomes writable. @emph{Not thread-safe.} @end defun @defun schedule-task-at-time sched expiry task Arrange to schedule @var{task} on @var{sched} when the absolute real time is greater than or equal to @var{expiry}, expressed in internal time units. @emph{Not thread-safe.} @end defun @defun suspend-current-task after-suspend Suspend the current task to the current scheduler. After suspending, call the @var{after-suspend} callback with two arguments: the current scheduler, and the continuation of the current task. The continuation passed to the @var{after-suspend} handler is the continuation of the @code{suspend-current-task} call. @end defun @defun yield-current-task Yield control to the current scheduler. Like calling @code{(suspend-current-task schedule-task)} except that it avoids suspending if the current continuation isn't suspendable. Returns @code{#t} if the yield succeeded, or @code{#f} otherwise. @end defun @node Pitfalls @chapter Pitfalls Running Guile code within a fiber mostly ``just works''. There are a few pitfalls to be aware of though. @menu * Blocking:: Avoid calling blocking operations. * Barriers:: Avoid suspending inside continuation barriers. * Mutation:: Avoid unstructured mutation of shared data. * Mutexes:: Mutexes and fibers don't mix very well. @end menu @node Blocking @section Blocking When you run a program under fibers, the fibers library arranges to make it so that port operations can suspend the fiber instead of block. This generally works, with some caveats. @enumerate @item The port type has to either never block, or support non-blocking I/O. Currently the only kind of port in Guile are file ports (including sockets), and for them this condition is fulfilled. However notably non-blocking I/O is not supported for custom binary I/O ports, not yet anyway. If you need this, get it fixed in Guile :) @item You have to make sure that any file port you operate on is opened in nonblocking mode. @xref{Non-Blocking I/O,,,guile.info,Guile Reference Manual}, for the obscure @code{fcntl} incantation to use on your ports. @item You have to avoid any operation on ports that is not supported yet in Guile for non-blocking I/O. Since non-blocking I/O is new in Guile, only some I/O operations are expressed in terms of the primitive operations. Notably, Scheme @code{read}, @code{display}, and @code{write} are still implemented in C, which prevents any fiber that uses them from suspending and resuming correctly. What will happen instead is that the call blocks instead of suspending. If you find a situation like this, talk to Guile developers to get it fixed :) @item You can enable non-blocking I/O for local files, but Linux at least will always say that the local file is ready for I/O even if it has to page in data from a spinning-metal device. This is a well-known limitation for which the solution is apparently to do local I/O via a thread pool. We could implement this in Fibers, or in Guile... not sure what the right thing is! @end enumerate You also have to avoid any other library or system calls that would block. One common source of blocking is @code{getaddrinfo} and related network address resolution library calls. Again, apparently the solution is thread pools? Probably in Fibers we should implement a thread-pooled address resolver. The @code{(fibers)} module exports a @code{sleep} replacement. Code that sleeps should import the @code{(fibers)} module to be sure that they aren't using Guile's @code{sleep} function. Finally, a fiber itself has to avoid blocking other fibers; it must reach a ``yield point'' some time. A yield point includes a read or write on a port or a channel that would block, or a @code{sleep}. Other than that, nothing will pre-empt a fiber, at least not currently. If you need to yield to the scheduler, then at least do a @code{(sleep 0)} or something. @node Barriers @section Barriers When a fiber suspends, Fibers uses @code{abort-to-prompt} to save the fiber's continuation, saving each pending computation in that fiber to the heap. When the fiber resumes, Fibers invokes the saved continuation, effectively replaying these saved stack frames back onto the current stack. For this operation to succeed, the saved continuation needs to be @dfn{suspendable}. A suspendable continuation should be able to be resumed after the call to @code{abort-to-prompt}. Most continuations in Guile are suspendable. However, not all of them are. It's possible to explicitly instate a continuation barrier (@pxref{Continuation Barriers,,,guile.info,Guile Reference Manual}) that will allow the continuation to be aborted but not reinstated: @example ;; If put-message suspends, we will never resume! (run-fibers (lambda () (with-continuation-barrier (lambda () (put-message channel 42))))) @end example If the @code{put-message} call can't succeed directly, then the fiber will suspend. However when the fiber becomes runnable again, it can't be rewound because of the barrier. Because this is the case, when Fibers goes to suspend a computation but realizes that the suspended fiber could never be resumed, it throws an error instead. @code{with-continuation-barrier} is the only function in Guile that establishes a continuation barrier on purpose. However there are number of other functions that accidentally establish a continuation barrier by recursing into C code and then back to Scheme. (Guile can only rewind the state of a saved computation if Guile created the corresponding stack frame, and that's not the case for the intermediate stack frame created by the C compiler.) Accidental continuation barriers are bugs, and the Guile developers have been working on removing them over the years. By now, most of the high-priority accidental barriers are gone. Those that are left include: @itemize @item The body thunk of @code{call-with-blocked-asyncs} @item GOOPS methods attached to a primitive-generic like @code{+} or @code{equal?} @item Dynwind entry/exit handlers, but only when called due to nonlocal entry or exit @item R6RS custom binary port callbacks @item Legacy ``soft port'' callbacks @item R5RS ``delay'' callbacks @item Many module system callbacks (module transformers, etc) @item SRFI-13 string and character-set callbacks @item Callbacks from some SRFI-1 functions @item Callbacks from @code{sort} @item Custom hash table assoc functions @item Calls to @code{load-from-path} (though, oddly, not @code{load}) @item Object printers, e.g. custom record printers @item @code{call-with-vm} @item @code{array-map} and related array functions @end itemize This set will be reduced over time as more of @code{libguile} is rewritten in Scheme. Finally, for port operations, @xref{Non-Blocking I/O,,,guile.info,Guile Reference Manual}. When Guile tries to read from a file descriptor and nothing is available, normally it would call the current read waiter, which Fibers customizes to suspend the fiber and run another one in the meantime. However for procedures that have not been rewritten in terms of the ``suspendable port operations'', notably including @code{read}, @code{write}, and @code{display}, the nothing-to-read condition is handled in C, not Scheme, so Guile cannot create a resumable continuation. In this case, instead of erroring, Guile will wait until the file descriptor is readable or writable (as appropriate) and then continue. However in the meantime, which may be forever, this blocks other fibers from running. Therefore Fibers users sometimes have to be aware of the state of Guile's rewrite of port operations in terms of suspendable-port primitives, and to help out if things aren't moving fast enough :) @node Mutation @section Mutation Although code run within fibers looks like normal straight-up Scheme, it runs concurrently with other fibers. This means that if you mutate shared state and other fibers mutate the same shared state using normal Scheme procedures like @code{set!}, @code{vector-set!}, or the like, then probably you're going to have a bad time. This is especially the case considering that the default is to run as many schedulers in parallel as your machine has cores, and also to preempt fibers at any time. Even if you explicitly choose a cooperative scheduling mode by disabling interrupts and parallelism, multi-step transactions may be suspended if your code reaches a yield point in the middle of performing the transaction. The best way around this problem is to avoid unstructured mutation, and to instead share data by communicating over channels. Using channels to communicate data produces much more robust, safe systems. If you need to mutate global data, the best way is to use an atomic variable. If that is not possible, then consider spawning a fiber to manage the mutable data, and communicating with that fiber over channels. Mutexes are also an option but are difficult to use correctly; see the considerations from the following section. @node Mutexes @section Mutexes Mutexes are low-level synchronization primitives provided by Guile. Used properly, they can be used to build concurrent systems that concurrently access data without corrupting it. @xref{Mutexes and Condition Variables,,,guile.info,Guile Reference Manual}, for some reasons why mutexes aren't so great for Guile in general. Guile's mutexes are an even worse solution with a Fibers system. It is a bad idea for a fiber to grab a Guile mutex, because if the mutex is not available, Guile will suspend not just the fiber that is running but the entire kernel thread. If the mutex is available, the fiber obtains it, cool; but if it the fiber suspends while holding a mutex, that's bad news. Any fiber trying to acquire a mutex while a suspended fiber from the same thread already has the mutex will result in an error: as Guile thinks that the mutex has already been acquired by the current thread, it detects recursion and bails out. With condition variables, similar problems arise: waiting on a condition variable will block indefinitely, if the condition can only be signalled by another fiber in the current kernel thread. The root of this problem is that Guile associates mutexes with kernel threads, not fibers. It would be possible however to make a Fibers-appropriate implementation of mutexes, but we suggest that users try atomic boxes or channels instead. If you do use mutexes, make sure you disable preemption (possibly by a local call to @code{call-with-blocked-asyncs}), and take care to never suspend a fiber while it owns any mutex. @node Examples @chapter Examples Here are some small examples to get you started. @menu * Ping:: An echo server and client. * Memcached:: A simple memcached server and client. * Web Server Backend:: A backend for Guile's web server. * Concurrent Web Server:: A more concurrent web server. @end menu More examples would be great, especially demonstrating interesting things that can be done with channels. @node Ping @section Ping @subsection Server This simple server listens on a TCP port, echoing lines back to any user that connects. This file can be found in @code{examples/ping-server.scm}, and can be run from the build dir as @code{./env guile examples/ping-server.scm}. First, we use some standard Guile modules, and the fibers module. @example (use-modules (rnrs bytevectors) (fibers) (ice-9 textual-ports) (ice-9 rdelim) (ice-9 match)) @end example We run the server within a @code{run-fibers} call. @example (define* (run-ping-server #:key (host #f) (family AF_INET) (addr (if host (inet-pton family host) INADDR_LOOPBACK)) (port 11211) (socket (make-default-socket family addr port))) (listen socket 1024) (sigaction SIGPIPE SIG_IGN) (socket-loop socket (make-hash-table))) (run-fibers run-ping-server) @end example Up to here, all good. Perhaps we should look at how to open a socket; here's a couple helper that appears often in applications that use suspendable ports. @xref{Non-Blocking I/O,,,guile.info,Guile Reference Manual}, for full details. @example (define (make-default-socket family addr port) (let ((sock (socket PF_INET SOCK_STREAM 0))) (setsockopt sock SOL_SOCKET SO_REUSEADDR 1) (fcntl sock F_SETFD FD_CLOEXEC) (fcntl sock F_SETFL (logior O_NONBLOCK (fcntl sock F_GETFL))) (bind sock family addr port) sock)) @end example We hope to make this easier in the future; it's a bit too much ceremony. Now, the main dish is the server loop, that simply accepts new connections, forking off a fiber for each connection: @example (define (socket-loop socket store) (let loop () (match (accept socket SOCK_NONBLOCK) ((client . addr) (spawn-fiber (lambda () (client-loop client addr store))) (loop))))) @end example Finally, the loop to handle a single client: @example (define (client-loop port addr store) (setvbuf port 'block 1024) ;; Disable Nagle's algorithm. We buffer ourselves. (setsockopt port IPPROTO_TCP TCP_NODELAY 1) (let loop () ;; TODO: Restrict read-line to 512 chars. (let ((line (read-line port))) (cond ((eof-object? line) (close-port port)) (else (put-string port line) (put-char port #\newline) (force-output port) (loop)))))) @end example This ping server is fairly straightforward, and is only flawed in a couple of ways: it doesn't limit the line size, and so is vulnerable to memory exhaustion if the client gives it a very, very big line, and additionally, it does not time out clients after inactivity, so the poll set could get quite large. In practice the limit for the number of connections is imposed by the system in the form of a file descriptor limit. Use @code{ulimit -n} to increase this limit on the console, or @code{setrlimit} to increase it from Guile, within the hard limits imposed by the system. @subsection Client The client is similar to the server; see @code{examples/ping-client.scm} for full details. It can be run as @code{./env guile examples/ping-client.scm N M}, to make N concurrent connections to the server and make a series of M requests on each connection. It spawns a fiber per connection, and then uses a normal Guile loop to make the serial requests. @example (define (run-ping-test num-clients num-connections) ;; The getaddrinfo call blocks, unfortunately. Call it once before ;; spawning clients. (let ((addrinfo (car (getaddrinfo "localhost" "11211")))) (let lp ((n 0)) (when (< n num-clients) (spawn-fiber (lambda () (client-loop addrinfo n num-connections))) (lp (1+ n)))))) @end example Running this on a laptop from 2015 yields results more or less like this: @example $ time ./env guile examples/ping-client.scm 1000 100 real 0m1.647s user 0m2.176s sys 0m0.816s @end example That's a throughput of somewhere around 60000 fiber switches per second on each side, which is none too shabby. @node Memcached @section Memcached Similarly to the echo server, Fibers includes an example memcached server and client. Run the server like this: @example $ ./env guile examples/memcached-server.scm @end example Run the client as with the ping client: @example $ time ./env guile examples/memcached-client.scm 1000 100 real 0m6.343s user 0m9.868s sys 0m1.808s @end example Here we see a throughput of around 16000 puts plus 16000 gets per second on this 2-core, 2-thread-per-core machine; pretty OK for a basic implementation. @node Web Server Backend @section Web Server Backend Fibers includes a ``backend'' for Guile's built-in web server that uses non-blocking fibers to read requests and write responses. Fibers also includes a standalone web server that uses Guile's HTTP facilities, but not its web server framework. @xref{Concurrent Web Server}, for more on the standalone web server. To run a web server that serves each client from fibers, specify the @code{fibers} backend when running your web server: @example (use-modules (web server)) (define (handler request body) (values '((content-type . (text/plain))) "Hello, World!")) (run-server handler 'fibers) @end example Performance seems to be about 60% of the standard web server backend implementation shipped with Guile, though it is not as battle-hardened. The fibers web server backend is an interesting case because it uses channels to invert the inversion-of-control imposed on the backend by the web server framework. The web server wants the backend to provide ``read-request'' and ``write-response'' callbacks, whereas in fibers we usually want to run dedicated REPL-like fibers over the client sockets. The fibers backend enables this by passing a callback to the per-client loops: @example (define (have-request response-channel request body) (put-message request-channel (vector response-channel request body)) (match (get-message response-channel) (#(response body) (values response body)))) (let loop () (match (accept socket SOCK_NONBLOCK) ((client . sockaddr) ;; ... (spawn-fiber (lambda () (client-loop client have-request)) #:parallel? #t) (loop)))) @end example From the perspective of the @code{client-loop} fiber, @code{have-request} is a normal function that takes a request and returns a response, and the @code{client-loop} fiber is in control. But from the perspective of the web server, the @code{server-read} and @code{server-write} callbacks are straightforward and idiomatic too: @example (define (server-read server) (match (get-message (server-request-channel server)) (#(response-channel request body) (let ((client response-channel)) (values client request body))))) (define (server-write server client response body) (let ((response-channel client)) (put-message response-channel (vector response body))) (values)) @end example This takes advantage of the fact that we can use @code{get-message}, @code{put-message}, and other CML operations both inside and outside of fibers, and it mostly just does the right thing. Note also the @code{#:parallel? #t} on the @code{spawn-fiber} invocation. The natural unit of parallelism in a web server is the request (or the client), so it's at this point that we introduce work sharing to our system, allowing us to naturally take advantage of multiple cores on our server. @node Concurrent Web Server @section Concurrent Web Server Guile's web server framework single-threads all web request handling. The handler procedure can be passed a number of additional ``state'' arguments, and is expected to return a corresponding number of additional values to use as the next state. This is sometimes what you want, but it does limit concurrency; instead it would be nice to be able to not only the input and output running concurrently, but also handlers too. For this reason, Fibers includes a simple standalone web server that uses Guile's Guile's HTTP facilities, but not its web server framework. To run a standalone web server, use the @code{(fibers web server)} module: @example (use-modules (fibers web server)) (define (handler request body) (values '((content-type . (text/plain))) "Hello, World!")) (run-server handler) @end example Compared to the Fibers web server backend (@pxref{Web Server Backend}), using the standalone fibers web server enables more parallelism, as the handlers can run in parallel when you have multiple cores. Single-core performance of the standalone server is slightly better than the web server backend, and unlike the backend it scales with the number of cores available. @node Status @chapter Project status Fibers is feature-complete and ready to go! It's early days but things are solid enough to say without embarrassment or misgiving that Guile now has a solid concurrency story. Use fibers, incorporate it directly into your project, fork it, improve it: what happens now is up to you. Happy hacking and godspeed! @c @node Concept Index @c @unnumbered Concept Index @c @printindex cp @c @node Function Index @c @unnumbered Function Index @c @printindex fn @bye