% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system

\section{library(csv): Process CSV (Comma-Separated Values) data}
\label{sec:csv}

\begin{tags}
    \tag{See also}
RFC 4180
    \mtag{To be done}
- Implement immediate assert of the data to avoid possible stack overflows. \\
- Writing creates an intermediate code-list, possibly overflowing resources. This waits for pure output!
\end{tags}

This library parses and generates CSV data. CSV data is represented in
Prolog as a list of rows. Each row is a compound term, where all rows
have the same name and arity.\vspace{0.7cm}

\begin{description}
    \predicate[det]{csv_read_file}{2}{+File, -Rows}
\nodescription
    \predicate[det]{csv_read_file}{3}{+File, -Rows, +Options}
Read a CSV file into a list of rows. Each row is a Prolog term of the
same arity. \arg{Options} is handed to \dcgref{csv}{2}. Remaining
options are processed by \predref{phrase_from_file}{3}. The default
separator depends on the file name extension: \verb$\t$ for
\verb$.tsv$ files and \verb$,$ otherwise.

Suppose we want to create a predicate \predref{table}{6} from a CSV
file that we know contains 6 fields per record. This can be done using
the code below. Without the option \verb$arity(6)$, this would
generate a predicate table/N, where N is the number of fields per
record in the data.

\begin{code}
?- csv_read_file(File, Rows, [functor(table), arity(6)]),
   maplist(assert, Rows).
\end{code}

    \predicate[det]{csv_read_stream}{3}{+Stream, -Rows, +Options}
Read CSV data from \arg{Stream}. See also \predref{csv_read_row}{3}.

    \dcg[det]{csv}{1}{?Rows}
\nodescription
    \dcg[det]{csv}{2}{?Rows, +Options}
Prolog DCG to `read/write' CSV data. Examples for both directions
follow the option list. \arg{Options}:

    \begin{description}
        \termitem{separator}{+Code}
The field separator, which must be specified as a character code.
Default is (of course) the comma. Character codes can be written using
the 0' notation. E.g., \verb$separator(0';)$ parses a
semicolon-separated file.

        \termitem{ignore_quotes}{+Boolean}
If \const{true} (default \const{false}), treat double quotes as normal
characters.

        \termitem{strip}{+Boolean}
If \const{true} (default \const{false}), strip leading and trailing
blank space. RFC 4180 says that blank space is part of the data.

        \termitem{skip_header}{+CommentLead}
Skip leading lines that start with \arg{CommentLead}. There is no
standard for comments in CSV files, but some CSV files have a header
where each line starts with \verb$#$. After skipping comment lines
this option also causes \dcgref{csv}{2} to skip empty lines. Note that
a line containing only white space (space or tab) is not considered
empty, as the white space may be valid data.

        \termitem{convert}{+Boolean}
If \const{true} (default), use \predref{name}{2} on the field data.
This translates the field into a number if possible.

        \termitem{case}{+Action}
If \const{down}, downcase atomic values. If \const{up}, upcase them,
and if \const{preserve} (default), do not change the case.

        \termitem{functor}{+Atom}
Functor to use for creating row terms. Default is \const{row}.

        \termitem{arity}{?Arity}
Number of fields in each row. This predicate raises a
\verb$domain_error(row_arity(Expected), Found)$ if a row is found with
a different arity.

        \termitem{match_arity}{+Boolean}
If \const{false} (default \const{true}), do not reject CSV files where
lines provide a varying number of fields (columns). This can be a
work-around for processing some malformed CSV files.
    \end{description}
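For example, the query below is a minimal sketch of parsing
semicolon-separated data from a code list with \dcgref{csv}{2} and
\predref{phrase}{2}; the concrete field values are invented for the
illustration.

\begin{code}
?- phrase(csv(Rows, [separator(0';)]), `name;age\njan;42\n`).
Rows = [row(name, age), row(jan, 42)].
\end{code}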
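As \dcgref{csv}{2} works in both directions, the same non-terminal can
also generate CSV text from ground rows. The sketch below (toplevel
bindings omitted) renders two invented rows to a code list and prints
the result.

\begin{code}
?- phrase(csv([row(a, 1), row(b, 2)]), Codes),
   format("~s", [Codes]).
a,1
b,2
\end{code}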
    \predicate[nondet]{csv_read_file_row}{3}{+File, -Row, +Options}
True when \arg{Row} is a row in \arg{File}. First unifies \arg{Row}
with the first row in \arg{File}; backtracking yields the second row,
and so on. This interface is an alternative to
\predref{csv_read_file}{3} that avoids loading all rows into memory
(see the first example at the end of this section). Note that this
interface does not guarantee that all rows in \arg{File} have the same
arity. In addition to the options of \predref{csv_read_file}{3}, this
predicate processes the option:

    \begin{description}
        \termitem{line}{-Line}
\arg{Line} is unified with the 1-based line number from which
\arg{Row} is read. Note that \arg{Line} is not the physical line
number, but rather the \textit{logical} record number.
    \end{description}

    \predicate[det]{csv_read_row}{3}{+Stream, -Row, +CompiledOptions}
Read the next CSV record from \arg{Stream} and unify the result with
\arg{Row}. \arg{CompiledOptions} is created from the options defined
for \dcgref{csv}{2} using \predref{csv_options}{2}. \arg{Row} is
unified with \verb$end_of_file$ upon reaching the end of the input. A
complete read loop is sketched at the end of this section.

    \predicate[det]{csv_options}{2}{-Compiled, +Options}
\arg{Compiled} is the compiled representation of the CSV processing
options as they may be passed to \dcgref{csv}{2}, etc. This predicate
is used in combination with \predref{csv_read_row}{3} to avoid
repeated processing of the options.

    \predicate[det]{csv_write_file}{2}{+File, +Data}
\nodescription
    \predicate[det]{csv_write_file}{3}{+File, +Data, +Options}
Write a list of Prolog terms to a CSV file. \arg{Options} are given to
\dcgref{csv}{2}. Remaining options are given to \predref{open}{4}. The
default separator depends on the file name extension: \verb$\t$ for
\verb$.tsv$ files and \verb$,$ otherwise.

    \predicate[det]{csv_write_stream}{3}{+Stream, +Data, +Options}
Write the rows in \arg{Data} to \arg{Stream}. This is similar to
\predref{csv_write_file}{3}, but can deal with data that is produced
incrementally. The example below saves all answers from the predicate
\predref{data}{3} to File.

\begin{code}
save_data(File) :-
    setup_call_cleanup(
        open(File, write, Out),
        forall(data(C1,C2,C3),
               csv_write_stream(Out, [row(C1,C2,C3)], [])),
        close(Out)).
\end{code}
\end{description}
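As an illustration of \predref{csv_read_file_row}{3}, the sketch below
asserts each row as it is read, so the complete list of rows is never
materialised in memory. The predicate name \verb$load_rows/1$ is
invented for this example.

\begin{code}
% Hypothetical helper: load all rows of File into the database,
% one row at a time, without building the full list of rows.
load_rows(File) :-
    forall(csv_read_file_row(File, Row, []),
           assertz(Row)).
\end{code}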
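Similarly, \predref{csv_options}{2} and \predref{csv_read_row}{3}
combine into an explicit read loop. The sketch below is one possible
shape; \verb$process_row/1$ stands for a user-supplied handler and is
not part of this library.

\begin{code}
% Hypothetical read loop: compile the csv//2 options once and
% reuse the compiled form for every csv_read_row/3 call.
process_csv(File) :-
    csv_options(Compiled, [separator(0';)]),
    setup_call_cleanup(
        open(File, read, In),
        read_rows(In, Compiled),
        close(In)).

read_rows(In, Compiled) :-
    csv_read_row(In, Row, Compiled),
    (   Row == end_of_file          % csv_read_row/3 signals EOF
    ->  true
    ;   process_row(Row),           % user-supplied handler (assumed)
        read_rows(In, Compiled)
    ).
\end{code}

Compiling the options once with \predref{csv_options}{2} avoids
re-processing the option list for every record, which matters when
reading large files.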