% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system

\section{library(csv): Process CSV (Comma-Separated Values) data}
\label{sec:csv}

\begin{tags}
    \tag{See also}
RFC 4180
    \mtag{To be done}
- Implement immediate assert of the data to avoid possible stack overflows. \\
- Writing creates an intermediate code-list, possibly overflowing resources. This waits for pure output!
\end{tags}

This library parses and generates CSV data. CSV data is represented in
Prolog as a list of rows. Each row is a compound term, where all rows
have the same name and arity.\vspace{0.7cm}

\begin{description}
    \predicate[det]{csv_read_file}{2}{+File, -Rows}
\nodescription
    \predicate[det]{csv_read_file}{3}{+File, -Rows, +Options}
Read a CSV file into a list of rows. Each row is a Prolog term of the
same arity. \arg{Options} is handed to \dcgref{csv}{2}. Remaining
options are processed by \predref{phrase_from_file}{3}. The default
separator depends on the file name extension: \verb$\t$ for
\verb$.tsv$ files and \verb$,$ otherwise.

Suppose we want to create a predicate \predref{table}{6} from a CSV
file that we know contains 6 fields per record. This can be done using
the code below. Without the option \verb$arity(6)$, this would
generate a predicate table/N, where N is the number of fields per
record in the data.

\begin{code}
?- csv_read_file(File, Rows, [functor(table), arity(6)]),
   maplist(assert, Rows).
\end{code}

    \predicate[det]{csv_read_stream}{3}{+Stream, -Rows, +Options}
Read CSV data from \arg{Stream}. See also \predref{csv_read_row}{3}.

    \dcg[det]{csv}{1}{?Rows}
\nodescription
    \dcg[det]{csv}{2}{?Rows, +Options}
Prolog DCG to `read/write' CSV data. Examples for both directions
follow the option list. \arg{Options}:

    \begin{description}
        \termitem{separator}{+Code}
The field separator, which must be specified as a character code.
Default is (of course) the comma. Character codes can be written using
the 0' notation. E.g., \verb$separator(0';)$ parses a
semicolon-separated file.

        \termitem{ignore_quotes}{+Boolean}
If \const{true} (default \const{false}), treat double quotes as normal
characters.

        \termitem{strip}{+Boolean}
If \const{true} (default \const{false}), strip leading and trailing
blank space. RFC 4180 says that blank space is part of the data.

        \termitem{skip_header}{+CommentLead}
Skip leading lines that start with \arg{CommentLead}. There is no
standard for comments in CSV files, but some CSV files have a header
where each line starts with \verb$#$. After skipping comment lines
this option also causes \dcgref{csv}{2} to skip empty lines. Note that
a line containing only white space (space or tab) is not considered
empty, as the white space may be valid data.

        \termitem{convert}{+Boolean}
If \const{true} (default), use \predref{name}{2} on the field data.
This translates the field into a number if possible.

        \termitem{case}{+Action}
If \const{down}, downcase atomic values. If \const{up}, upcase them,
and if \const{preserve} (default), do not change the case.

        \termitem{functor}{+Atom}
Functor to use for creating row terms. Default is \const{row}.

        \termitem{arity}{?Arity}
Number of fields in each row. This predicate raises a
\verb$domain_error(row_arity(Expected), Found)$ if a row is found with
a different arity.

        \termitem{match_arity}{+Boolean}
If \const{false} (default \const{true}), do not reject CSV files where
lines provide a varying number of fields (columns). This can be a
work-around for processing some malformed CSV files.
    \end{description}
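For example, the query below is a minimal sketch of parsing
semicolon-separated data from a code list with \dcgref{csv}{2} and
\predref{phrase}{2}; the concrete field values are invented for the
illustration.

\begin{code}
?- phrase(csv(Rows, [separator(0';)]), `name;age\njan;42\n`).
Rows = [row(name, age), row(jan, 42)].
\end{code}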
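As \dcgref{csv}{2} works in both directions, the same non-terminal can
also generate CSV text from ground rows. The sketch below (toplevel
bindings omitted) renders two invented rows to a code list and prints
the result.

\begin{code}
?- phrase(csv([row(a, 1), row(b, 2)]), Codes),
   format("~s", [Codes]).
a,1
b,2
\end{code}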
    \predicate[nondet]{csv_read_file_row}{3}{+File, -Row, +Options}
True when \arg{Row} is a row in \arg{File}. First unifies \arg{Row}
with the first row in \arg{File}; backtracking yields the second row,
and so on. This interface is an alternative to
\predref{csv_read_file}{3} that avoids loading all rows into memory
(see the first example at the end of this section). Note that this
interface does not guarantee that all rows in \arg{File} have the same
arity. In addition to the options of \predref{csv_read_file}{3}, this
predicate processes the option:

    \begin{description}
        \termitem{line}{-Line}
\arg{Line} is unified with the 1-based line number from which
\arg{Row} is read. Note that \arg{Line} is not the physical line
number, but rather the \textit{logical} record number.
    \end{description}

    \predicate[det]{csv_read_row}{3}{+Stream, -Row, +CompiledOptions}
Read the next CSV record from \arg{Stream} and unify the result with
\arg{Row}. \arg{CompiledOptions} is created from the options defined
for \dcgref{csv}{2} using \predref{csv_options}{2}. \arg{Row} is
unified with \verb$end_of_file$ upon reaching the end of the input. A
complete read loop is sketched at the end of this section.

    \predicate[det]{csv_options}{2}{-Compiled, +Options}
\arg{Compiled} is the compiled representation of the CSV processing
options as they may be passed to \dcgref{csv}{2}, etc. This predicate
is used in combination with \predref{csv_read_row}{3} to avoid
repeated processing of the options.

    \predicate[det]{csv_write_file}{2}{+File, +Data}
\nodescription
    \predicate[det]{csv_write_file}{3}{+File, +Data, +Options}
Write a list of Prolog terms to a CSV file. \arg{Options} are given to
\dcgref{csv}{2}. Remaining options are given to \predref{open}{4}. The
default separator depends on the file name extension: \verb$\t$ for
\verb$.tsv$ files and \verb$,$ otherwise.

    \predicate[det]{csv_write_stream}{3}{+Stream, +Data, +Options}
Write the rows in \arg{Data} to \arg{Stream}. This is similar to
\predref{csv_write_file}{3}, but can deal with data that is produced
incrementally. The example below saves all answers from the predicate
\predref{data}{3} to File.

\begin{code}
save_data(File) :-
    setup_call_cleanup(
        open(File, write, Out),
        forall(data(C1,C2,C3),
               csv_write_stream(Out, [row(C1,C2,C3)], [])),
        close(Out)).
\end{code}
\end{description}
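As an illustration of \predref{csv_read_file_row}{3}, the sketch below
asserts each row as it is read, so the complete list of rows is never
materialised in memory. The predicate name \verb$load_rows/1$ is
invented for this example.

\begin{code}
% Hypothetical helper: load all rows of File into the database,
% one row at a time, without building the full list of rows.
load_rows(File) :-
    forall(csv_read_file_row(File, Row, []),
           assertz(Row)).
\end{code}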
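Similarly, \predref{csv_options}{2} and \predref{csv_read_row}{3}
combine into an explicit read loop. The sketch below is one possible
shape; \verb$process_row/1$ stands for a user-supplied handler and is
not part of this library.

\begin{code}
% Hypothetical read loop: compile the csv//2 options once and
% reuse the compiled form for every csv_read_row/3 call.
process_csv(File) :-
    csv_options(Compiled, [separator(0';)]),
    setup_call_cleanup(
        open(File, read, In),
        read_rows(In, Compiled),
        close(In)).

read_rows(In, Compiled) :-
    csv_read_row(In, Row, Compiled),
    (   Row == end_of_file          % csv_read_row/3 signals EOF
    ->  true
    ;   process_row(Row),           % user-supplied handler (assumed)
        read_rows(In, Compiled)
    ).
\end{code}

Compiling the options once with \predref{csv_options}{2} avoids
re-processing the option list for every record, which matters when
reading large files.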