<link rel=stylesheet type="text/css" href="style.css" title="style">
man-pages proposal for Google Summer of Code 2015
GSoC 2015 proposal: a feature test macro parser for glibc source code
<h1>GSoC 2015 proposal: a feature test macro parser for glibc source code</h1>
<h2>The man-pages project</h2>
The Linux
<em>man-pages</em> project
documents the
<a href="">Linux</a>
<a href="">kernel</a>
and C library interfaces that are employed by user-space programs.
With respect to the C library, the primary focus is the
<a href="">GNU</a> C library
(<a href="">glibc</a>),
although, where known,
documentation of variations in other C libraries
available for Linux is also included.
Established in 1993, the project by now contains nearly
1000 man pages that provide documentation of Linux system calls
(and other Linux kernel-user-space APIs, such as the
<span class="pathname">/proc</span>
filesystem), C library APIs, and various related topics.
The project is
<a href="maintaining.html">maintained</a> by
<a href="">Michael Kerrisk</a>,
and typically 10 to 20 volunteers contribute to each of the
<a href="">releases</a>,
which occur approximately monthly.
<h2>Problem background</h2>
The GNU C library employs a range of
<a href="">feature test macros</a>
that are used by applications at compile time to control
the definitions and declarations exposed by the library header files.
Examples of FTMs include
<span class="const">_GNU_SOURCE</span>,
<span class="const">_POSIX_C_SOURCE</span>,
<span class="const">_XOPEN_SOURCE</span>.
Starting several years ago, the
<em>man-pages</em> project
began documenting the FTMs that must be defined in order
to obtain the declaration of each system call and library function
exposed via glibc headers.
That documentation includes changes in FTM requirements across
glibc versions and is by now reasonably complete,
albeit not completely up to date
(and hence the rationale for this proposal).
Some (more complex)
examples of the FTM documentation can be seen in the SYNOPSIS
at the top of various man pages such as
<a href="">stat(2)</a>,
<a href="">fsync(2)</a>,
<a href="">unshare(2)</a>,
<a href="">getgrnam(3)</a>,
<a href="">isfdtype(3)</a>.
The work to add this documentation to the man pages has been done
largely by hand, by visually inspecting the various header files
and checking against the FTM logic implemented in the
<span class="pathname"><a href="">&lt;features.h&gt;</a></span>
header file.
However, this approach is time-consuming and somewhat error-prone.
It is also
subject to bit rot as the glibc FTM requirements evolve
across releases.
In particular, starting with version 2.19, glibc deprecated
two long-standing but no longer useful FTMs,
<span class="const">_BSD_SOURCE</span> and
<span class="const">_SVID_SOURCE</span>,
and replaced them with a single FTM,
<span class="const">_DEFAULT_SOURCE</span>.
The <em>man-pages</em> project
has largely caught up with this change, which requires updates
to around 150 pages.
An automated solution that provided up-to-date information on the
FTM requirements of the functions exposed by glibc,
generated by a tool that parsed the header files,
would be very helpful.
<h2>The problem to solve</h2>
The goal of this project is to construct a parser for glibc
header files that can be used to answer the question:
"Which feature test macro definitions cause the declaration of function
<span class="func">foo()</span> to be exposed by the glibc header files?".
(Note that, as shown in many of the pages linked to above,
it is often the case that any of multiple different FTMs can be defined
in order to obtain the declaration of a function from the header files,
and the parser should produce the set
of all of the possible combinations.)
The parser would take account of the C preprocessor conditionals
(<span class="code">#if</span>,
<span class="code">#ifdef</span>)
in the C header files that employ FTM-related macros
in order to generate the desired information.
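As a rough illustration of this step, the following Python sketch keeps a stack of the preprocessor conditionals that are open at each line and reports the conditions that enclose each declaration of a given function. It is a minimal sketch under several assumptions, not a proposed design: the names guards_for() and render() and the header fragment are invented for the example, the fragment is not copied from glibc, and line continuations, comments, and declarations that span several lines are ignored.

```python
import re

# Matches the conditional directives of interest; "#include" and "#define" do not match.
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b\s*(.*)$")


def render(prior, current):
    """Describe one conditional block: earlier branches negated, current branch required."""
    parts = [f"!({p})" for p in prior]
    if current is not None:
        parts.append(f"({current})" if prior else current)
    return " && ".join(parts)


def guards_for(header_text, func):
    """Return, for each line that appears to declare func, the enclosing conditions."""
    stack = []  # one (earlier_branch_conditions, current_condition) pair per open block
    found = []
    decl = re.compile(r"\b" + re.escape(func) + r"\s*\(")
    for line in header_text.splitlines():
        m = DIRECTIVE.match(line)
        if m:
            kind, rest = m.group(1), m.group(2).strip()
            if kind == "ifdef":
                stack.append(([], f"defined({rest})"))
            elif kind == "ifndef":
                stack.append(([], f"!defined({rest})"))
            elif kind == "if":
                stack.append(([], rest))
            elif kind in ("elif", "else"):
                # A later branch is taken only if every earlier branch was not.
                prior, current = stack.pop()
                stack.append((prior + [current], rest if kind == "elif" else None))
            else:  # endif closes the innermost block
                stack.pop()
        elif decl.search(line):
            found.append([render(p, c) for p, c in stack])
    return found


# Hypothetical header fragment, written for this example only.
SAMPLE_HEADER = """\
#ifdef __USE_MISC
extern int foo(int fd);
# ifdef __USE_GNU
extern int qux(void);
# endif
#endif
#if defined __USE_XOPEN || defined __USE_POSIX
extern int bar(void);
#else
extern int baz(void);
#endif
"""

if __name__ == "__main__":
    for name in ("foo", "qux", "bar", "baz"):
        print(name, guards_for(SAMPLE_HEADER, name))
```

The conditions found this way are expressed in terms of the internal macros used in the headers. A real tool would still have to map them back to the user-visible FTMs by following the logic in the features header.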
The parser should work across multiple versions
of glibc and will thus probably need to encode
(perhaps via a table-driven approach)
some of the version-specific logic contained in
<span class="pathname"><a href="">&lt;features.h&gt;</a></span>,
which has steadily evolved with various versions of glibc.
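One possible shape for such a table-driven approach is sketched below in Python. The table maps the first glibc version at which a rule set applies to the FTM replacements in force from that version onward. Only the 2.19 entry reflects the change described earlier on this page; the table layout and the names FTM_TABLE, parse_version(), replacements_for(), and normalize() are assumptions made for the example, and real entries would have to be derived from the features header of each release.

```python
from bisect import bisect_right

# (first glibc version the entry applies to, {deprecated FTM: replacement FTM}).
# Entries must be sorted by version. The (2, 0) entry is a placeholder baseline.
FTM_TABLE = [
    ((2, 0), {}),
    ((2, 19), {"_BSD_SOURCE": "_DEFAULT_SOURCE",
               "_SVID_SOURCE": "_DEFAULT_SOURCE"}),
]


def parse_version(text):
    """Turn a version string such as '2.19' into a comparable tuple (2, 19)."""
    return tuple(int(part) for part in text.split("."))


def replacements_for(version):
    """Return the replacement map of the latest table entry not newer than version."""
    starts = [start for start, _ in FTM_TABLE]
    index = bisect_right(starts, parse_version(version)) - 1
    if index < 0:
        raise ValueError(f"no table entry covers glibc {version}")
    return FTM_TABLE[index][1]


def normalize(ftms, version):
    """Rewrite deprecated FTM names to the names preferred by the given version."""
    table = replacements_for(version)
    return sorted({table.get(name, name) for name in ftms})


if __name__ == "__main__":
    wanted = ["_BSD_SOURCE", "_SVID_SOURCE", "_XOPEN_SOURCE"]
    print(normalize(wanted, "2.18"))
    print(normalize(wanted, "2.19"))
```

Keeping the version-specific rules in data rather than in code would mean that supporting a new glibc release mostly involves adding a table entry.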
Input for the parser would include:
the glibc header files;
(probably) the glibc version number (since the range of FTMs
and the FTM logic encoded in
<span class="pathname"><a href="">&lt;features.h&gt;</a></span>
have changed across glibc versions); and
a list of names of functions for which FTM requirements are
to be generated.
The parser should output information in a format
that can be easily incorporated into the man pages
(but the step of actually updating the man pages is <em>not</em>
something that is intended to be automated).
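To make the idea of man-page-friendly output concrete, the Python sketch below formats a set of alternative FTM combinations as a single line of text. The function name format_requirements(), the input structure (a list of alternatives, each a list of FTMs that must be defined together), and the output layout are all assumptions made for illustration; the layout actually used in the man pages may differ, and the requirement shown for foo() is made up.

```python
def format_requirements(func, alternatives):
    """Format alternatives as 'func(): A || (B && C)'.

    Each alternative is a list of FTMs that must all be defined; defining
    any one complete alternative is taken to expose the declaration.
    """
    if not alternatives:
        raise ValueError("at least one alternative is required")
    multiple = len(alternatives) > 1
    terms = []
    for alternative in alternatives:
        term = " && ".join(alternative)
        # Parenthesize conjunctions only when they sit next to other alternatives.
        if multiple and len(alternative) > 1:
            term = f"({term})"
        terms.append(term)
    return f"{func}(): " + " || ".join(terms)


if __name__ == "__main__":
    # Hypothetical requirement, not taken from any real man page.
    print(format_requirements("foo", [["_BSD_SOURCE"],
                                      ["_XOPEN_SOURCE", "_GNU_SOURCE"]]))
```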
Since a scripting language will probably be well suited
to writing the parser, Python is the preferred choice.
That choice is based on two considerations:
the tool will need to be maintained
and modified as glibc evolves, and some existing scripting
tools used by the
<em>man-pages</em> project
already employ Python.
Nevertheless, the choice of another scripting language
might be entertained if there is a good technical justification.
<a href="">Michael Kerrisk</a>
(<em>man-pages</em> project maintainer),
<span class="email"></span>
<strong>Technologies</strong>: (probably) Python, C preprocessor
<strong>Expected results</strong>: a parser
<strong>Knowledge prerequisites</strong>: good knowledge of
the chosen scripting language (probably Python);
basic understanding of the operation of the C preprocessor