\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename find-maint.info
@include versionmaint.texi
@settitle Maintaining GNU Findutils @value{VERSION}
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header

@iftex
@finalout
@end iftex

@dircategory GNU organization
@direntry
* Maintaining Findutils: (find-maint).  Maintaining GNU findutils
@end direntry

@copying
This manual explains how GNU findutils is maintained, how changes should
be made and tested, and what resources exist to help developers.

This document corresponds to version @value{VERSION} of the GNU findutils.

Copyright @copyright{} 2007--2022 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying

@titlepage
@title Maintaining GNU Findutils
@subtitle version @value{VERSION}, @value{UPDATED}
@author by James Youngman
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Maintaining GNU Findutils

@insertcopying
@end ifnottex

@menu
* Introduction::
* Maintaining GNU Programs::
* Design Issues::
* Coding Conventions::
* Tools::
* Using the GNU Portability Library::
* Documentation::
* Testing::
* Bugs::
* Distributions::
* Internationalisation::
* Security::
* Making Releases::
* GNU Free Documentation License::
@end menu

@node Introduction
@chapter Introduction

This document explains how to contribute to and maintain GNU
Findutils.  It concentrates on developer-specific issues.  For
information about how to use the software please refer to
@ref{Introduction, ,Introduction,find,The Findutils manual}.

This manual aims to be useful without necessarily being verbose.  It's
also a recent document, so there will be many areas in which
improvements can be made.  If you find that the document omits
important information or that any part of it is so terse as to be
unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
mailing list.  We'll try to improve this document too.

@node Maintaining GNU Programs
@chapter Maintaining GNU Programs

GNU Findutils is part of the GNU Project and so there are a number of
documents which set out standards for the maintenance of GNU
software.

@table @file
@item standards.texi
GNU Project Coding Standards.  All changes to findutils should comply
with these standards.  In some areas we go somewhat beyond the
requirements of the standards, but these cases are explained in this
manual.
@item maintain.texi
Information for Maintainers of GNU Software.  This document provides
guidance for GNU maintainers.  Everybody with commit access should
read this document.  Everybody else is welcome to do so too, of
course.
@end table

@node Design Issues
@chapter Design Issues

The findutils package is installed on a great many systems, usually as
a fundamental component.  The programs in the package are often used
in order to successfully boot or fix the system.

This fact means that for findutils we bear in mind considerations that
may not apply so much to other packages.  For example, the fact that
findutils is often a base component motivates us to

@itemize
@item Limit dependencies on libraries
@item Avoid dependencies on other large packages (for example, interpreters)
@item Be conservative when making changes to the 'stable' release branch
@end itemize

All those considerations come before functionality.  Functional
enhancements are still made to findutils, but these are almost
exclusively introduced in the 'development' release branch, to allow
extensive testing and proving.

Sometimes it is useful to have a priority list to provide guidance when making design trade-offs. For findutils, that priority list is: @enumerate @item Correctness @item Standards compliance @item Security @item Backward compatibility @item Performance @item Functionality @end enumerate For example, we support the @code{-exec} action because POSIX compliance requires this, even though there are security problems with it and we would otherwise prefer people to use @code{-execdir}. There are also cases where some performance is sacrificed in the name of security. For example, the sanity checks that @code{find} performs while traversing a directory tree may slow it down. We adopt functional changes, and functional changes are allowed to make @code{find} slower, but only if there is no detectable impact on users who don't use the feature. Backward-incompatible changes do get made in order to comply with standards (for example the behaviour of @code{-perm -...} changed in order to comply with POSIX). However, they don't get made in order to provide better ease of use; for example the semantics of @code{-size -2G} are almost always unexpected by users, but we retain the current behaviour because of backward compatibility and for its similarity to the block-rounding behaviour of @code{-size -30}. We might introduce a change which does not have the unfortunate rounding behaviour, but we would choose another syntax (for example @code{-size '<2G'}) for this. In a general sense, we try to do test-driven development of the findutils code; that is, we try to implement test cases for new features and bug fixes before modifying the code to make the test pass. Some features of the code are tested well, but the test coverage for other features is less good. If you are about to modify the code for a predicate and aren't sure about the test coverage, use @code{grep} on the test directories and measure the coverage with @code{lcov} or another test coverage tool. 
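
To make the @code{-size} rounding behaviour mentioned earlier in this
chapter concrete, here is a minimal sketch in C.  It is not the
findutils implementation; the function names
@code{units_rounded_up} and @code{size_less_than} are hypothetical,
and the sketch assumes only the documented rule that a file's size is
rounded up to a whole number of units before it is compared.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the findutils sources: round a
   size in bytes up to a whole number of units of unit_size bytes.  */
static uint64_t
units_rounded_up (uint64_t size_bytes, uint64_t unit_size)
{
  /* A zero unit size would be a bug in the caller, so assert it.  */
  assert (unit_size != 0);
  return size_bytes / unit_size + (size_bytes % unit_size != 0);
}

/* Model of a "-size -N<unit>" test: true when the rounded-up size is
   strictly less than n units.  */
static bool
size_less_than (uint64_t size_bytes, uint64_t n, uint64_t unit_size)
{
  return units_rounded_up (size_bytes, unit_size) < n;
}
```
@end verbatim

Under this model, with a unit of 1 GiB, a file of exactly 1 GiB
satisfies a ``less than 2'' test, but a file one byte larger rounds up
to 2 units and does not.  Likewise a ``less than 1'' test accepts only
empty files, which is the kind of result users tend not to expect.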
You should be able to use the @code{coverage} Makefile target (it's
defined in @file{maint.mk}) to generate a test coverage report for
findutils.  Due to limitations in @code{lcov}, this only works if your
build directory is the same as the source directory (that is, you're
not using a VPATH build configuration).

Lastly, we try not to depend on having a ``working system''.  The
findutils suite is used for diagnosis of problems, and this applies
especially to @code{find}.  We should ensure that @code{find} still
works on relatively broken systems, for example systems with damaged
@file{/etc/passwd} or @file{/etc/fstab} files.  Another interesting
example is the case where a system is a client of one or more
unresponsive NFS servers.  On such a system, if you try to stat all
mount points, your program will hang indefinitely, waiting for the
remote NFS server to respond.

Another interesting but unusual case is broken NFS servers and corrupt
filesystems; sometimes they return `impossible' file modes.  It's
important that @code{find} does not entirely fail when encountering
such a file.

@node Coding Conventions
@chapter Coding Conventions

Coding style documents which set out to establish a uniform look and
feel to source code have worthy goals, for example greater ease of
maintenance and readability.  However, I do not believe that in
general coding style guide authors can envisage every situation, and
it is always possible that it might on occasion be necessary to break
the letter of the style guide in order to honour its spirit, or to
better achieve the style guide's goals.

I've certainly seen many style guides outside the free software world
which make bald statements such as ``functions shall have exactly one
return statement''.  The desire to ensure consistency and obviousness
of control flow is laudable, but it is all too common for such bald
requirements to be followed unthinkingly.
Certainly I've seen such coding standards result in unmaintainable code with terrible infelicities such as functions containing @code{if} statements nested nine levels deep. I suppose such coding standards don't survive in free software projects because they tend to drive away potential contributors or tend to generate heated discussions on mailing lists. Equally, a nine-level-deep function in a free software program would quickly get refactored, assuming it is obvious what the function is supposed to do... Be that as it may, the approach I will take for this document is to explain some idioms and practices in use in the findutils source code, and leave it up to the reader's engineering judgement to decide which considerations apply to the code they are working on, and whether or not there is sufficient reason to ignore the guidance in current circumstances. @menu * Make the Compiler Find the Bugs:: * Factor Out Repeated Code:: * Debugging is For Users Too:: * Don't Trust the File System Contents:: * The File System Is Being Modified:: @end menu @node Make the Compiler Find the Bugs @section Make the Compiler Find the Bugs Finding bugs is tedious. If I have a filesystem containing two million files, and a find command line should print one million of them, but in fact it misses out 1%, you can tell the program is printing the wrong result only if you know the right answer for that filesystem at that time. If you don't know this, you may just not find out about that bug. For this reason it is important to have a comprehensive test suite. The test suite is of course not the only way to find the bugs. The findutils source code makes liberal use of the assert macro. While on the one hand these might be a performance drain, the performance impact of most of these is negligible compared to the time taken to fetch even one sector from a disk drive. Assertions should not be used to check the results of operations which may be affected by the program's external environment. 
For example, never assert that a file could be opened successfully. Errors relating to problems with the program's execution environment should be diagnosed with a user-oriented error message. An assertion failure should always denote a bug in the program. Avoid using @code{assert} to mark not-fully-implemented features of your code as such. Finish the implementation, disable the code, or leave the unfinished version on a local branch. Several programs in the findutils suite perform self-checks. See for example the function @code{pred_sanity_check} in @file{find/pred.c}. This is generally desirable. There are also a number of small ways in which we can help the compiler to find the bugs for us. @subsection Constants in Equality Testing It's a common error to write @code{=} when @code{==} is meant. Sometimes this happens in new code and is simply due to finger trouble. Sometimes it is the result of the inadvertent deletion of a character. In any case, there is a subset of cases where we can persuade the compiler to generate an error message when we make this mistake; this is where the equality test is with a constant. This is an example of a vulnerable piece of code. @example if (x == 2) ... @end example A simple typo converts the above into @example if (x = 2) ... @end example We've introduced a bug; the condition is always true, and the value of @code{x} has been changed. However, a simple change to our practice would have made us immune to this problem: @example if (2 == x) ... @end example Usually, the Emacs keystroke @kbd{M-t} can be used to swap the operands. @subsection Spelling of ASCII NUL Strings in C are just sequences of characters terminated by a NUL. The ASCII NUL character has the numerical value zero. It is normally represented in C code as @samp{\0}. 
Here is a typical piece of C code:

@example
*p = '\0';
@end example

Consider what happens if there is an unfortunate typo:

@example
*p = '0';
@end example

We have changed the meaning of our program and the compiler cannot
diagnose this as an error.  Our string is no longer terminated.  Bad
things will probably happen.  It would be better if the compiler could
help us diagnose this problem.

In C, the type of @code{'\0'} is in fact int, not char.  This provides
us with a simple way to avoid this error.  The constant @code{0} has
the same value and type as the constant @code{'\0'}.  However, it is
not as vulnerable to typos.  For this reason I normally prefer to
use this code:

@example
*p = 0;
@end example

@node Factor Out Repeated Code
@section Factor Out Repeated Code

Repeated code imposes a greater maintenance burden and increases the
exposure to bugs.  For example, if you discover that something you
want to implement has some similarity with an existing piece of code,
don't cut and paste it.  Instead, factor the code out.  The risk of
cutting and pasting the code, particularly if you do this several
times, is that you end up with several copies of the same code.

If the original code had a bug, you now have N places where this needs
to be fixed.  It's all too easy to miss some out when trying to fix
the bug.  Equally, it's quite possible that when pasting the code into
some function, the pasted code was not quite adapted correctly to its
new environment.  To pick a contrived example, perhaps it modifies a
global variable which it (that [original] code) shouldn't be touching
in its new home.  Worse, perhaps it makes some unstated assumption
about the nature of the input arguments which is in fact not true for
the context of the now duplicated code.

A good example of the use of refactoring in findutils is the
@code{collect_arg} function in @file{find/parser.c}.

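
As an illustration of the idea, here is a minimal sketch of factoring
an argument-fetching step out of several option parsers.  It is not a
copy of @code{collect_arg}; the names @code{take_arg},
@code{parse_name_option} and @code{parse_user_option} are
hypothetical and exist only for this example.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared helper: fetch the argument at argv[*arg_ptr],
   if there is one, and advance *arg_ptr past it.  Stores NULL and
   returns false when no argument is left.  */
static bool
take_arg (char **argv, int *arg_ptr, const char **collected)
{
  /* NULL output pointers would be a bug in the caller, not a
     condition of the environment, so an assertion is appropriate.  */
  assert (arg_ptr != NULL && collected != NULL);

  if (argv == NULL || argv[*arg_ptr] == NULL)
    {
      *collected = NULL;
      return false;
    }
  *collected = argv[*arg_ptr];
  (*arg_ptr)++;
  return true;
}

/* Two hypothetical parsers.  Each reuses take_arg instead of
   repeating the NULL check and the index bookkeeping.  */
static bool
parse_name_option (char **argv, int *arg_ptr, const char **pattern)
{
  return take_arg (argv, arg_ptr, pattern);
}

static bool
parse_user_option (char **argv, int *arg_ptr, const char **user)
{
  /* Reject an empty user name; the argument is still consumed.  */
  return take_arg (argv, arg_ptr, user) && (*user)[0] != 0;
}
```
@end verbatim

If the end-of-arguments check ever needs to change, there is now one
place to change it, and every parser picks up the fix.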
The findutils test suite is comprehensive enough that refactoring code
should not generally be a daunting prospect from a testing point of
view.  Nevertheless there are some areas which are only lightly
tested:

@enumerate
@item Tests on the ages of files
@item Code which deals with the values returned by operating system calls (for example handling of ENOENT)
@item Code dealing with OS limits (for example, limits on path length or exec arguments)
@item Code relating to features not all systems have (for example Solaris Doors)
@end enumerate

Please exercise caution when working in those areas.

@node Debugging is For Users Too
@section Debugging is For Users Too

Debug and diagnostic code is often used to verify that a program is
working in the way its author thinks it should be.  But users are
often uncertain about what a program is doing, too.  Exposing a
little more diagnostic information to them can help.  Much of the
diagnostic code in @code{find}, for example, is controlled by the
@samp{-D} flag, as opposed to C preprocessor directives.

Making diagnostic messages available to users also means that the
phrasing of the diagnostic messages becomes important, too.

@node Don't Trust the File System Contents
@section Don't Trust the File System Contents

People use @code{find} to search in directories created by other
people.  Sometimes they do this to check for suspicious activity (for
example to look for new setuid binaries).  This means that it would be
bad if @code{find} were vulnerable to, say, a security problem
exploitable by constructing a specially-crafted filename.  The same
consideration would apply to @code{locate} and @code{updatedb}.

Henry Spencer said this well in his fifth commandment:

@quotation
Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou typest @samp{foo} someone someday shall type
@samp{supercalifragilisticexpialidocious}.
@end quotation

Symbolic links can often be a problem.
If @code{find} calls @code{lstat} on something and discovers that it
is a directory, it's normal for @code{find} to recurse into it.  Even
if the @code{chdir} system call is used immediately, there is still a
window of opportunity between the @code{lstat} and the @code{chdir} in
which a malicious person could rename the directory and substitute a
symbolic link to some other directory.

@node The File System Is Being Modified
@section The File System Is Being Modified

The filesystem gets modified while you are traversing it.  For
example, it's normal for files to get deleted while @code{find} is
traversing a directory.  Issuing an error message seems helpful when a
file is deleted from the one directory you are interested in, but if
@code{find} is searching 15000 directories, such a message becomes
less helpful.

Bear in mind also that it is possible for the directory @code{find} is
searching to be concurrently moved elsewhere in the file system, and
that the directory in which @code{find} was started could be deleted.

Henry Spencer's sixth commandment is also apposite here:

@quotation
If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.
@end quotation

There are a lot of files out there.  They come in all dates and
sizes.  There is a condition out there in the real world to exercise
every bit of the code base.  So we try to test that code base before
someone falls over a bug.

@node Tools
@chapter Tools

Most of the tools required to build findutils are mentioned in the
file @file{README-hacking}.  We also use some other tools:

@table @asis
@item System call traces
Much of the execution time of @code{find} is spent waiting for
filesystem operations.
A system call trace (for example, that provided by @code{strace}) shows what system calls are being made. Using this information we can work to remove unnecessary file system operations. @item Valgrind Valgrind is a tool which dynamically verifies the memory accesses a program makes to ensure that they are valid (for example, that the behaviour of the program does not in any way depend on the contents of uninitialized memory). @item DejaGnu DejaGnu is the test framework used to run the findutils test suite (the @code{runtest} program is part of DejaGnu). It would be ideal if everybody building @code{findutils} also ran the test suite, but many people don't have DejaGnu installed. When changes are made to findutils, DejaGnu is invoked a lot. @xref{Testing}, for more information. @end table @node Using the GNU Portability Library @chapter Using the GNU Portability Library The Gnulib library (@url{https://www.gnu.org/software/gnulib/}) makes a variety of systems look more like a GNU/Linux system and also applies a bunch of automatic bug fixes and workarounds. Some of these also apply to GNU/Linux systems too. For example, the Gnulib regex implementation is used when we determine that we are building on a GNU libc system with a bug in the regex implementation. @section How and Why we Import the Gnulib Code Gnulib does not have a release process which results in a source tarball you can download. Instead, the code is simply made available by GIT, so we import gnulib via the submodule feature. The bootstrap script performs the necessary steps. Findutils does not use all the Gnulib code. The modules we need are listed in the file @file{bootstrap.conf}. The upshot of all this is that we can use the findutils git repository to track which version of Gnulib every findutils release uses. A small number of files are installed by automake and will therefore vary according to which version of automake was used to generate a release. 
This includes for example boiler-plate GNU files such as @file{ABOUT-NLS}, @file{INSTALL} and @file{COPYING}. @section How We Fix Gnulib Bugs Gnulib is used by quite a number of GNU projects, and this means that it gets plenty of testing. Therefore there are relatively few bugs in the Gnulib code, but it does happen from time to time. However, since there is no waiting around for a Gnulib source release tarball, Gnulib bugs are generally fixed quickly. Here is an outline of the way we would contribute a fix to Gnulib (assuming you know it is not already fixed in the current Gnulib git tree): @table @asis @item Check you already completed a copyright assignment for Gnulib @item Begin with a vanilla git tree Download the Findutils source code from git (or use the tree you have already) @item Run the bootstrap script @item Run configure @item Build findutils Build findutils and run the test suite, which should pass. In our example we assume you have just noticed a bug in Gnulib, not that recent Gnulib changes broke the findutils regression tests. @item Write a test case If in fact Gnulib did break the findutils regression tests, you can probably skip this step, since you already have a test case demonstrating the problem. Otherwise, write a findutils test case for the bug and/or a Gnulib test case. @item Fix the Gnulib bug Make sure your editor follows symbolic links so that your changes to @file{gnulib/...} actually affect the files in the git working directory you checked out earlier. Observe that your test now passes. @item Prepare a Gnulib patch In the gnulib subdirectory, use @code{git format-patch} to prepare the patch. Follow the normal usage for checkin comments (take a look at the output of @code{git log}). Check that the patch conforms with the GNU coding standards, and email it to the Gnulib mailing list. 
@item Wait for the patch to be applied
Once your bug fix has been applied, you can update your gnulib
directory from git, and then check in the change to the submodule as
normal (you can check @code{git help submodule} for details).
@end table

There is an alternative to the method above; it is possible to store
local diffs to be patched into gnulib beneath the @file{gnulib-local}
directory.  Normally however, there is no need for this, since gnulib
updates are very prompt.

@section How to update Gnulib to latest

With a non-dirty working tree, the command
@code{make update-gnulib-to-latest} (or the shorter alias
@code{make gnulib-sync}) updates the gnulib submodule.
In detail, this involves:

@enumerate
@item
Fetching the latest upstream gnulib reference.
@item
Copying the files which should stay in sync, like @file{bootstrap},
from gnulib into the findutils working tree.
@item
And finally showing the @code{git status} for the gnulib submodule and
the files copied in the previous step.
@end enumerate

After that, the maintainer checks that everything is correct and that
findutils builds and runs correctly, and finally commits the new
gnulib version, for example via @code{git gui}.
The @code{gnulib-sync} target can be run at any time after a
@code{configure} run, and refuses to run only if the working tree is
dirty.

@node Documentation
@chapter Documentation

The findutils git tree includes several different types of
documentation.

@section git change log

The git change log for the source tree contains check-in messages
which describe each check-in.  These have a standard format:

@smallexample
Summary of the change.

(ChangeLog-style detail)
@end smallexample

Here, the format of the detail part follows the standard GNU ChangeLog
style, but without whitespace in the left margin and without
author/date headers.  Take a look at the output of @code{git log} to
see some examples.  The @file{README-hacking} file also contains an
example with an explanation.

@section User Documentation User-oriented documentation is provided as manual pages and in Texinfo. See @ref{Introduction,,Introduction,find,The Findutils manual}. Please make sure both sets of documentation are updated if you make a change to the code. The GNU coding standards do not normally call for maintaining manual pages on the grounds of effort duplication. However, the manual page format is more convenient for quick reference, and so it's worth maintaining both types of documentation. However, the manual pages are normally rather more terse than the Texinfo documentation. The manual pages are suitable for reference use, but the Texinfo manual should also include introductory and tutorial material. We make the user documentation available on the web, on the GNU project web site. These web pages are source-controlled via CVS (still!). If you are a member of the @samp{findutils} project on Savannah you should be able to check the web pages out like this (@samp{$USER} is a placeholder for your Savannah username): @smallexample cvs -d :ext:$USER@@cvs.savannah.gnu.org:/web/findutils checkout findutils/manual @end smallexample You can automatically update the documentation in this repository by using the script @samp{build-aux/update-online-manual.sh} with the path to the findutils Git repository as parameter. @smallexample build-aux/update-online-manual.sh $HOME/git/findutils @end smallexample That script will generate the documentation in the directory @samp{doc/manual/} by calling the @code{make} target @samp{web-manual}; then it will copy over the files into the CVS checkout. There you can check the documentation once again before committing to CVS. The Savannah CVS server will automatically initiate the transfer to the web server. @section Build Guidance @table @file @item ABOUT-NLS Describes the Free Translation Project, the translation status of various GNU projects, and how to participate by translating an application. 
@item AUTHORS
Lists the authors of findutils.
@item COPYING
The copyright license covering findutils; currently, the GNU GPL,
version 3.
@item INSTALL
Generic installation instructions for installing GNU programs.
@item README
Information about how to compile findutils in particular.
@item README-hacking
Describes how to build findutils from the code in git.
@item THANKS
Thanks to people who contributed to findutils.  Generally, if
someone's contribution was significant enough to need a copyright
assignment, their name should go in here.
@item TODO
Mainly obsolete.  Please add bugs to the Savannah bug tracker instead
of adding entries to this file.
@end table

@section Release Information

@table @file
@item NEWS
Enumerates the user-visible changes in each release.  Typical changes
are fixed bugs, functionality changes and documentation changes.
Include the date when a release is made.
@item ChangeLog
This file enumerates all changes to the findutils source code (with
the possible exception of @file{.cvsignore} and @file{.gitignore}
changes).  The level of detail used for this file should be sufficient
to answer the questions ``what changed?'' and ``why was it changed?''.
The file is generated from the git commit messages during
@code{make dist}.  If a change fixes a bug, always give the bug
reference number in the @file{NEWS} file and of course also in the
checkin message.  In general, it should be possible to enumerate all
material changes to a function by searching for its name in
@file{ChangeLog}.  Mention when each release is made.
@end table

@node Testing
@chapter Testing

This chapter will explain the general procedures for adding tests to
the test suite, and the functions defined in the findutils-specific
DejaGnu configuration.  Where appropriate, references will be made to
the DejaGnu documentation.

@node Bugs
@chapter Bugs

Bugs are logged in the Savannah bug tracker
@url{https://savannah.gnu.org/bugs/?group=findutils}.
The tracker offers several fields but their use is largely obvious. The life-cycle of a bug is like this: @table @asis @item Open Someone, usually a maintainer, a distribution maintainer or a user, creates a bug by filling in the form. They fill in field values as they see fit. This will generate an email to @email{bug-findutils@@gnu.org}. @item Triage The bug hangs around with @samp{Status=None} until someone begins to work on it. At that point they set the ``Assigned To'' field and will sometimes set the status to @samp{In Progress}, especially if the bug will take a while to fix. @item Non-bugs Quite a lot of reports are not actually bugs; for these the usual procedure is to explain why the problem is not a bug, set the status to @samp{Invalid} and close the bug. Make sure you set the @samp{Assigned to} field to yourself before closing the bug. @item Fixing When you commit a bug fix into git (or in the case of a contributed patch, commit the change), mark the bug as @samp{Fixed}. Make sure you include a new test case where this is relevant. If you can figure out which releases are affected, please also set the @samp{Release} field to the earliest release which is affected by the bug. Indicate which source branch the fix is included in (for example, 4.2.x or 4.3.x). Don't close the bug yet. @item Release When a release is made which includes the bug fix, make sure the bug is listed in the NEWS file. Once the release is made, fill in the @samp{Fixed Release} field and close the bug. @end table @node Distributions @chapter Distributions Almost all GNU/Linux distributions include findutils, but only some of them have a package maintainer who is a member of the mailing list. Distributions don't often feed back patches to the @email{bug-findutils@@gnu.org} list, but on the other hand many of their patches relate only to standards for file locations and so forth, and are therefore distribution specific. 
On an irregular basis I check the current patches being used by one or two distributions, but the total number of GNU/Linux distributions is large enough that we could not hope to cover them all. Often, bugs are raised against a distribution's bug tracker instead of GNU's. Periodically (about every six months) I take a look at some of the more accessible bug trackers to indicate which bugs have been fixed upstream. Many distributions include both findutils and the slocate package, which provides a replacement @code{locate}. @node Internationalisation @chapter Internationalisation Translation is essentially automated from the maintainer's point of view. The TP mails the maintainer when a new PO file is available, and we just download it and check it in. The @file{bootstrap} script copies @file{.po} files into the working tree. For more information, please see @url{https://translationproject.org/domain/findutils.html}. @node Security @chapter Security See @ref{Security Considerations, ,Security Considerations,find,The Findutils manual}, for a full description of the findutils approach to security considerations and discussion of particular tools. If someone reports a security bug publicly, we should fix this as rapidly as possible. If necessary, this can mean issuing a fixed release containing just the one bug fix. We try to avoid issuing releases which include both significant security fixes and functional changes. Where someone reports a security problem privately, we generally try to construct and test a patch without pushing the intermediate code to the public repository. Once everything has been tested, this allows us to make a release and push the patch. The advantage of doing things this way is that we avoid situations where people watching for git commits can figure out and exploit a security problem before a fixed release is available. It's important that security problems be fixed promptly, but don't rush so much that things go wrong. 
Make sure the new release really fixes the problem. It's usually best not to include functional changes in your security-fix release. If the security problem is serious, send an alert to @email{vendor-sec@@lst.de}. The members of the list include most GNU/Linux distributions. The point of doing this is to allow them to prepare to release your security fix to their customers, once the fix becomes available. Here is an example alert:- @smallexample GNU findutils heap buffer overrun (potential privilege escalation) I. BACKGROUND ============= GNU findutils is a set of programs which search for files on Unix-like systems. It is maintained by the GNU Project of the Free Software Foundation. For more information, see @url{https://www.gnu.org/software/findutils}. II. DESCRIPTION =============== When GNU locate reads filenames from an old-format locate database, they are read into a fixed-length buffer allocated on the heap. Filenames longer than the 1026-byte buffer can cause a buffer overrun. The overrunning data can be chosen by any person able to control the names of filenames created on the local system. This will normally include all local users, but in many cases also remote users (for example in the case of FTP servers allowing uploads). III. ANALYSIS ============= Findutils supports three different formats of locate database, its native format "LOCATE02", the slocate variant of LOCATE02, and a traditional ("old") format that locate uses on other Unix systems. When locate reads filenames from a LOCATE02 database (the default format), the buffer into which data is read is automatically extended to accommodate the length of the filenames. This automatic buffer extension does not happen for old-format databases. Instead a 1026-byte buffer is used. When a longer pathname appears in the locate database, the end of this buffer is overrun. The buffer is allocated on the heap (not the stack). 
If the locate database is in the default LOCATE02 format, the locate
program does perform automatic buffer extension, and the program is
not vulnerable to this problem.  The software used to build the
old-format locate database is not itself vulnerable to the same
attack.

Most installations of GNU findutils do not use the old database
format, and so will not be vulnerable.


IV. DETECTION
=============

Software
--------
All existing releases of findutils are affected.


Installations
-------------

To discover the longest path name on a given system, you can use the
following command (requires GNU findutils and GNU coreutils):

@verbatim
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end verbatim


V. EXAMPLE
==========

This section includes a shell script which determines which of a list
of locate binaries is vulnerable to the problem.  The shell script has
been tested only on glibc based systems having a mktemp binary.

NOTE: This script deliberately overruns the buffer in order to
determine if a binary is affected.  Therefore running it on your
system may have undesirable effects.  We recommend that you read the
script before running it.

@verbatim
#! /bin/sh
set +m
if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
    if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
        --output="${vanilla_db}" ; then
        true
    else
        rm -f "${vanilla_db}"
        vanilla_db=""
        echo "Failed to create old-format locate database; skipping the sanity checks" >&2
    fi
fi

make_overrun_db() {
    # Start with a valid database
    cat "${vanilla_db}"
    # Make the final entry really long
    dd if=/dev/zero bs=1 count=1500 2>/dev/null | tr '\000' 'x'
}

ulimit -c 0

usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
[ $# -eq 0 ] && usage 1

bad=""
good=""
ugly=""
if dbfile="$(mktemp nasty.XXXXXX)"
then
    make_overrun_db > "$dbfile"
    for locate ; do
        ver="$locate = $("$locate" --version | head -1)"
        if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
            "$locate" -d "$dbfile" "" >/dev/null
            if [ $? -gt 128 ] ; then
                bad="$bad vulnerable: $ver"
            else
                good="$good good: $ver"
            fi
        else
            # the regular locate failed
            ugly="$ugly buggy, may or may not be vulnerable: $ver"
        fi
    done
    rm -f "${dbfile}" "${vanilla_db}"
    # good: unaffected.  bad: affected (vulnerable).
    # ugly: doesn't even work for a normal old-format database.
    echo "$good"
    echo "$bad"
    echo "$ugly"
else
    exit 1
fi
@end verbatim


VI. VENDOR RESPONSE
===================

The GNU project discovered the problem while 'locate' was being worked
on; this is the first public announcement of the problem.

The GNU findutils maintainer has issued a patch as part of this
announcement.  The patch appears below.

A source release of findutils-4.2.31 will be issued on 2007-05-30.
That release will of course include the patch.  The patch will be
committed to the public CVS repository at the same time.  Public
announcements of the release, including a description of the bug, will
be made at the same time as the release.

A release of findutils-4.3.x will follow and will also include the
patch.


VII. PATCH
==========

This patch should apply to findutils-4.2.23 and later.
Findutils-4.2.23 was released almost two years ago.
@verbatim
Index: locate/locate.c
===================================================================
RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
retrieving revision 1.58.2.2
diff -u -p -r1.58.2.2 locate.c
--- locate/locate.c     22 Apr 2007 16:57:42 -0000      1.58.2.2
+++ locate/locate.c     28 May 2007 10:18:16 -0000
@@ -124,9 +124,9 @@ extern int errno;
 
 #include "locatedb.h"
 #include
-#include "../gnulib/lib/xalloc.h"
-#include "../gnulib/lib/error.h"
-#include "../gnulib/lib/human.h"
+#include "xalloc.h"
+#include "error.h"
+#include "human.h"
 #include "dirname.h"
 #include "closeout.h"
 #include "nextelem.h"
@@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
   return VISIT_CONTINUE;
 }
 
+static void
+toolong (struct process_data *procdata)
+{
+  error (EXIT_FAILURE, 0,
+         _("locate database %s contains a "
+           "filename longer than locate can handle"),
+         procdata->dbfile);
+}
+
+static void
+extend (struct process_data *procdata, size_t siz1, size_t siz2)
+{
+  /* Figure out if the addition operation is safe before performing it. */
+  if (SIZE_MAX - siz1 < siz2)
+    {
+      toolong (procdata);
+    }
+  else if (procdata->pathsize < (siz1+siz2))
+    {
+      procdata->pathsize = siz1+siz2;
+      procdata->original_filename = x2nrealloc (procdata->original_filename,
+                                                &procdata->pathsize,
+                                                1);
+    }
+}
+
 static int
 visit_old_format(struct process_data *procdata, void *context)
 {
-  register char *s;
+  register size_t i;
   (void) context;
 
   /* Get the offset in the path where this path info starts.  */
@@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
     procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
   else
     procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
+  assert(procdata->count > 0);
 
-  /* Overlay the old path with the remainder of the new.  */
-  for (s = procdata->original_filename + procdata->count;
+  /* Overlay the old path with the remainder of the new.  Read
+   * more data until we get to the next filename.
+   */
+  for (i=procdata->count;
        (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
-    if (procdata->c < 0200)
-      *s++ = procdata->c;              /* An ordinary character.  */
-    else
-      {
-        /* Bigram markers have the high bit set. */
-        procdata->c &= 0177;
-        *s++ = procdata->bigram1[procdata->c];
-        *s++ = procdata->bigram2[procdata->c];
-      }
-  *s-- = '\0';
+    {
+      if (procdata->c < 0200)
+        {
+          /* An ordinary character. */
+          extend (procdata, i, 1u);
+          procdata->original_filename[i++] = procdata->c;
+        }
+      else
+        {
+          /* Bigram markers have the high bit set. */
+          extend (procdata, i, 2u);
+          procdata->c &= 0177;
+          procdata->original_filename[i++] = procdata->bigram1[procdata->c];
+          procdata->original_filename[i++] = procdata->bigram2[procdata->c];
+        }
+    }
+
+  /* Consider the case where we executed the loop body zero times; we
+   * still need space for the terminating null byte.
+   */
+  extend (procdata, i, 1u);
+  procdata->original_filename[i] = 0;
 
   procdata->munged_filename = procdata->original_filename;
 
@end verbatim


VIII. THANKS
============

Thanks to Rob Holland and Tavis Ormandy.


IX. CVE INFORMATION
===================

No CVE candidate number has yet been assigned for this vulnerability.
If someone provides one, I will include it in the public announcement
and change logs.

@end smallexample

The original announcement above was sent out with a cleartext PGP
signature, of course, but that has been omitted from the example.

Once a fixed release is available, announce the new release using the
normal channels.  Any CVE number assigned for the problem should be
included in the @file{ChangeLog} and @file{NEWS} entries.  See
@url{https://cve.mitre.org/} for an explanation of CVE numbers.


@node Making Releases
@chapter Making Releases

This section will explain how to make a findutils release.
For the time being here is a terse description of the main steps:

@set RELEASE X.Y.Z
@set RELTAG v@value{RELEASE}

@enumerate
@item Commit changes; make sure your working directory has no
uncommitted changes.
@item Update translation files; re-run bootstrap to download the newest
@samp{.po} files.
@item Make sure compiler warnings would block the release; re-run
@samp{configure} with the options
@code{--enable-compiler-warnings --enable-compiler-warnings-are-errors}.
@item Test; make sure that all changes you have made have tests, and
that the tests pass.  Verify this with
@code{env RUN_EXPENSIVE_TESTS=yes make distcheck}.
@c The RUN_EXPENSIVE_TESTS environment variable is checked in init.cfg.
@item Bugs; make sure all Savannah bug entries fixed in this release
are marked as fixed in Savannah.  Optionally close them too to save
duplicate work (otherwise, close them after the release is uploaded).
@item Add new release in Savannah field values; see the
@code{Bugs > Edit Field Values} menu item.  Add a field value for the
release you are about to make so that users can report bugs in it.
@item Update version; make sure that the NEWS file is updated with the
new release number (and checked in).
@c There is no longer any need to update configure.ac, since it no
@c longer contains version information.
@item Tag the release; findutils releases are tagged like this for
example: v4.5.5.  You can create a tag with a command like this:
@c we use @example here because @value will not work within @code or @samp.
@example
git tag -s -m "Findutils release @value{RELEASE}" @value{RELTAG}
@end example
@noindent
@item Build the release tarball; do this with @code{make distcheck}.
Copy the tarball somewhere safe.
@item Merge; if the release (and signed tag) were made on a local
branch, merge the branch to your local master.
@item Push; push your master to origin/master.
@item Push the new release tag; assuming that the name of your remote
is @samp{origin}, this is:
@example
git push origin tag @value{RELTAG}
@end example
@item Prepare the upload and upload it.
You can do this with
@c we use @example here because @value will not work within @code or @samp.
@example
build-aux/gnupload --to ftp.gnu.org:findutils findutils-@value{RELEASE}.tar.xz
@end example
@noindent
Use @code{alpha.gnu.org:findutils} for an alpha or beta release.
@xref{Automated FTP Uploads, ,Automated FTP Uploads,
maintain, Information for Maintainers of GNU Software},
for detailed upload instructions.
@item Check the FTP upload worked; you can look for an email from the
robot or check the contents of the actual FTP site.
@item Make a release announcement; include an extract from the NEWS
file which explains what's changed.  Announcements for test releases
should just go to @email{bug-findutils@@gnu.org}.  Announcements for
stable releases should go to @email{info-gnu@@gnu.org} as well.
@item Post-release administrivia: add a new dummy release header in NEWS:
@code{* Major changes in release ?.?.?, YYYY-MM-DD}
and update the @code{old_NEWS_hash} in @file{cfg.mk} with
@code{make update-NEWS-hash}.  Commit both changes.
@c make update-NEWS-hash supports make news-check but we normally
@c don't do that (and I'm not sure that the current NEWS file would
@c pass the check anyway).
@item Close bugs; any bugs recorded on Savannah which were fixed in this
release should now be marked as closed if they were not already.  Update
the @samp{Fixed Release} field of these bugs appropriately and make
sure the @samp{Assigned to} field is populated.
@end enumerate


@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX