NAME
parse_date - parses a date string into a timespec struct.
SYNOPSIS
#include "timeutils.h"
int parse_date(struct timespec *result, char const *p,
struct timespec const *now)
LDADD libcommon.la
DESCRIPTION
Parse a date/time string, storing the resulting time value into *result.
The string itself is pointed to by p. Return 1 if successful.
The string can be an incomplete or relative time specification; if so,
use *now as the basis for the returned time.
This function is based upon gnulib's parse-datetime.y-dd7a871.
Below is a plain text version of the gnulib parse-datetime.texi-dd7a871 manual
describing the input strings that are recognized.
Any future modifications to the util-linux parser that affect input strings
should be noted below.
1 Date input formats
********************
First, a quote:
Our units of temporal measurement, from seconds on up to months,
are so complicated, asymmetrical and disjunctive so as to make
coherent mental reckoning in time all but impossible. Indeed, had
some tyrannical god contrived to enslave our minds to time, to
make it all but impossible for us to escape subjection to sodden
routines and unpleasant surprises, he could hardly have done
better than handing down our present system. It is like a set of
trapezoidal building blocks, with no vertical or horizontal
surfaces, like a language in which the simplest thought demands
ornate constructions, useless particles and lengthy
circumlocutions. Unlike the more successful patterns of language
and science, which enable us to face experience boldly or at least
level-headedly, our system of temporal calculation silently and
persistently encourages our terror of time.
... It is as though architects had to measure length in feet,
width in meters and height in ells; as though basic instruction
manuals demanded a knowledge of five different languages. It is
no wonder then that we often look into our own immediate past or
future, last Tuesday or a week from Sunday, with feelings of
helpless confusion. ...
--Robert Grudin, `Time and the Art of Living'.
This section describes the textual date representations that GNU
programs accept. These are the strings you, as a user, can supply as
arguments to the various programs. The C interface (via the
`parse_datetime' function) is not described here.
1.1 General date syntax
=======================
A "date" is a string, possibly empty, containing many items separated
by whitespace. The whitespace may be omitted when no ambiguity arises.
The empty string means the beginning of today (i.e., midnight). Order
of the items is immaterial. A date string may contain many flavors of
items:
* calendar date items
* time of day items
* time zone items
* combined date and time of day items
* day of the week items
* relative items
* pure numbers.
We describe each of these item types in turn, below.
A few ordinal numbers may be written out in words in some contexts.
This is most useful for specifying day of the week items or relative
items (see below). Among the most commonly used ordinal numbers, the
word `last' stands for -1, `this' stands for 0, and `first' and `next'
both stand for 1. Because the word `second' stands for the unit of
time, there is no way to write the ordinal number 2, but for convenience
`third' stands for 3, `fourth' for 4, `fifth' for 5, `sixth' for 6,
`seventh' for 7, `eighth' for 8, `ninth' for 9, `tenth' for 10,
`eleventh' for 11 and `twelfth' for 12.
When a month is written this way, it is still considered to be
written numerically, instead of being "spelled in full"; this changes
the allowed strings.
In the current implementation, only English is supported for words
and abbreviations like `AM', `DST', `EST', `first', `January',
`Sunday', `tomorrow', and `year'.
The output of the `date' command is not always acceptable as a date
string, not only because of the language problem, but also because
there is no standard meaning for time zone items like `IST'. When using
`date' to generate a date string intended to be parsed later, specify a
date format that is independent of language and that does not use time
zone items other than `UTC' and `Z'. Here are some ways to do this:
     $ LC_ALL=C TZ=UTC0 date
     Mon Mar  1 00:21:42 UTC 2004
     $ TZ=UTC0 date +'%Y-%m-%d %H:%M:%SZ'
     2004-03-01 00:21:42Z
     $ date --rfc-3339=ns  # --rfc-3339 is a GNU extension.
     2004-02-29 16:21:42.692722128-08:00
     $ date --rfc-2822  # a GNU extension
     Sun, 29 Feb 2004 16:21:42 -0800
     $ date +'%Y-%m-%d %H:%M:%S %z'  # %z is a GNU extension.
     2004-02-29 16:21:42 -0800
     $ date +'@%s.%N'  # %s and %N are GNU extensions.
     @1078100502.692722128
Alphabetic case is completely ignored in dates. Comments may be
introduced between round parentheses, as long as included parentheses
are properly nested. Hyphens not followed by a digit are currently
ignored. Leading zeros on numbers are ignored.
Invalid dates like `2005-02-29' or times like `24:00' are rejected.
In the typical case of a host that does not support leap seconds, a
time like `23:59:60' is rejected even if it corresponds to a valid leap
second.
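
The rejection rules above can be illustrated with a small range check.
This is only a sketch under stated assumptions: the function names
`is_leap_year', `is_valid_date' and `is_valid_time' are hypothetical,
the Gregorian calendar is assumed, and the parser's own checks may be
implemented differently.

```c
#include <stdbool.h>

/* Gregorian leap-year rule. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* True if year-month-day names a real calendar day. */
bool is_valid_date(int year, int month, int day)
{
    static const int days_in_month[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int limit;

    if (month < 1 || month > 12 || day < 1)
        return false;
    limit = days_in_month[month - 1];
    if (month == 2 && is_leap_year(year))
        limit = 29;
    return day <= limit;
}

/* True if hour:minute:second is in range on a host that does not
   support leap seconds (so a SECOND of 60 is refused). */
bool is_valid_time(int hour, int minute, int second)
{
    return hour >= 0 && hour <= 23
        && minute >= 0 && minute <= 59
        && second >= 0 && second <= 59;
}
```

With these helpers, `is_valid_date(2005, 2, 29)' and
`is_valid_time(24, 0, 0)' are both false, while
`is_valid_date(2004, 2, 29)' is true.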
1.2 Calendar date items
=======================
A "calendar date item" specifies a day of the year. It is specified
differently, depending on whether the month is specified numerically or
literally. All these strings specify the same calendar date:
     1972-09-24     # ISO 8601.
     72-9-24        # Assume 19xx for 69 through 99,
                    # 20xx for 00 through 68.
     72-09-24       # Leading zeros are ignored.
     9/24/72        # Common U.S. writing.
     24 September 1972
     24 Sept 72     # September has a special abbreviation.
     24 Sep 72      # Three-letter abbreviations always allowed.
     Sep 24, 1972
     24-sep-72
     24sep72
The year can also be omitted. In this case, the last specified year
is used, or the current year if none. For example:
9/24
sep 24
Here are the rules.
For numeric months, the ISO 8601 format `YEAR-MONTH-DAY' is allowed,
where YEAR is any positive number, MONTH is a number between 01 and 12,
and DAY is a number between 01 and 31. Strict ISO 8601 requires a
leading zero if a number is less than ten, but the parser ignores
leading zeros, as the examples above show. If YEAR is 68 or smaller,
then 2000 is
added to it; otherwise, if YEAR is less than 100, then 1900 is added to
it. The construct `MONTH/DAY/YEAR', popular in the United States, is
accepted, as is `MONTH/DAY', which omits the year.
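
The two-digit year rule stated above is simple enough to sketch
directly. The function name `expand_short_year' is hypothetical, and
the sketch follows the rule exactly as worded in this section (by
value of YEAR), which may differ in detail from the parser's code.

```c
/* Apply the century rule from the text: 0..68 -> 2000..2068,
   69..99 -> 1969..1999, anything else is returned unchanged. */
int expand_short_year(int year)
{
    if (year < 0)
        return year;            /* not covered by the rule */
    if (year <= 68)
        return year + 2000;
    if (year < 100)
        return year + 1900;
    return year;
}
```

For example, `expand_short_year(72)' gives 1972, matching the `72-9-24'
example earlier in this section, and `expand_short_year(68)' gives 2068.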
Literal months may be spelled out in full: `January', `February',
`March', `April', `May', `June', `July', `August', `September',
`October', `November' or `December'. Literal months may be abbreviated
to their first three letters, possibly followed by an abbreviating dot.
It is also permitted to write `Sept' instead of `September'.
When months are written literally, the calendar date may be given as
any of the following:
DAY MONTH YEAR
DAY MONTH
MONTH DAY YEAR
DAY-MONTH-YEAR
Or, omitting the year:
MONTH DAY
1.3 Time of day items
=====================
A "time of day item" in date strings specifies the time on a given day.
Here are some examples, all of which represent the same time:
     20:02:00.000000
     20:02
     8:02pm
     20:02-0500      # In EST (U.S. Eastern Standard Time).
More generally, the time of day may be given as
`HOUR:MINUTE:SECOND', where HOUR is a number between 0 and 23, MINUTE
is a number between 0 and 59, and SECOND is a number between 0 and 59
possibly followed by `.' or `,' and a fraction containing one or more
digits. Alternatively, `:SECOND' can be omitted, in which case it is
taken to be zero. On the rare hosts that support leap seconds, SECOND
may be 60.
If the time is followed by `am' or `pm' (or `a.m.' or `p.m.'), HOUR
is restricted to run from 1 to 12, and `:MINUTE' may be omitted (taken
to be zero). `am' indicates the first half of the day, `pm' indicates
the second half of the day. In this notation, 12 is the predecessor of
1: midnight is `12am' while noon is `12pm'. (This is the zero-oriented
interpretation of `12am' and `12pm', as opposed to the old tradition
derived from Latin which uses `12m' for noon and `12pm' for midnight.)
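
The `am'/`pm' convention described above, in which 12 is the
predecessor of 1, can be sketched as a conversion to 24-hour notation.
The function name `hour12_to_hour24' is hypothetical; this is an
illustration of the rule, not the parser's code.

```c
/* Convert a 12-hour clock HOUR (1..12) to 24-hour notation.
   is_pm is nonzero for `pm', zero for `am'.
   Returns -1 if HOUR is outside 1..12. */
int hour12_to_hour24(int hour, int is_pm)
{
    if (hour < 1 || hour > 12)
        return -1;
    /* 12 wraps to 0, so 12am is midnight and 12pm is noon. */
    return hour % 12 + (is_pm ? 12 : 0);
}
```

Under this rule `12am' maps to hour 0, `12pm' to hour 12, and `8pm' to
hour 20, in line with the `8:02pm' example above.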
The time may alternatively be followed by a time zone correction,
expressed as `SHHMM', where S is `+' or `-', HH is a number of zone
hours and MM is a number of zone minutes. The zone minutes term, MM,
may be omitted, in which case the one- or two-digit correction is
interpreted as a number of hours. You can also separate HH from MM
with a colon. When a time zone correction is given this way, it forces
interpretation of the time relative to Coordinated Universal Time
(UTC), overriding any previous specification for the time zone or the
local time zone. For example, `+0530' and `+05:30' both stand for the
time zone 5.5 hours ahead of UTC (e.g., India). This is the best way to
specify a time zone correction by fractional parts of an hour. The
maximum zone correction is 24 hours.
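
As an illustration of the `SHHMM' notation, the following sketch turns
a correction string into a signed number of minutes. The function name
`zone_correction_minutes' is hypothetical, and the sketch takes a
strict reading of the forms named above (`SH', `SHH', `SHHMM' and
`SHH:MM'); the actual parser may accept other spellings.

```c
#include <ctype.h>

/* Value of the first n decimal digits of s. */
int zone_digits_value(const char *s, int n)
{
    int value = 0;
    for (int i = 0; i < n; i++)
        value = value * 10 + (s[i] - '0');
    return value;
}

/* Parse a time zone correction into minutes east of UTC.
   Returns 1 on success, 0 if the string is not a valid correction. */
int zone_correction_minutes(const char *s, int *minutes)
{
    int sign, n = 0, hh, mm = 0;

    if (*s == '+')
        sign = 1;
    else if (*s == '-')
        sign = -1;
    else
        return 0;
    s++;
    while (isdigit((unsigned char) s[n]))
        n++;
    if (s[n] == ':') {
        /* H:MM or HH:MM */
        if (n < 1 || n > 2)
            return 0;
        if (!isdigit((unsigned char) s[n + 1])
            || !isdigit((unsigned char) s[n + 2]) || s[n + 3] != '\0')
            return 0;
        hh = zone_digits_value(s, n);
        mm = zone_digits_value(s + n + 1, 2);
    } else if (s[n] != '\0') {
        return 0;
    } else if (n == 1 || n == 2) {
        hh = zone_digits_value(s, n);       /* hours only */
    } else if (n == 4) {
        hh = zone_digits_value(s, 2);       /* HHMM */
        mm = zone_digits_value(s + 2, 2);
    } else {
        return 0;
    }
    if (mm > 59 || hh * 60 + mm > 24 * 60)  /* at most 24 hours */
        return 0;
    *minutes = sign * (hh * 60 + mm);
    return 1;
}
```

Both `+0530' and `+05:30' yield 330 minutes, `-0500' and `-5' yield
-300 minutes, and `+2500' is refused because it exceeds 24 hours.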
Either `am'/`pm' or a time zone correction may be specified, but not
both.
1.4 Time zone items
===================
A "time zone item" specifies an international time zone, indicated by a
small set of letters, e.g., `UTC' or `Z' for Coordinated Universal
Time. Any included periods are ignored. By following a
non-daylight-saving time zone by the string `DST' in a separate word
(that is, separated by some white space), the corresponding daylight
saving time zone may be specified. Alternatively, a
non-daylight-saving time zone can be followed by a time zone
correction, to add the two values. This is normally done only for
`UTC'; for example, `UTC+05:30' is equivalent to `+05:30'.
Time zone items other than `UTC' and `Z' are obsolescent and are not
recommended, because they are ambiguous; for example, `EST' has a
different meaning in Australia than in the United States. Instead,
it's better to use unambiguous numeric time zone corrections like
`-0500', as described in the previous section.
If neither a time zone item nor a time zone correction is supplied,
timestamps are interpreted using the rules of the default time zone
(see section 1.10, `Specifying time zone rules').
1.5 Combined date and time of day items
=======================================
The ISO 8601 date and time of day extended format consists of an ISO
8601 date, a `T' character separator, and an ISO 8601 time of day.
This format is also recognized if the `T' is replaced by a space.
In this format, the time of day should use 24-hour notation.
Fractional seconds are allowed, with either comma or period preceding
the fraction. ISO 8601 fractional minutes and hours are not supported.
Typically, hosts support nanosecond timestamp resolution; excess
precision is silently discarded.
Here are some examples:
2012-09-24T20:02:00.052-05:00
2012-12-31T23:59:59,999999999+11:00
1970-01-01 00:00Z
1.6 Day of week items
=====================
The explicit mention of a day of the week will forward the date (only
if necessary) to reach that day of the week in the future.
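
The "only if necessary" rule can be sketched as modular arithmetic on
weekday numbers. The function name `days_until_weekday' is
hypothetical, and weekdays are assumed to be numbered as in the
`tm_wday' field of `struct tm' (0 for Sunday through 6 for Saturday).
Ordinal prefixes such as `last' or `third' are not covered by this
sketch.

```c
/* Number of days to move forward from weekday `today' to reach
   weekday `target'. Both use 0 = Sunday ... 6 = Saturday.
   The result is 0 when today already is the requested day. */
int days_until_weekday(int today, int target)
{
    return (target - today + 7) % 7;
}
```

On a Wednesday (3), asking for Monday (1) moves the date 5 days
forward, while asking for Wednesday leaves the date unchanged.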
Days of the week may be spelled out in full: `Sunday', `Monday',
`Tuesday', `Wednesday', `Thursday', `Friday' or `Saturday'. Days may
be abbreviated to their first three letters, optionally followed by a
period. The special abbreviations `Tues' for `Tuesday', `Wednes' for
`Wednesday' and `Thur' or `Thurs' for `Thursday' are also allowed.
A number may precede a day of the week item to move forward
supplementary weeks. It is best used in expressions like `third
monday'. In this context, `last DAY' or `next DAY' is also acceptable;
they move one week before or after the day that DAY by itself would
represent.
A comma following a day of the week item is ignored.
1.7 Relative items in date strings
==================================
"Relative items" adjust a date (or the current date if none) forward or
backward. The effects of relative items accumulate. Here are some
examples:
1 year
1 year ago
3 years
2 days
The unit of time displacement may be selected by the string `year'
or `month' for moving by whole years or months. These are fuzzy units,
as years and months are not all of equal duration. More precise units
are `fortnight' which is worth 14 days, `week' worth 7 days, `day'
worth 24 hours, `hour' worth 60 minutes, `minute' or `min' worth 60
seconds, and `second' or `sec' worth one second. An `s' suffix on
these units is accepted and ignored.
The unit of time may be preceded by a multiplier, given as an
optionally signed number. Unsigned numbers are taken as positively
signed. No number at all implies 1 for a multiplier. Following a
relative item by the string `ago' is equivalent to preceding the unit
by a multiplier with value -1.
The string `tomorrow' is worth one day in the future (equivalent to
`day'), the string `yesterday' is worth one day in the past (equivalent
to `day ago').
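
The fixed-length units, the multiplier and the `ago' suffix combine
into a signed number of seconds, as the following sketch shows. The
function names `unit_seconds' and `relative_item_seconds' are
hypothetical; unit names are assumed to be lowercase already, and the
fuzzy units `year' and `month' are deliberately left out because they
have no fixed length.

```c
#include <string.h>

/* Seconds in one fixed-length unit, or 0 for an unknown or fuzzy
   unit. A trailing `s' on the unit name is accepted and ignored. */
long unit_seconds(const char *unit)
{
    static const struct { const char *name; long seconds; } units[] = {
        { "fortnight", 14L * 24 * 60 * 60 },
        { "week",       7L * 24 * 60 * 60 },
        { "day",       24L * 60 * 60 },
        { "hour",      60L * 60 },
        { "minute",    60L },
        { "min",       60L },
        { "second",     1L },
        { "sec",        1L },
    };
    size_t len = strlen(unit);

    if (len > 1 && unit[len - 1] == 's')
        len--;
    for (size_t i = 0; i < sizeof units / sizeof units[0]; i++)
        if (strlen(units[i].name) == len
            && strncmp(units[i].name, unit, len) == 0)
            return units[i].seconds;
    return 0;
}

/* Displacement in seconds for "MULTIPLIER UNIT [ago]".
   `ago' multiplies the displacement by -1. */
long relative_item_seconds(long multiplier, const char *unit, int ago)
{
    long seconds = multiplier * unit_seconds(unit);
    return ago ? -seconds : seconds;
}
```

Here `2 days' is 172800 seconds, and `1 day ago' (the same as
`yesterday') is -86400 seconds. Note that treating a day as exactly
24 hours ignores the clock adjustments discussed below.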
The strings `now' or `today' are relative items corresponding to
zero-valued time displacement; these strings come from the fact that a
zero-valued time displacement represents the current time when not
otherwise changed by previous items. They may be used to stress other
items, like in `12:00 today'. The string `this' also has the meaning
of a zero-valued time displacement, but is preferred in date strings
like `this thursday'.
When a relative item causes the resulting date to cross a boundary
where the clocks were adjusted, typically for daylight saving time, the
resulting date and time are adjusted accordingly.
The fuzz in units can cause problems with relative items. For
example, `2003-07-31 -1 month' might evaluate to 2003-07-01, because
2003-06-31 is an invalid date. To determine the previous month more
reliably, you can ask for the month before the 15th of the current
month. For example:
     $ date -R
     Thu, 31 Jul 2003 13:02:39 -0700
     $ date --date='-1 month' +'Last month was %B?'
     Last month was July?
     $ date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B!'
     Last month was June!
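
The same effect can be reproduced with the standard C `mktime'
function, which normalizes out-of-range fields of a `struct tm'. This
is a sketch of the arithmetic only, not of the parser's code; the
function name `one_month_before' is hypothetical.

```c
#include <time.h>

/* Subtract one month from year-month-day by decrementing tm_mon and
   letting mktime() normalize the result. Returns 1 on success. */
int one_month_before(int year, int month, int day,
                     int *out_year, int *out_month, int *out_day)
{
    struct tm tm = { 0 };

    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1 - 1;  /* tm_mon is 0-based; go back a month */
    tm.tm_mday = day;
    tm.tm_hour = 12;            /* noon, away from most clock changes */
    tm.tm_isdst = -1;           /* let the library decide about DST */
    if (mktime(&tm) == (time_t) -1)
        return 0;
    *out_year = tm.tm_year + 1900;
    *out_month = tm.tm_mon + 1;
    *out_day = tm.tm_mday;
    return 1;
}
```

Starting from 2003-07-31 this yields 2003-07-01, because the
nonexistent June 31 is normalized to July 1. Starting from 2003-07-15
it yields 2003-06-15, which is why the 15th is a safer anchor.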
Also, take care when manipulating dates around clock changes such as
daylight saving leaps. In a few cases these have added or subtracted
as much as 24 hours from the clock, so it is often wise to adopt
universal time by setting the `TZ' environment variable to `UTC0'
before embarking on calendrical calculations.
1.8 Pure numbers in date strings
================================
The precise interpretation of a pure decimal number depends on the
context in the date string.
If the decimal number is of the form YYYYMMDD and no other calendar
date item (see section 1.2, `Calendar date items') appears before it in
the date string, then YYYY is read as the year, MM as the month number
and DD as the day of the month, for the specified calendar date.
If the decimal number is of the form HHMM and no other time of day
item appears before it in the date string, then HH is read as the hour
of the day and MM as the minute of the hour, for the specified time of
day. MM can also be omitted.
If both a calendar date and a time of day appear to the left of a
number in the date string, but no relative item, then the number
overrides the year.
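
The first two interpretations amount to splitting the digits into
fields. The sketch below shows that split only; it does not track
which items have already appeared in the date string, and it does not
validate the resulting fields. The function names `split_yyyymmdd' and
`split_hhmm' are hypothetical, and accepting one to four digits for the
time form is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if s is non-empty and consists only of decimal digits. */
int is_pure_number(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char) *s))
            return 0;
    return 1;
}

/* Split an eight-digit YYYYMMDD number. Returns 1 on success. */
int split_yyyymmdd(const char *s, int *year, int *month, int *day)
{
    long value;

    if (strlen(s) != 8 || !is_pure_number(s))
        return 0;
    value = strtol(s, NULL, 10);
    *year = (int) (value / 10000);
    *month = (int) (value / 100 % 100);
    *day = (int) (value % 100);
    return 1;
}

/* Split an HHMM number, or HH alone when MM is omitted.
   Returns 1 on success. */
int split_hhmm(const char *s, int *hour, int *minute)
{
    size_t len = strlen(s);
    long value;

    if (len > 4 || !is_pure_number(s))
        return 0;
    value = strtol(s, NULL, 10);
    if (len <= 2) {
        *hour = (int) value;    /* MM omitted */
        *minute = 0;
    } else {
        *hour = (int) (value / 100);
        *minute = (int) (value % 100);
    }
    return 1;
}
```

For example, `20040229' splits into year 2004, month 2, day 29, and
`2002' splits into hour 20, minute 2.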
1.9 Seconds since the Epoch
===========================
If you precede a number with `@', it represents an internal timestamp
as a count of seconds. The number can contain an internal decimal
point (either `.' or `,'); any excess precision not supported by the
internal representation is truncated toward minus infinity. Such a
number cannot be combined with any other date item, as it specifies a
complete timestamp.
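
Truncation toward minus infinity matters for negative timestamps with
a fraction. The sketch below parses an `@' item into whole seconds and
a non-negative nanosecond count, assuming nanosecond resolution. The
function name `parse_epoch_item' is hypothetical, overflow is not
checked, and requiring at least one digit after the decimal point is an
assumption of this sketch.

```c
#include <ctype.h>

/* Parse "@SECONDS[.FRACTION]" into *sec and *nsec, with
   0 <= *nsec < 1000000000, truncating toward minus infinity.
   Returns 1 on success, 0 on a syntax error. */
int parse_epoch_item(const char *s, long long *sec, long *nsec)
{
    int negative = 0, digits = 0, excess = 0;
    long long whole = 0;
    long frac = 0;

    if (*s++ != '@')
        return 0;
    if (*s == '-' || *s == '+')
        negative = (*s++ == '-');
    if (!isdigit((unsigned char) *s))
        return 0;
    while (isdigit((unsigned char) *s))
        whole = whole * 10 + (*s++ - '0');
    if (*s == '.' || *s == ',') {
        s++;
        if (!isdigit((unsigned char) *s))
            return 0;
        for (; isdigit((unsigned char) *s); s++) {
            if (digits < 9) {
                frac = frac * 10 + (*s - '0');
                digits++;
            } else if (*s != '0') {
                excess = 1;     /* nonzero digits beyond nanoseconds */
            }
        }
        for (; digits < 9; digits++)
            frac *= 10;         /* pad to nanoseconds */
    }
    if (*s != '\0')
        return 0;
    if (negative) {
        /* Rounding the magnitude up rounds the value down. */
        if (excess)
            frac++;
        if (frac == 1000000000) {
            whole++;
            frac = 0;
        }
        if (frac != 0) {
            whole++;
            frac = 1000000000 - frac;
        }
        whole = -whole;
    }
    *sec = whole;
    *nsec = frac;
    return 1;
}
```

With this sketch `@-1.5' becomes -2 seconds plus 500000000 nanoseconds,
and `@1078100502.692722128' keeps all nine fractional digits.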
Internally, computer times are represented as a count of seconds
since an epoch--a well-defined point of time. On GNU and POSIX
systems, the epoch is 1970-01-01 00:00:00 UTC, so `@0' represents this
time, `@1' represents 1970-01-01 00:00:01 UTC, and so forth. GNU and
most other POSIX-compliant systems support such times as an extension
to POSIX, using negative counts, so that `@-1' represents 1969-12-31
23:59:59 UTC.
Traditional Unix systems count seconds with 32-bit two's-complement
integers and can represent times from 1901-12-13 20:45:52 through
2038-01-19 03:14:07 UTC. More modern systems use 64-bit counts of
seconds with nanosecond subcounts, and can represent all the times in
the known lifetime of the universe to a resolution of 1 nanosecond.
On most hosts, these counts ignore the presence of leap seconds.
For example, on most hosts `@915148799' represents 1998-12-31 23:59:59
UTC, `@915148800' represents 1999-01-01 00:00:00 UTC, and there is no
way to represent the intervening leap second 1998-12-31 23:59:60 UTC.
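
The correspondence between counts and calendar times can be checked
with the standard C `gmtime' function. This sketch assumes a host whose
`time_t' is wide enough for the value and which, like most hosts,
ignores leap seconds; the function name `epoch_to_utc' is hypothetical.

```c
#include <time.h>

/* Break a count of seconds since the Epoch down into UTC calendar
   fields. Returns 1 on success, 0 if the host cannot convert it. */
int epoch_to_utc(long long seconds, struct tm *out)
{
    time_t t = (time_t) seconds;
    struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return 0;
    *out = *tm;     /* copy out of gmtime's static buffer */
    return 1;
}
```

On such a host, 915148799 converts to 1998-12-31 23:59:59 and
915148800 to 1999-01-01 00:00:00, one second apart, with no count left
over for the leap second in between. Remember that `tm_year' counts
from 1900 and `tm_mon' from 0.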
1.10 Specifying time zone rules
===============================
Normally, dates are interpreted using the rules of the current time
zone, which in turn are specified by the `TZ' environment variable, or
by a system default if `TZ' is not set. To specify a different set of
default time zone rules that apply just to one date, start the date
with a string of the form `TZ="RULE"'. The two quote characters (`"')
must be present in the date, and any quotes or backslashes within RULE
must be escaped by a backslash.
For example, with the GNU `date' command you can answer the question
"What time is it in New York when a Paris clock shows 6:30am on October
31, 2004?" by using a date beginning with `TZ="Europe/Paris"' as shown
in the following shell transcript:
     $ export TZ="America/New_York"
     $ date --date='TZ="Europe/Paris" 2004-10-31 06:30'
     Sun Oct 31 01:30:00 EDT 2004
In this example, the `--date' operand begins with its own `TZ'
setting, so the rest of that operand is processed according to
`Europe/Paris' rules, treating the string `2004-10-31 06:30' as if it
were in Paris. However, since the output of the `date' command is
processed according to the overall time zone rules, it uses New York
time. (Paris was normally six hours ahead of New York in 2004, but
this example refers to a brief Halloween period when the gap was five
hours.)
A `TZ' value is a rule that typically names a location in the `tz'
database (http://www.twinsun.com/tz/tz-link.htm). A recent catalog of
location names appears in the TWiki Date and Time Gateway
(http://twiki.org/cgi-bin/xtra/tzdate). A few non-GNU hosts require a
colon before a location name in a `TZ' setting, e.g.,
`TZ=":America/New_York"'.
The `tz' database includes a wide variety of locations ranging from
`Arctic/Longyearbyen' to `Antarctica/South_Pole', but if you are at sea
and have your own private time zone, or if you are using a non-GNU host
that does not support the `tz' database, you may need to use a POSIX
rule instead. Simple POSIX rules like `UTC0' specify a time zone
without daylight saving time; other rules can specify simple daylight
saving regimes. See `Specifying the Time Zone with TZ' in the GNU C
Library manual.
1.11 Authors of `parse_datetime'
================================
`parse_datetime' started life as `getdate', as originally implemented
by Steven M. Bellovin (<smb@research.att.com>) while at the University
of North Carolina at Chapel Hill. The code was later tweaked by a
couple of people on Usenet, then completely overhauled by Rich $alz
(<rsalz@bbn.com>) and Jim Berets (<jberets@bbn.com>) in August, 1990.
Various revisions for the GNU system were made by David MacKenzie, Jim
Meyering, Paul Eggert and others, including renaming it to `get_date' to
avoid a conflict with the alternative POSIX function `getdate', and a
later rename to `parse_datetime'. The POSIX function `getdate' can
parse more locale-specific dates using `strptime', but relies on an
environment variable and external file, and lacks the thread-safety of
`parse_datetime'.
This chapter was originally produced by François Pinard
(<pinard@iro.umontreal.ca>) from the `parse_datetime.y' source code,
and then edited by K. Berry (<kb@cs.umb.edu>).