# Scan a line for various common date and time formats. # Set REPLY to the number of seconds since the epoch at which that # time occurs. The time does not need to be matched; this will # produce midnight at the start of the date. # # Absolute times # # The rules below are fairly complicated, to allow any natural (and # some highly unnatural but nonetheless common) combination of # time and date used by English speakers. It is recommended that, # rather than exploring the intricacies of the system, users find # a date format that is natural to them and stick to it. This # will avoid unexpected effects. Various key facts should be noted, # explained in more detail below: # # - In particular, note the confusion between month/day/year and # day/month/year when the month is numeric; this format should be # avoided if at all possible. Many alternatives are available. # - However, there is currently no localization support, so month # names must be English (though only the first three letters are required). # The same applies to days of the week if they occur (they are not useful). # - The year must be given in full to avoid confusion, and only years # from 1900 to 2099 inclusive are matched. # - Although timezones are parsed (complicated formats may not be recognized), # they are then ignored; no time adjustment is made. # # The following give some obvious examples; users finding here # a format they like and not subject to vagaries of style may skip # the full description. As dates and times are matched separately # (even though the time may be embedded in the date), any date format # may be mixed with any format for the time of day provide the # separators are clear (whitespace, colons, commas). # 2007/04/03 13:13 # 2007/04/03:13:13 # 2007/04/03 1:13 pm # 3rd April 2007, 13:13 # April 3rd 2007 1:13 p.m. # Apr 3, 2007 13:13 # Tue Apr 03 13:13:00 2007 # 13:13 2007/apr/3 # # Times are parsed and extracted before dates. They must use colons # to separate hours and minutes, though a dot is allowed before seconds # if they are present. This limits time formats to # HH:MM[:SS[.FFFFF]] [am|pm|a.m.|p.m.] # HH:MM.SS[.FFFFF] [am|pm|a.m.|p.m.] # in which square brackets indicate optional elements, possibly with # alternatives. Fractions of a second are recognised but ignored. # Unless -r is given (see below), a date is mandatory but a time of day is # not; the time returned is at the start of the date. # # Time zones are not handled, though if one is matched following a time # specification it will be removed to allow a surrounding date to be # parsed. This only happens if the format of the timezone is not too # wacky: # +0100 # GMT # GMT-7 # CET+1CDT # etc. are all understood, but any part of the timezone that is not numeric # must have exactly three capital letters in the name. # # Dates suffer from the ambiguity between DD/MM/YYYY and MM/DD/YYYY. It is # recommended this form is avoided with purely numeric dates, but use of # ordinals, eg. 3rd/04/2007, will resolve the ambiguity as the ordinal is # always parsed as the day of the month. Years must be four digits (and # the first two must be 19 or 20); 03/04/08 is not recognised. Other # numbers may have leading zeroes, but they are not required. The # following are handled: # YYYY/MM/DD # YYYY-MM-DD # YYYY/MNM/DD # YYYY-MNM-DD # DD[th|st|rd] MNM[,] YYYY # DD[th|st|rd] MNM[,] current year assumed # MNM DD[th|st|rd][,] YYYY # MNM DD[th|st|rd][,] current year assumed # DD[th|st|rd]/MM[,] YYYY # DD[th|st|rd]/MM/YYYY # MM/DD[th|st|rd][,] YYYY # MM/DD[th|st|rd]/YYYY # Here, MNM is at least the first three letters of a month name, # matched case-insensitively. The remainder of the month name may appear but # its contents are irrelevant, so janissary, febrile, martial, apricot, # etc. are happily handled. # # Note there are only two cases that assume the current year, the # form "Jun 20" or "14 September" (the only two commonly occurring # forms, apart from a "the" in some forms of English, which isn't # currently supported). Such dates will of course become ambiguous # in the future, so should ideally be avoided. # # Times may follow dates with a colon, e.g. 1965/07/12:09:45; this # is in order to provide a format with no whitespace. A comma # and whitespace are allowed, e.g. "1965/07/12, 09:45". # Currently the order of these separators is not checked, so # illogical formats such as "1965/07/12, : ,09:45" will also # be matched. Otherwise, a time is only recognised as being associated # with a date if there is only whitespace in between, or if the time # was embedded in the date. # # Days of the week are not scanned, but will be ignored if they occur # at the start of the date pattern only. # # For example, the standard date format: # Fri Aug 18 17:00:48 BST 2006 # is handled by matching HH:MM:SS and removing it together with the # matched (but unused) time zone. This leaves the following: # Fri Aug 18 2006 # "Fri" is ignored and the rest is matched according to the sixth of # the standard rules. # # Relative times # ============== # # The option -r allows a relative time. Years (or ys, yrs, or without s), # months (or mths, mons, mnths, months, or without s --- "m", "ms" and # "mns" are ambiguous and are not handled), weeks (or ws, wks, or without # s) and days (or ds, dys, days, or without s), hours (or hs, hrs, with or # without s), minutes (or mins, with or without s) and seconds (or ss, # secs, with or without s) are understood. Spaces between the numbers # are optional, but are required between items, although a comma # may be used (with or without spaces). # # Note that a year here is 365.25 days and a month is 30 days. TODO: # improve this by passing down base time and adjusting. (This will # be crucial for events repeating monthly.) TODO: it then makes # sense to make PERIODly = 1 PERIOD (also for PERIOD = dai!) # # This allows forms like: # 30 years 3 months 4 days 3:42:41 # 14 days 5 hours # 4d,10hr # In this case absolute dates are ignored. emulate -L zsh setopt extendedglob zmodload -i zsh/datetime || return 1 # separator characters before time or between time and date # allow , - or : before the time: this allows spaceless but still # relatively logical dates like 2006/09/19:14:27 # don't allow / before time ! the above # is not 19 hours 14 mins and 27 seconds after anything. local tschars="[-,:[:space:]]" # start pattern for time when anchored local tspat_anchor="(${tschars}#)" # ... when not anchored local tspat_noanchor="(|*${tschars})" # separator characters between elements. comma is fairly # natural punctuation; otherwise only allow whitespace. local schars="[.,[:space:]]" local daypat="${schars}#(sun|mon|tue|wed|thu|fri|sat)[a-z]#${schars}#" # Start pattern for date: treat , as space for simplicity. This # is illogical at the start but saves lots of minor fiddling later. # Date start pattern when anchored at the start. # We need to be able to ignore the day here, although (for consistency # with the unanchored case) we don't remove it until later. # (The problem in the other case is that matching anything before # the day of the week is greedy, so the day of the week gets ignored # if it's optional.) local dspat_anchor="(|(#B)${daypat}(#b)${schars}#)" local dspat_anchor_noday="(|${schars}#)" # Date start pattern when not anchored at the start. local dspat_noanchor="(|*${schars})" # end pattern for relative times: similar remark about use of $schars. local repat="(|s)(|${schars}*)" # not locale-dependent! I don't know how to get the months out # of the system for the purpose of finding out where they occur. # We may need some completely different heuristic. local monthpat="(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]#" # days, not handled but we need to ignore them. also not localized. integer year month day hour minute second then local opt line orig_line mname MATCH MBEGIN MEND tz local -a match mbegin mend # Flags that we found a date or a time (maybe a relative time) integer date_found time_found # Flag that it's OK to have a time only integer time_ok # Indices of positions of start and end of time and dates found. # These are actual character indices as zsh would normally use, i.e. # line[time_start,time_end] is the string for the time. integer time_start time_end date_start date_end integer anchor anchor_end debug relative reladd setvar while getopts "aAdrst" opt; do case $opt in (a) # anchor (( anchor = 1 )) ;; (A) # anchor at end, too (( anchor = 1, anchor_end = 1 )) ;; (d) # enable debug output (( debug = 1 )) ;; (r) (( relative = 1 )) ;; (s) (( setvar = 1 )) ;; (t) (( time_ok = 1 )) ;; (*) return 1 ;; esac done shift $(( OPTIND - 1 )) line=$1 orig_line=$1 local dspat dspat_noday tspat if (( anchor )); then # Anchored at the start. dspat=$dspat_anchor dspat_noday=$dspat_anchor_noday if (( relative )); then tspat=$tspat_anchor else # We'll test later if the time is associated with the date. tspat=$tspat_noanchor fi else dspat=$dspat_noanchor dspat_noday=$dspat_noanchor tspat=$tspat_noanchor fi # Look for a time separately; we need colons for this. case $line in # with seconds, am/pm: don't match / in front. ((#ibm)${~tspat}(<0-12>):(<0-59>)[.:]((<0-59>)(.<->|))[[:space:]]#([ap])(|.)[[:space:]]#m(.|[[:space:]]|(#e))(*)) hour=$match[2] minute=$match[3] second=$match[5] [[ $match[7] = (#i)p ]] && (( hour <= 12 )) && (( hour += 12 )) time_found=1 ;; # no seconds, am/pm ((#ibm)${~tspat}(<0-12>):(<0-59>)[[:space:]]#([ap])(|.)[[:space:]]#m(.|[[:space:]]|(#e))(*)) hour=$match[2] minute=$match[3] [[ $match[4] = (#i)p ]] && (( hour <= 12 )) && (( hour += 12 )) time_found=1 ;; # no colon, even, but a.m./p.m. indicator ((#ibm)${~tspat}(<0-12>)[[:space:]]#([ap])(|.)[[:space:]]#m(.|[[:space:]]|(#e))(*)) hour=$match[2] minute=0 [[ $match[3] = (#i)p ]] && (( hour <= 12 )) && (( hour += 12 )) time_found=1 ;; # 24 hour clock, with seconds ((#ibm)${~tspat}(<0-24>):(<0-59>)[.:]((<0-59>)(.<->|))(*)) hour=$match[2] minute=$match[3] second=$match[5] time_found=1 ;; # 24 hour clock, no seconds ((#ibm)${~tspat}(<0-24>):(<0-59>)(*)) hour=$match[2] minute=$match[3] time_found=1 ;; esac (( hour == 24 )) && hour=0 if (( time_found )); then # time was found time_start=$mbegin[2] time_end=$mend[-2] # Remove the timespec because it may be in the middle of # the date (as in the output of "date". # There may be a time zone, too, which we don't yet handle. # (It's not in POSIX strptime() and libraries don't support it well.) # This attempts to remove some of the weirder forms. if [[ $line[$time_end+1,-1] = (#b)[[:space:]]#([A-Z][A-Z][A-Z]|[-+][0-9][0-9][0-9][0-9])([[:space:]]|(#e))* || \ $line[$time_end+1,-1] = (#b)[[:space:]]#([A-Z][A-Z][A-Z](|[-+])<0-12>)([[:space:]]|(#e))* || \ $line[$time_end+1,-1] = (#b)[[:space:]]#([A-Z][A-Z][A-Z](|[-+])<0-12>[A-Z][A-Z][A-Z])([[:space:]]|(#e))* ]]; then (( time_end += ${mend[-1]} )) tz=$match[1] fi line=$line[1,time_start-1]$line[time_end+1,-1] (( debug )) && print "line after time: $line" fi if (( relative == 0 )); then # Date. case $line in # Look for YEAR[-/.]MONTH[-/.]DAY ((#bi)${~dspat}((19|20)[0-9][0-9])[-/](<1-12>)[-/](<1-31>)*) year=$match[2] month=$match[4] day=$match[5] date_start=$mbegin[2] date_end=$mend[5] date_found=1 ;; # Same with month name ((#bi)${~dspat}((19|20)[0-9][0-9])[-/]${~monthpat}[-/](<1-31>)*) year=$match[2] mname=$match[4] day=$match[5] date_start=$mbegin[2] date_end=$mend[5] date_found=1 ;; # Look for DAY[th/st/rd] MNAME[,] YEAR ((#bi)${~dspat}(<1-31>)(|th|st|rd)[[:space:]]##${~monthpat}(|,)[[:space:]]##((19|20)[0-9][0-9])*) day=$match[2] mname=$match[4] year=$match[6] date_start=$mbegin[2] date_end=$mend[6] date_found=1 ;; # Look for MNAME DAY[th/st/rd][,] YEAR ((#bi)${~dspat}${~monthpat}[[:space:]]##(<1-31>)(|th|st|rd)(|,)[[:space:]]##((19|20)[0-9][0-9])*) mname=$match[2] day=$match[3] year=$match[6] date_start=$mbegin[2] date_end=$mend[6] date_found=1 ;; # Look for DAY[th/st/rd] MNAME; assume current year ((#bi)${~dspat}(<1-31>)(|th|st|rd)[[:space:]]##${~monthpat}(|,)([[:space:]]##*|)) day=$match[2] mname=$match[4] strftime -s year "%Y" $EPOCHSECONDS date_start=$mbegin[2] date_end=$mend[5] date_found=1 ;; # Look for MNAME DAY[th/st/rd]; assume current year ((#bi)${~dspat}${~monthpat}[[:space:]]##(<1-31>)(|th|st|rd)(|,)([[:space:]]##*|)) mname=$match[2] day=$match[3] strftime -s year "%Y" $EPOCHSECONDS date_start=$mbegin[2] date_end=$mend[5] date_found=1 ;; # Now it gets a bit ambiguous. # Look for DAY[th/st/rd][/]MONTH[/ ,]YEAR ((#bi)${~dspat}(<1-31>)(|th|st|rd)/(<1-12>)((|,)[[:space:]]##|/)((19|20)[0-9][0-9])*) day=$match[2] month=$match[4] year=$match[7] date_start=$mbegin[2] date_end=$mend[7] date_found=1 ;; # Look for MONTH[/]DAY[th/st/rd][/ ,]YEAR ((#bi)${~dspat}(<1-12>)/(<1-31>)(|th|st|rd)((|,)[[:space:]]##|/)((19|20)[0-9][0-9])*) month=$match[2] day=$match[3] year=$match[7] date_start=$mbegin[2] date_end=$mend[7] date_found=1 ;; # Look for WEEKDAY ((#bi)${~dspat_noday}(${~daypat})*) integer wday_now wday local wdaystr=${(L)match[3]} date_start=$mbegin[2] date_end=$mend[2] # Find the day number. local -a wdays # This is the ordering of %w in strtfime (zero-offset). wdays=(sun mon tue wed thu fri sat sun) (( wday = ${wdays[(i)$wdaystr]} - 1 )) # Find the date for that day. (( then = EPOCHSECONDS )) strftime -s wday_now "%w" $then # Day is either today or in the past. (( wday_now < wday )) && (( wday_now += 7 )) (( then -= (wday_now - wday) * 24 * 60 * 60 )) strftime -s year "%Y" $then strftime -s month "%m" $then strftime -s day "%d" $then date_found=1 ;; # Look for "today", "yesterday", "tomorrow" ((#bi)${~dspat_noday}(yesterday|today|tomorrow)(|${schars})*) (( then = EPOCHSECONDS )) case ${(L)match[2]} in (yesterday) (( then -= 24 * 60 * 60 )) ;; (tomorrow) (( then += 24 * 60 * 60 )) ;; esac strftime -s year "%Y" $then strftime -s month "%m" $then strftime -s day "%d" $then date_start=$mbegin[2] date_end=$mend[2] date_found=1 ;; esac fi if (( date_found || (time_ok && time_found) )); then # date found # see if there's a day at the start if (( date_found )); then if [[ ${line[1,$date_start-1]} = (#bi)${~daypat} ]]; then date_start=$mbegin[1] fi line=${line[1,$date_start-1]}${line[$date_end+1,-1]} fi if (( time_found )); then if (( date_found )); then # If we found a time, it must be associated with the date, # or we can't use it. Since we removed the time from the # string to find the date, however, it's complicated to # know where both were found. Reconstruct the date indices of # the original string. if (( time_start <= date_start )); then # Time came before start of date; add length in. (( date_start += time_end - time_start + 1 )) fi if (( time_start <= date_end )); then (( date_end += time_end - time_start + 1 )) fi if (( time_end + 1 < date_start )); then # If time wholly before date, OK if only separator characters # in between. (This allows some illogical stuff with commas # but that's probably not important.) if [[ ${orig_line[time_end+1,date_start-1]} != ${~schars}# ]]; then # Clearly this can't work if anchor is set. In principle, # we could match the date and ignore the time if it wasn't. # However, that seems dodgy. return 1 else # Form massaged line by removing the entire date/time chunk. line="${orig_line[1,time_start-1]}${orig_line[date_end+1,-1]}" fi elif (( date_end + 1 < time_start )); then # If date wholly before time, OK if only time separator characters # in between. This allows 2006/10/12:13:43 etc. if [[ ${orig_line[date_end+1,time_start-1]} != ${~tschars}# ]]; then # Here, we assume the time is associated with something later # in the line. This is pretty much inevitable for the sort # of use we are expecting. For example, # 2006/10/24 Meeting from early, may go on till 12:00. # or with some uses of the calendar system, # 2006/10/24 MR 1 Another pointless meeting WARN 01:00 # The 01:00 says warn an hour before, not that the meeting starts # at 1 am. About the only safe way round would be to force # a time to be present, but that's not how the traditional # calendar programme works. # # Hence we need to reconstruct. (( time_found = 0, hour = 0, minute = 0, second = 0 )) line="${orig_line[1,date_start-1]}${orig_line[date_end+1,-1]}" else # As above. line="${orig_line[1,date_start-1]}${orig_line[time_end+1,-1]}" fi fi else # Time only. # We didn't test anchors for time originally, since it # might have been embedded in the date. If there's no date, # we need to test specially. if (( anchor )) && [[ ${orig_line[1,time_start-1]} != ${~tschars}# ]]; then # Anchor at start failed. return 1 fi strftime -s year "%Y" $EPOCHSECONDS strftime -s month "%m" $EPOCHSECONDS strftime -s day "%d" $EPOCHSECONDS # Date now handled. (( date_found = 1 )) fi if (( debug )); then print "Time string: $time_start,$time_end:" \ "'$orig_line[time_start,time_end]'" (( date_ok )) && print "Date string: $date_start,$date_end:" \ "'$orig_line[date_start,date_end]'" print "Remaining line: '$line'" fi fi fi if (( relative )); then if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(y|yr|year)${~repat} ]]; then (( reladd += ((365*4+1) * 24 * 60 * 60 * ${match[2]} + 1) / 4 )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(mth|mon|mnth|month)${~repat} ]]; then (( reladd += 30 * 24 * 60 * 60 * ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(w|wk|week)${~repat} ]]; then (( reladd += 7 * 24 * 60 * 60 * ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(d|dy|day)${~repat} ]]; then (( reladd += 24 * 60 * 60 * ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(h|hr|hour)${~repat} ]]; then (( reladd += 60 * 60 * ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(min|minute)${~repat} ]]; then (( reladd += 60 * ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi if [[ $line = (#bi)${~dspat}(<->)[[:blank:]]#(s|sec|second)${~repat} ]]; then (( reladd += ${match[2]} )) line=${line[1,$mbegin[2]-1]}${line[$mend[4]+1,-1]} time_found=1 fi fi if (( relative )); then # If no date was found, we're in trouble unless we found a time. if (( time_found )); then if (( anchor_end )); then # must be left with only separator characters if [[ $line != ${~schars}# ]]; then return 1 fi fi (( REPLY = reladd + (hour * 60 + minute) * 60 + second )) [[ -n $setvar ]] && REPLY2=$line return 0 fi return 1 elif (( ! date_found )); then return 1 fi if (( anchor_end )); then # must be left with only separator characters if [[ $line != ${~schars}# ]]; then return 1 fi fi local fmt nums if [[ -n $mname ]]; then fmt="%Y %b %d %H %M %S" nums="$year $mname $day $hour $minute $second" else fmt="%Y %m %d %H %M %S" nums="$year $month $day $hour $minute $second" fi strftime -s REPLY -r $fmt $nums [[ -n $setvar ]] && REPLY2=$line return 0