The context()
command can be used to group input data (vectors or fields)
into arbitrary contextual windows.
Unlike grouping vectors,
context()
windows 1) are always contiguous relative to the reference vector(s)/field(s)
(which can depend on order); 2) can overlap; and 3) don't necessarily exhaustively
divide the data.
The context()
function should generally be called on
humdrumR data, but it can also be called directly on vectors.
The uncontext()
function removes contextual windows
from a humdrumR data object.
Once contextual windows are created, the windows()
function can be used to
view a data.table representing these windows.
The Open
and Close
columns indicate row indices in the humdrum table.
Usage
context(x, open, close, ...)
# S3 method for default
context(
x,
open,
close,
reference = x,
overlap = "paired",
depth = NULL,
rightward = TRUE,
duplicate_indices = TRUE,
min_length = 2L,
max_length = Inf,
inPlace = FALSE,
complement = FALSE,
alignToOpen = TRUE,
collapse = TRUE,
sep = ",",
stripRegex = FALSE,
groupby = list()
)
# S3 method for humdrumR
context(
humdrumR,
open,
close,
dataTypes = "D",
overlap = "paired",
depth = NULL,
rightward = TRUE,
duplicate_indices = TRUE,
min_length = 2L,
max_length = Inf,
groupby
)
uncontext(humdrumR, complement = NULL)
windows(humdrumR)
Arguments
- x
Input data to group into windows.
Must be an atomic vector.
- open
Where to "open" (start) windows.
Can be natural numbers, logical vectors (of the same length as x), or a single character string (interpreted as a regular expression).
May also be an arbitrary expression which returns natural numbers; the expression can refer to named elements of reference, to end (the last index), to close, or to prevclose (the previous close).
- close
Where to "close" (end) windows.
Can be natural numbers, logical vectors (of the same length as x), or a single character string (interpreted as a regular expression).
May also be an arbitrary expression which returns natural numbers; the expression can refer to named elements of reference, to end (the last index), to open, or to nextopen (the next open).
- reference
Vector(s) to use to identify window open/closes.
Defaults to x.
Must be either an atomic vector of the same length as x, or a list()/data.frame of such vectors, all named.
If context() is applied to a humdrumR dataset, all the fields of the data's humdrum table are used as reference.
- overlap
How are overlapping windows treated/created?
Defaults to 'paired'.
Must be a single character string, partially matching "paired", "nested", "edge", or "none".
- depth
How "deep" can windows overlap?
Defaults to NULL.
Must be NULL, or a vector of non-zero whole numbers.
- rightward
Should window alignment/overlap be determined from left to right?
Defaults to TRUE.
Must be a singleton logical value: an on/off switch.
- duplicate_indices
Can the same index open/close multiple windows?
Defaults to TRUE.
Must be a singleton logical value: an on/off switch.
- min_length, max_length
The minimum/maximum lengths of output windows.
Default to two and infinity (no maximum) respectively.
Must be single, positive whole numbers.
- inPlace
Should output be padded to the same length as the input?
Defaults to FALSE.
Must be a singleton logical value: an on/off switch.
- complement
Should input "outside" any windows be output?
Defaults to FALSE.
Must be a singleton logical value: an on/off switch.
- alignToOpen
Should 'inPlace' output be aligned to the open of each window?
Defaults to TRUE.
Must be a singleton logical value: an on/off switch.
- collapse
Should output windows be collapsed to single character strings?
Defaults to TRUE.
Must be a singleton logical value: an on/off switch.
- sep
Separator for collapsed output.
Defaults to a comma (",").
Must be a single character string.
- stripRegex
Should regular expressions matched by the open/close arguments be removed from the output?
Defaults to FALSE.
Must be a singleton logical value: an on/off switch.
- groupby
Optional vectors to group windows within.
Defaults to empty list().
Must be a list(), which is either empty or contains vectors which are all the same length as x.
In calls to with/within.humdrumR, groupby is passed list(Piece, Spine, Path) by default.
Windows cannot cross group boundaries.
- humdrumR
HumdrumR data.
Must be a humdrumR data object.
Details
The context()
function determines where contextual windows will begin and end based on
expressions in its open
and close
arguments.
These open
and close
expressions are evaluated using a reference vector, or set of vectors/fields()
that are all
the same length.
In most cases, we'll apply context()
to a humdrumR data object, so windows are defined by
evaluating the open
and close
arguments using the fields()
of the humdrum table as the reference.
Once this has been done, the humdrumR object will show how many windows
have been identified when printed in the console.
If we then use with/within/mutate/summarize/reframe on our data, these methods
will evaluate their argument expressions within each and every contextual window defined by context()
.
This means we can do basically anything we want to/with our contextual windows.
We can also apply context()
directly to a single input vector x
, providing a vector or list/data.frame of
equal-length vectors as the reference
for the open
and close
arguments.
By default, x
itself is reused as the reference
, so windows are based on the input x
itself.
When applied to a vector, context()
will simply group the elements of x
into the windows we defined,
with a few arguments to control how this is done:
- complement: The "complement" refers to elements of the input vector that don't fall inside any indicated windows: if complement = FALSE (the default), these "outside" values are dropped; if complement = TRUE, they are retained.
- inPlace: If inPlace = TRUE, windows are output in a vector of the same length as the input, padded with NA as needed; otherwise (the default), only the windows are returned.
- collapse: If collapse = TRUE, the windows are collapsed to strings (separated by sep); otherwise, a list() of windows is returned.
- sep: The separator used if collapse = TRUE.
- alignToOpen: Should padded output (inPlace = TRUE) be aligned to the opening (left side) of each window?
- stripRegex: Should regular expressions used to identify windows (details below) be stripped from the output?
In the rest of this man page, we will apply context()
to simple vectors (like the letters vector)
to illustrate how windows are defined.
In actual analyses, you'll be more likely to apply context()
to humdrumR data.
Note that, when using context()
inside with, within, etc.,
the alignToOpen
argument will have no effect.
Instead, use alignLeft = FALSE
as an argument to
with()
/within()
, not as an argument to context()
.
groupby
The groupby
argument is an optional list of grouping vectors,
all the same length as x
/reference
.
Contextual windows cannot cross boundaries indicated in groupby
.
When applying context()
to humdrumR data,
groupby
is automatically passed list(Piece, Spine, Path)
, which prevents
windows from crossing normal "melodic" boundaries in the data.
Defining windows
The system context()
uses to define/identify windows in the data is quite sophisticated,
and can take some time to master!
The basic idea is that you must indicate where you want windows to start ("open") and
where you want them to end ("close"):
you indicate this using the open
and close
arguments.
To introduce their usage, we will first do some simple examples applying
context()
to the built-in letters vector, which (by default) will act as our reference
vector and the target vector x
to contextualize.
We will then show how these techniques can be used with multiple vectors/fields()
.
The open
and close
arguments are expressions which, when evaluated,
must indicate indices in the reference
vector(s)/field(s).
For example, if we want a window to open at the 4th and 11th indices,
and close at the 15th and 24th index,
we can write:
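A minimal sketch of such a call, assuming the humdrumR package is installed and attached (the variable name fixed_windows is only for illustration):

```r
library(humdrumR)

# Open windows at indices 4 and 11; close them at indices 15 and 24.
# With the default overlap = "paired", each open is matched with the
# next close that has not already been paired.
fixed_windows <- context(letters, open = c(4, 11), close = c(15, 24))
fixed_windows
```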
This is quite trivial.
However, the open
and close
expressions can do a number of special tricks,
including referring to each other.
For example, if either argument includes a call to hop()
,
hop()
will automatically be applied along the input vector.
Consider this example:
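A sketch of a call matching this description, assuming the humdrumR package is attached and that hop() takes its step size positionally, as hop(4) does in the Examples section (the name hopped is only for illustration):

```r
library(humdrumR)

# hop(2) steps along the input two indices at a time (1, 3, 5, ...);
# close refers back to open, so each window closes three indices after it opens.
hopped <- context(letters, open = hop(2), close = open + 3)
hopped
```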
In this example, the hop()
command generates open
indices for every odd
number from 1
to 25
.
The close
argument then references these open
indices, and adds 3
to each---
the result is pairs like 1:4
, 3:6
, 5:8
, 7:10
, etc.
If we give hop()
different arguments (like by
or from
), we can modify this process.
In fact, if we use the default by
value for hop()
(1
), we can use this approach to
create standard N-grams.
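For instance, a sketch of 3-grams over the letters vector, assuming the humdrumR package is attached and that hop() with no arguments steps by 1, as stated above (the name trigrams is only for illustration):

```r
library(humdrumR)

# A window opens at every index and closes two indices later,
# which gives overlapping windows of length 3.
trigrams <- context(letters, open = hop(), close = open + 2)
head(trigrams)
```

Changing the offset in close changes N: open + 3 would give 4-grams, as in the Examples section.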
We can also indicate open/closes by providing logical
vectors (the same length as x
/reference
).
For example:
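A sketch using logical vectors, assuming the humdrumR package is attached; opens_here and closes_here are hypothetical names built only for this illustration:

```r
library(humdrumR)

# TRUE marks the indices where windows should open or close.
# Both vectors are the same length as letters (26).
opens_here  <- letters %in% c("b", "k")
closes_here <- letters %in% c("f", "p")

logical_windows <- context(letters, open = opens_here, close = closes_here)
logical_windows
```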
Regular Expressions
If either open
or close
is given a character
string, this string is treated
as a regular expression and is matched against the reference
vector.
For example, we could make windows in the alphabet starting or ending on each vowel:
context(letters, open = '[aeiou]', close = open + 4)
context(letters, open = close - 4, close = '[aeiou]', alignToOpen = FALSE)
If stripRegex = TRUE
(not the default), the matching open
or close
regular expressions are removed
from the output.
This can be useful if the character/tokens used to indicate windows are no longer needed
once windowing is done.
Special variables
The open
and close
expressions will understand a few special variable names:
- nextopen: represents the index of the next open; can only be used in the close argument.
- prevclose: represents the index of the previous close; can only be used in the open argument.
- end: represents the last index of the reference vector(s).
- |: As in "OR"; specifies alternative window open/close criteria.
What if we'd like each of our windows to close right before the next window opens?
We can do this by making the close argument refer to the next open, by referring to the nextopen variable:
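Written out as a sketch, assuming the humdrumR package is attached (the name vowel_windows is only for illustration):

```r
library(humdrumR)

# Each window opens on a vowel and closes one index before the next vowel.
vowel_windows <- context(letters, open = '[aeiou]', close = nextopen - 1L)
vowel_windows
```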
Conversely, open
can refer to the previous close, using the prevclose
variable:
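A sketch of the mirror-image call, assuming the humdrumR package is attached (the name to_vowel is only for illustration):

```r
library(humdrumR)

# Each window closes on a vowel and opens one index after the previous close.
to_vowel <- context(letters, open = prevclose + 1, close = '[aeiou]')
to_vowel
```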
Notice that when we called context(letters, open = '[aeiou]', close = nextopen - 1L)
,
the window opening on "u"
is not returned.
This is because there is no "nextopen
" open to close on.
We can instead provide context()
an alternative, using |
(or):
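A sketch of the call with a fixed fallback index, assuming the humdrumR package is attached (the name with_fallback is only for illustration):

```r
library(humdrumR)

# Close one index before the next vowel, OR at index 26 when there is no next open.
with_fallback <- context(letters, open = '[aeiou]', close = nextopen - 1L | 26)
with_fallback
```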
Here we are saying, close a window 1 index before the next open OR at index 26.
What if we don't know exactly how long our input vector is?
Refer to the end
variable:
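The same call using end, as it also appears in the Examples section (a sketch assuming the humdrumR package is attached; the name to_end is only for illustration):

```r
library(humdrumR)

# end stands for the last index of the reference vector, whatever its length.
to_end <- context(letters, open = '[aeiou]', close = nextopen - 1 | end)
to_end
```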
Separating context reference from application
The previous examples illustrate the basic concepts of using open
/close
;
to grasp how they work, study these examples and play around with them.
We can also define open
and close
expressions that reference more than one vector/field,
and which aren't necessarily the thing we want to apply the windowing to.
To illustrate this last point, let's take the last command from the previous section
and make it so the x
argument is different than the reference
argument:
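A sketch of that call, assuming the humdrumR package is attached; the windows are found in letters (the reference) but cut from LETTERS (the name upper_windows is only for illustration):

```r
library(humdrumR)

# The regular expression is matched against the reference (lowercase letters),
# while the windowed values are taken from x (uppercase LETTERS).
upper_windows <- context(LETTERS,
                         reference = letters,
                         open = '[aeiou]', close = nextopen - 1 | end)
upper_windows
```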
Now, letters
is still being used as the windowing reference, but the contextual windowing is being
applied to LETTERS
.
When we use context()
on a humdrumR dataset, the data's fields()
can be used as the reference,
then with(), within(), mutate() can be used to manipulate other fields.
Multi-vector/field reference
The open
and close
arguments can reference more than one reference vector.
When applying context()
to a vector x
, we can provide a named list()
or data.frame()
as the reference
argument---so long as all the vectors they contain are the same length as x
.
We can then refer to these vectors by name:
reference.frame <- data.frame(Threes = rep(1:3, length.out = 26),
Fours = rep(4:1, length.out = 26))
context(letters,
reference = reference.frame,
open = Threes == Fours, close = Fours == 1)
So we created a data.frame with columns Threes
and Fours
.
We referenced both of these columns when defining where windows open
and close
.
Using humdrumR data
When we apply context()
to humdrumR data, we can refer to any of the
data's fields()
in open
or close
.
We can also use all of open
/close
's special tricks (described above), like hop()
, nextopen
, prevclose
,
and end
.
For example, to create 4-grams in a humdrum dataset:
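A sketch based on the calls in the Examples section, assuming the humdrumR package is attached and its bundled Bach chorale files are available under humdrumRroot:

```r
library(humdrumR)

# Read four Bach chorales bundled with humdrumR
# (file pattern taken from the Examples section).
chorales <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")

# Open a window at every data token and close it three tokens later (4-grams),
# then paste the tokens of each window into one string.
fourgrams <- chorales |>
  context(open = hop(), close = open + 3) |>
  within(paste(Token, collapse = ','))

fourgrams
```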
As mentioned above, when we apply context()
to humdrumR data,
groupby
is automatically passed list(Piece, Spine, Path)
, which prevents
windows from crossing normal "melodic" boundaries in the data.
This can be overridden by providing your own explicit groupby
argument.
Grouping fields that have already been defined in the data are also used.
Filtering windows
Once open
and close
have identified where windows can start and end, there are still
some options for which open and close indices to associate with each other to create a window.
For example, as mentioned above, the groupby
argument can be used to make sure windows do not cross
grouping boundaries---even if one group has an extra open index and the next an extra close index.
The minimum and maximum length of windows can also be controlled using the min_length
and max_length
arguments.
The overlap
, depth
, rightward
, and duplicate_indices
arguments provide a number of additional options,
which are useful for some use cases (details below).
Nested windows
A common use-case for context()
is analyzing phrases indicated in music.
In **kern
, phrases are indicated with opening ("(") and closing (")") parentheses,
which we can capture with regular expressions for open
and close
.
Here is an example:
nesting1 <- c('(a', 'b)', '(c', 'd', 'e)', '(d', 'e', 'f)', '(e', 'f', 'f#', 'g', 'g#', 'a)')
context(nesting1, open = '(', close = ')')
Perfect. However, what if there are nested phrasing indicators?
nesting2 <- c('(a', 'b)', '(c', '(d', 'e)', '(d', 'e)', 'f)', '(e', '(f', '(f#', 'g)', 'g#)', 'a)')
context(nesting2, open = '(', close = ')')
That's not what we want!
By default, context()
"pairs" each open
with the next close
, which often makes the most sense.
But in this case, we want different behavior.
We can get what we want by specifying overlap = 'nested'
:
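A sketch of that call, assuming the humdrumR package is attached. The open and close patterns are copied from the calls above, and nesting2 is repeated so the block stands on its own:

```r
library(humdrumR)

nesting2 <- c('(a', 'b)', '(c', '(d', 'e)', '(d', 'e)', 'f)', '(e', '(f', '(f#', 'g)', 'g#)', 'a)')

# Match each open parenthesis with the close at the same nesting level,
# rather than simply with the next available close.
nested_windows <- context(nesting2, open = '(', close = ')', overlap = 'nested')
nested_windows
```

If a bare parenthesis is rejected as a regular expression in your setup, a bracketed class such as '[(]' matches the same character.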
Now context aligns each open
with the corresponding close
at the same nesting level.
What if we are only interested in the highest (or lowest) level of nesting?
Use the depth
argument, which can be non-zero integers: the highest level is 1
,
with "deeper" levels incrementing up.
context(nesting2, open = '(', close = ')', overlap = 'nested', depth = 1)
context(nesting2, open = '(', close = ')', overlap = 'nested', depth = 2)
context(nesting2, open = '(', close = ')', overlap = 'nested', depth = 2:3)
You can also use negative depth
to specify from the deepest levels outward.
For example, in this case depth = -1
should get us that deepest level:
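A sketch of the call with a negative depth, assuming the humdrumR package is attached; the patterns follow the calls above, and nesting2 is repeated so the block stands on its own:

```r
library(humdrumR)

nesting2 <- c('(a', 'b)', '(c', '(d', 'e)', '(d', 'e)', 'f)', '(e', '(f', '(f#', 'g)', 'g#)', 'a)')

# depth = -1 counts from the deepest nesting level outward.
deepest <- context(nesting2, open = '(', close = ')', overlap = 'nested', depth = -1)
deepest
```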
If depth
is NULL
(the default), all depths are returned.
Controlling overlap
There are some other options for controlling how windows can, or cannot, overlap. Perhaps we'd like to look at every melodic phrase moving from So (dominant) to Do (tonic).
melody <- c('so', 'la', 'ti', 'do', 'so', 'fi', 'so', 'la', 'ti', 're', 'do', 'so', 'la', 're', 'do')
context(melody, open = 'so', close = 'do')
This output is probably not what we want.
Again, context()
(by default) pairs each opening with the next close which hasn't already been paired.
In this case, that means the third So is getting paired with the third Do,
even though there is another Do in between!
We might want to try either the "edge"
or "none"
options for the overlap
argument:
context(melody, open = 'so', close = 'do', overlap = 'edge')
context(melody, open = 'so', close = 'do', overlap = 'none')
The "edge"
option allows the closing edge of windows to share a close
---in this case,
the second and third So (open
) are paired with the same Do.
On the other hand, with overlap = "none"
, overlapping windows are simply not allowed, so the third open
simply doesn't get paired with anything.
What if you would like to pair windows on their left (opening) edge?
If you specify rightward = FALSE
, the overlap argument works backwards (right-to-left) through
the input vector, starting on each close
and ending on each open
.
By combining rightward = FALSE
with various overlap
options, you can achieve a lot of windowing
options you might need.
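As one possible combination, a sketch that pairs rightward = FALSE with overlap = 'edge', assuming the humdrumR package is attached; melody is repeated from above so the block stands on its own, and the name leftward is only for illustration:

```r
library(humdrumR)

melody <- c('so', 'la', 'ti', 'do', 'so', 'fi', 'so', 'la', 'ti', 're', 'do', 'so', 'la', 're', 'do')

# Work right-to-left: start from each close ('do') and look back for an open ('so').
leftward <- context(melody, open = 'so', close = 'do',
                    overlap = 'edge', rightward = FALSE)
leftward
```

Comparing this result with the rightward 'edge' call above is a quick way to see which edge the windows end up sharing.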
Repeated indices
Note that if duplicate_indices = TRUE
(the default) the open
and close
arguments can
incorporate repeated indices, including multiple matches to a regular expression in the same index.
This is useful with, for example, nested phrases:
nesting3 <- c('(a', 'b)', '((c', 'd', 'e)', '(d', 'e', 'f))', '(e', 'f', '((f#', 'g)', 'g#)', 'a)')
context(nesting3, open = '(', close = ')', overlap = 'nested', depth = 1)
context(nesting3, open = '(', close = ')', overlap = 'nested', depth = 2)
In some cases, you might want to turn duplicate_indices = FALSE
.
Complements (removing context)
The uncontext()
command, like the ungroup()
command, is needed to remove
contextual windows from humdrumR data, so that further calls to
within()/mutate()/etc. are not applied in context.
The uncontext()
command can also be used to access data outside of contextual windows
by using a complement
argument, similar to the unfilter()
function.
The complement
must be an existing field in the data.
If uncontext()
is used with a given complement field, the currently
selected data field (unless Token
is selected) has the contents of the complement
field inserted into it at all points outside the contextual windows.
This can be used to keep
See also
Other contextual grouping functions:
groupHumdrum
Examples
# use the built-in 'letters' vector
context(letters, open = hop(4), close = open + 3)
#> [1] "a,b,c,d" "e,f,g,h" "i,j,k,l" "m,n,o,p" "q,r,s,t" "u,v,w,x"
context(letters, open = "[aeiou]", close = nextopen - 1 | end)
#> [1] "a,b,c,d" "e,f,g,h" "i,j,k,l,m,n" "o,p,q,r,s,t" "u,v,w,x,y,z"
context(letters, open = "[aeiou]", close = nextopen - 1 | end, inPlace = TRUE)
#> [1] "a,b,c,d" NA NA NA "e,f,g,h"
#> [6] NA NA NA "i,j,k,l,m,n" NA
#> [11] NA NA NA NA "o,p,q,r,s,t"
#> [16] NA NA NA NA NA
#> [21] "u,v,w,x,y,z" NA NA NA NA
#> [26] NA
context(letters, open = "[aeiou]", close = nextopen - 1 | end, collapse = FALSE)
#> [[1]]
#> [1] "a" "b" "c" "d"
#>
#> [[2]]
#> [1] "e" "f" "g" "h"
#>
#> [[3]]
#> [1] "i" "j" "k" "l" "m" "n"
#>
#> [[4]]
#> [1] "o" "p" "q" "r" "s" "t"
#>
#> [[5]]
#> [1] "u" "v" "w" "x" "y" "z"
#>
if (FALSE) {
# within.humdrumR
chorales <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/.*.krn")
# 4-grams
chorales |>
context(open = hop(), open + 3) |>
within(paste(Token, collapse = ','))
# phrases leading to fermatas
chorales |>
context(open = 1 | prevclose + 1, close = ';', overlap = 'none') |>
within(paste(Token, collapse = ','), alignLeft = FALSE)
}
humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")
#> Finding and reading files...
#> REpath-pattern '/home/nat/.tmp/Rtmpn4KeFS/temp_libpath7af94615c2ed/humdrumR/HumdrumData/BachChorales/chor00[1-4].krn' matches 4 text files in 1 directory.
#> Four files read from disk.
#> Validating four files...
#> all valid.
#> Parsing four files...
#> Assembling corpus...
#> Done!
humData |> context(hop(6), open + 2) |> within(paste(Token, collapse = '|')) |> uncontext(complement = 'Token')
#> ######################## vvv chor001.krn vvv #########################
#> 1: !!!COM: Bach, Johann Sebastian
#> 2: !!!CDT: 1685/02/21/-1750/07/28/
#> 3: !!!OTL@@DE: Aus meines Herzens Grunde
#> 4: !!!OTL@EN: From the Depths of My Heart
#> 5: !!!SCT: BWV 269
#> 6: !!!PC#: 1
#> 7: !!!AGN: chorale
#> 8: **kern **kern **kern **kern
#> 9: *ICvox *ICvox *ICvox *ICvox
#> 10: *Ibass *Itenor *Ialto *Isoprn
#> 11: *I"Bass *I"Tenor *I"Alto *I"Soprano
#> 12: *>[A,A,B] *>[A,A,B] *>[A,A,B] *>[A,A,B]
#> 13: *>norep[A,B] *>norep[A,B] *>norep[A,B] *>norep[A,B]
#> 14: *>A *>A *>A *>A
#> 15: *clefF4 *clefGv2 *clefG2 *clefG2
#> 16: *k[f#] *k[f#] *k[f#] *k[f#]
#> 17: *G: *G: *G: *G:
#> 18: *M3/4 *M3/4 *M3/4 *M3/4
#> 19: *MM100 *MM100 *MM100 *MM100
#> 20: 4GG|4G|4E 4B|4B|8cL 4d|4d|4e 4g|2g|4dd
#> 21: =1 =1 =1 =1
#> 22: . . . .
#> 23: . . . .
#> 24: . 8BJ . .
#> 25: 4F# 4A 4d .
#> 26: =2 =2 =2 =2
#> 27: 4G 4G 2d 4.b
#> 28: 4D 4F#|4G|8cL . .
#> 29: . . . 8a
#> 30: 4E|4C|8BBL . 4B 4g
#> 31: =3 =3 =3 =3
#> 32: . . 8eL|8d|8e 4.g|8a|4b
#> 33: . 8BJ . .
#> 34: . 4c . .
#> 35: 8AAJ . 8f#J .
#> 36: 4GG 4d 4g .
#> 37: =4 =4 =4 =4
#> 38: 2D; 2d;|4d|4A 2f#; 2a;
#> 39: 4GG|4FF#|4GG . 4g|4d|4e 4b
#> 40: =5 =5 =5 =5
#> 41-133::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> ######################## ^^^ chor001.krn ^^^ #########################
#>
#> (two more pieces...)
#>
#> ######################## vvv chor004.krn vvv #########################
#> 1-63::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 64: 8G#L . 4b 8eeL
#> 65: 8EJ . . 8dd#J
#> 66: 4F# 4A# 4f# 4cc#|4b;|4b
#> 67: 4BB;|4E|4G# 4d#; 4f#;|4g#|8eL .
#> 68: . 4e . .
#> 69: =7 =7 =7 =7
#> 70: . 4B|4e|4e . 4ee
#> 71: . . 8f#J .
#> 72: 4E . 4g# 4b
#> 73: 8AnXL . 4a 4cc#
#> 74: 8BJ . . .
#> 75: 4c#|4E#|4F# 4c# 8eL|8f#J|4g# 8g#L|8aJ|4b
#> 76: . . . .
#> 77: =8 =8 =8 =8
#> 78: . 4d . .
#> 79: . 4c# 4f# 4a
#> 80: 4C#; 4c#;|[4B|4B] 4e#; 4g#;
#> 81: 4E . 4e 4g#
#> 82: =9 =9 =9 =9
#> 83: 4BB . 4d#|4c#|4B 4f#|4a|4g#
#> 84: 8C#L|8D#J|4E 4e . .
#> 85: . . . .
#> 86: . 4e . .
#> 87: 4BB 8d#L 4B 4f#
#> 88: . 8BJ|2F#|4G#; . .
#> 89: =10 =10 =10 =10
#> 90: 4AA# . 4c# 4c#
#> 91: 4BB . 4B 4d#
#> 92: 4EE; . 4B; 4e;
#> 93: == == == ==
#> 94: *- *- *- *-
#> 95: !!!hum2abc: -Q ''
#> 96: !!!title: @{PC#}. @{OTL@@DE}
#> 97: !!!YOR1: 371 vierstimmige Choralgesänge von Jo***
#> 98: !!!YOR2: 4th ed. by Alfred Dörffel (Leipzig: B***
#> 99: !!!YOR2: c.1875). 178 pp. Plate "V.A.10". reprint:***
#> 100: !!!YOR4: Chorales (New York: Associated Music Publi***
#> 101: !!!SMS: B&H, 4th ed, Alfred Dörffel, c.1875, p***
#> 102: !!!EED: Craig Stuart Sapp
#> 103: !!!EEV: 2009/05/22
#> ######################## ^^^ chor004.krn ^^^ #########################
#> (***five global comments truncated due to screen size***)
#>
#> humdrumR corpus of four pieces.
#>
#> Data fields:
#> Token :: character
#> *paste(Token, collapse = "|") :: character
#>