% This file contains test sentences, and the expected positions (start, end)
% of their words. The first P line is char position, and the second one is
% byte position.

Ithis is a test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(10, 14) RIGHT-WALL(14, 14)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(10, 14) RIGHT-WALL(14, 14)
P
% Middle extra whitespace.
Ithis is a  test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(11, 15) RIGHT-WALL(15, 15)
P
% Initial whitespace.
I this is a test
PLEFT-WALL(0, 0) this.p(1, 5) is.v(6, 8) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(1, 5) is.v(6, 8) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
P

% Various kinds of input splits.
Iit's a test.
PLEFT-WALL(0, 0) it(0, 2) 's.v(2, 4) a(5, 6) test.n(7, 11) .(11, 12) RIGHT-WALL(12, 12)
PLEFT-WALL(0, 0) it(0, 2) 's.v(2, 4) a(5, 6) test.n(7, 11) .(11, 12) RIGHT-WALL(12, 12)
P

Ithis is--a test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) --.r(7, 9) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) --.r(7, 9) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
P

% A different byte and char positions for non-ASCII.
II love going to the café.
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) the(16, 19) café.n(20, 24) .(24, 25) RIGHT-WALL(25, 25)
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) the(16, 19) café.n(20, 25) .(25, 26) RIGHT-WALL(26, 26)
P

% Test linkages w/null-linked words
-max_null_count=1

IThis is a the test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) [a](8, 9) the(10, 13) test.n(14, 18) RIGHT-WALL(18, 18)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) [a](8, 9) the(10, 13) test.n(14, 18) RIGHT-WALL(18, 18)
P

% Here "As" gets split by the tokenizer into 2 alternatives:
% 1: As
% 2: A.u s.u
% There is no full linkage, and in order to show the result in a reasonable
% way (this is most beneficial if there are more then two such alternatives)
% the library combines back the failed splits and should recalculate
% the word position of these combined splits correctly.

IAs no linkage
PLEFT-WALL(0, 0) [as](0, 2) no.misc-d(3, 5) linkage.n-u(6, 13) RIGHT-WALL(13, 13)
PLEFT-WALL(0, 0) [as](0, 2) no.misc-d(3, 5) linkage.n-u(6, 13) RIGHT-WALL(13, 13)
P
