Commit a2582831 authored by Martti Käärik

Revamp of identifier rules (TO BE REVIEWED)

parent b1edae65

# Identifier rules in `TDLtx.xtext`

This document describes the identifier-shape rules used by the TDL textual
grammar, the rationale for each rule's admission set, and the principles that
govern when a name slot or cross-ref should use which rule.

## Why multiple rules

In Xtext, every literal string in the grammar (`'name'`, `'type'`,
`'component'`, …) becomes its own keyword terminal at lexer time. The lexer
runs before the parser and produces keyword tokens whenever a keyword pattern
matches; the parser cannot retroactively ask the lexer to treat a keyword
token as a generic identifier. So if a user writes `Type name`, the second
token is lexed as the keyword `name`, not as `ID`.
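The following toy model shows this ordering. It is not Xtext's generated ANTLR lexer: it splits on whitespace, uses a hypothetical keyword subset, and assumes the grammar inherits Xtext's default `ID` terminal, which allows an optional leading `^` escape. It only illustrates that a keyword match wins before the parser has any say.

```python
import re
from typing import List, Tuple

# Hypothetical subset of the grammar's keywords.
KEYWORDS = frozenset({"Type", "name", "type", "value", "component", "with"})

# Shape of Xtext's default ID terminal: optional '^', then a letter or '_',
# then letters, digits or '_'.
ID_RE = re.compile(r"\^?[A-Za-z_][A-Za-z0-9_]*")


def lex(text: str) -> List[Tuple[str, str]]:
    """Classify whitespace-separated words; a keyword match always wins."""
    tokens = []
    for word in text.split():
        if ID_RE.fullmatch(word) is None:
            raise ValueError(f"unexpected input: {word!r}")
        if word in KEYWORDS:
            tokens.append(("KEYWORD", word))
        else:
            # '^name' is not a keyword match, so it becomes an ID; the escape
            # character is not part of the resulting name.
            tokens.append(("ID", word.lstrip("^")))
    return tokens


print(lex("Type name"))   # [('KEYWORD', 'Type'), ('KEYWORD', 'name')]
print(lex("Type ^name"))  # [('KEYWORD', 'Type'), ('ID', 'name')]
```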

To allow keywords to be used as names anyway, an identifier rule lists the
keywords as alternatives: `Identifier: ID | 'name' | 'type' | …`. The parser
then accepts either token kind in the name position. Each keyword added to
such a rule is a usability win (one fewer name the user has to escape with
`^`) but spends ambiguity budget: the parser has to be able to distinguish
that keyword used as a name from any other role the keyword plays at the same
syntactic position.
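The admission check itself is simple. Here is a sketch of what an identifier rule does with each token, using `(kind, text)` pairs and a hypothetical four-keyword admission set; `match_name` is an illustrative helper, not part of the grammar or of Xtext.

```python
from typing import FrozenSet, Optional, Tuple

# A token as a lexer would emit it: (kind, text), kind being "ID" or "KEYWORD".
Token = Tuple[str, str]

# Hypothetical admission set for a rule like
#   Identifier: ID | 'name' | 'type' | 'value' | 'attribute';
ADMITTED: FrozenSet[str] = frozenset({"name", "type", "value", "attribute"})


def match_name(token: Token, admitted: FrozenSet[str] = ADMITTED) -> Optional[str]:
    """Return the name text if the token may fill a name slot, else None."""
    kind, text = token
    if kind == "ID":
        return text  # ordinary identifiers are always accepted
    if kind == "KEYWORD" and text in admitted:
        return text  # keyword alternative listed in the rule
    return None  # any other keyword must be escaped with '^' by the user


print(match_name(("ID", "speed")))      # speed
print(match_name(("KEYWORD", "name")))  # name
print(match_name(("KEYWORD", "with")))  # None
```

The hard part is not this check but the ambiguity budget: widening `ADMITTED` is only safe if no other alternative at the same position can start with the newly admitted keyword.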

Different name positions in the grammar tolerate different amounts of
admission. The rules below are factored along that axis so that each
position uses the widest rule it can while staying unambiguous.

## Position taxonomy

Each name slot in the grammar belongs to one of these position kinds:

| Position | Description | Example |
|---|---|---|
| **Free** | Reachable as a first token of an expression (DataUseWrapped). Competes with `'instance'`, `'parameter'`, `'omit'`, `'an'`, `'size'`, `'not'`, `'-'`, `'('`, `'?'`, `'*'`, `'@'` as alternative starts. | `dataElement` in DataElementUse, `componentInstance` in VariableUse |
| **Tight** | Anchored on at least one side by a structural token (`.`, `(`, type-ref, `=`, `\|`, `,`, `)`). No competing alternative starts with an identifier in this state. | Member name after `.`, parameter name after `(` in a binding |
| **Introducer-anchored** | Preceded by a strong introducer keyword (`Type`, `Action`, `'execute'`, `'perform'`, …). The next token's role is fixed by the parent rule. | Top-level decl names, anchored cross-refs |
| **Qualifier-prefixed** (TO) | Preceded by a greedy `Qualifier*` chain in TO contexts. Both the qualifier loop and the name slot accept any identifier; the parser greedily assigns every identifier but the last to the qualifier loop. | TO `LiteralValueFragment.name`, `DataContent.name` |

Free positions are the most constrained: a keyword admitted there must not be
a sibling-first token, a Qualifier-vocabulary word, or an expression
operator. Tight positions are looser. Introducer-anchored positions are the
loosest; they are bounded mainly by the rule's *follower* keywords, that is,
the keywords that may directly follow the name in the parent rule.
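For the qualifier-prefixed row in the table above, the greedy split can be pictured as follows. This assumes every word in the run is identifier-shaped (no keyword outside the admission set); `split_qualified` is a hypothetical helper, not generated parser code.

```python
from typing import List, Sequence, Tuple


def split_qualified(words: Sequence[str]) -> Tuple[List[str], str]:
    """Split a TO-style `Qualifier* name` run.

    Both the qualifier loop and the name slot accept any identifier, so the
    loop takes every word except the last, which becomes the name.
    """
    if not words:
        raise ValueError("the name slot is mandatory")
    return list(words[:-1]), words[-1]


print(split_qualified(["first", "valid", "message"]))  # (['first', 'valid'], 'message')
print(split_qualified(["message"]))                    # ([], 'message')
```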

## TDL-side rules (three tiers)

### `Identifier` — free-position rule

Used wherever a name appears at a position reachable from inside an
expression: bare `DataElementUse`, `VariableUse`'s component segment,
cross-refs to `NamedElement`, `DataType`, `ComponentInstance`, `Variable`,
`Timer`, `TimeLabel`, `DataInstance`, formal parameters, etc.

Admits a conservative keyword set:

```
'name' 'type' 'value' 'attribute'
'time' 'point' 'default'
'entity' 'event' 'component' 'variable' 'timer'
'argument' 'action' 'behaviour'
'verdict' 'exception'
'get'
'when' 'then'
'check' 'where'
'sends' 'receives' 'triggers' 'accepts'
```

Excluded and why:

| Excluded | Reason |
|---|---|
| `size`, `instance`, `parameter`, `omit`, `not`, `an` | First-token of a sibling alternative under `DataUseWrapped` |
| `start`, `stop`, `from`, `to`, `before`, `after`, `during`, `within`, `of`, `in`, `on`, `for`, `by`, `into` | Member of a typed `Qualifier` rule (CommonWord, Direction, TimeConstraint). Admitting them would degrade TO comment classification |
| `a`, `an`, `the` | `ArticleQualifier` member; `'an'` also a sibling-first of `DataInstanceUse` |
| `and`, `or`, `xor`, `not`, `mod`, `as` | Expression operators / cast keyword |
| `gate` | Would clash with the `'on' (ComponentInstance \| 'gate' GR)` discriminator in `Quiescence` |
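The admission list and the exclusion table can be cross-checked mechanically. In this sketch the sets are copied from this section and grouped by exclusion reason; `exclusion_violations` is a hypothetical helper that reports which reasons a candidate admission set would break.

```python
# Keyword admissions of Identifier, copied from the list above.
IDENTIFIER_KEYWORDS = frozenset("""
    name type value attribute
    time point default
    entity event component variable timer
    argument action behaviour
    verdict exception
    get
    when then
    check where
    sends receives triggers accepts
""".split())

# Exclusions from the table above, grouped by reason.
EXCLUDED = {
    "sibling-first": frozenset("size instance parameter omit not an".split()),
    "qualifier": frozenset(
        "start stop from to before after during within of in on for by into".split()
    ),
    "article": frozenset("a an the".split()),
    "operator": frozenset("and or xor not mod as".split()),
    "quiescence": frozenset({"gate"}),
}


def exclusion_violations(admitted):
    """Map each exclusion reason to the admitted keywords that break it."""
    return {
        reason: sorted(admitted & words)
        for reason, words in EXCLUDED.items()
        if admitted & words
    }


print(exclusion_violations(IDENTIFIER_KEYWORDS))  # {}
print(exclusion_violations(IDENTIFIER_KEYWORDS | {"to", "gate"}))
# {'qualifier': ['to'], 'quiescence': ['gate']}
```

An empty result for the current list is what the table implies; adding `'to'` or `'gate'` shows which exclusion each one would violate.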

### `TightName` — tight-position rule

Used at: `Member.name` (struct decl), `MemberReference` (after `.`),
`MemberAssignment` cross-ref, TDL-side `ParameterBinding.parameter` cross-ref.

Extends `Identifier` with keywords that are blocked from `Identifier`
by expression-level conflicts but become safe when the position is
anchored on at least one side by a structural token:

```
+ 'start' 'stop'
+ 'from' 'to'
+ 'before' 'after'
+ 'size'
+ 'instance' 'parameter'
+ 'gate'
```

(The conflicts that block these keywords from `Identifier`, namely
sibling-first tokens, Qualifier vocabulary, and `Quiescence`'s `'gate'`
discriminator, cannot arise inside a tight-anchored slot, so the keywords are
safe there.)
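One way to read the tiers is to ask, for a given keyword, which is the narrowest rule that admits it. The sketch below uses an abbreviated `Identifier` set (the full list is earlier in this section) and the `TightName` extras listed above; `narrowest_rule` is a hypothetical helper.

```python
from typing import Optional

# Abbreviated: the full Identifier keyword list appears earlier in this section.
IDENTIFIER_KEYWORDS = frozenset({"name", "type", "value", "attribute", "component", "timer"})

# Extras that TightName adds on top of Identifier (from the list above).
TIGHT_EXTRAS = frozenset({"start", "stop", "from", "to", "before", "after",
                          "size", "instance", "parameter", "gate"})
TIGHTNAME_KEYWORDS = IDENTIFIER_KEYWORDS | TIGHT_EXTRAS


def narrowest_rule(keyword: str) -> Optional[str]:
    """Name the narrowest of the two rules that admits `keyword` unescaped."""
    if keyword in IDENTIFIER_KEYWORDS:
        return "Identifier"
    if keyword in TIGHTNAME_KEYWORDS:
        return "TightName"
    return None  # needs a wider rule, or a '^' escape


print(narrowest_rule("value"))   # Identifier
print(narrowest_rule("size"))    # TightName
print(narrowest_rule("during"))  # None
```

For example, a structure member named `size` needs no escape, while a variable named `size` would have to be written `^size`, because `Variable` names use `Identifier`.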

### `DeclName` — introducer-anchored rule

Used at TDL-side decl-name slots (top-level introducer-anchored) and at
their anchored cross-refs:

- Decls: `TestObjective`, `Constraint`, `ProcedureSignature`, `Action`,
  `Function`, `GateType`, `ComponentType`, `TestConfiguration`,
  `TestDescription`, `Time`, `Timer`.
- Cross-refs: `extending=[PackageableElement|DeclName]`, `Import`'s
  packageable-element list, `'execute' [TestDescription|DeclName]`,
  `'perform' [Action|DeclName]`, `'uses' [TestConfiguration|DeclName]`,
  `Objective: [TestObjective|DeclName]`, `'calls'`/`'responds with'`
  `[ProcedureSignature|DeclName]`, `'instance' 'returned' 'from'`
  `[Function|DeclName]`, `[ComponentType|DeclName]` and `[GateType|DeclName]`
  in typed-decl positions, `[ConstraintType|DeclName]`,
  `[MappableDataElement|DeclName]` in `Map`, `[Timer|DeclName]` after `'::'`.

Admits everything `Identifier` admits, plus the same extras `TightName`
adds, plus a few more that are safe only at tight or introducer-anchored
positions:

```
+ everything Identifier admits
+ everything TightName adds
+ 'during' 'within'
+ 'omit'
```

Excluded and why:

| Excluded | Reason |
|---|---|
| `with` | Universal optional decl follower (body-block opener) |
| `extends` | Follower of `Type`, `Structure`, `Component`, `Gate` decl names |
| `optional` | Follower of `Structure` decl name |
| `of` | Follower of `Collection` decl name |
| `returns` | Follower of `Function` decl name |
| `accepts` | Follower of `Gate` decl name |
| `uses` | Follower of `TestDescription` decl name |
| `now` | Follower of `TimeLabel` decl name |
| `as` | Follower of `ComponentInstance` decl name; introducer of mapping clauses |
| `Description`, `Reference`, `Configuration`, `PICS`, `PIXIT`, `Bindings`, `Objective` | TP / Variant header-section keywords |
| `Note` | Annotation-comment introducer |
| Top-level / capitalized section keywords (`Package`, `Type`, `Structure`, `Action`, `Function`, …) | Sibling decl introducers |
| `and`, `or`, `xor`, `not`, `mod` | Expression operators |

## Auxiliary rules

### `AIdentifier` — annotation key/name

Annotation keys and `AnnotationType` decl names. Extends `Identifier` with
the multi-word Test-Purpose-block keywords so `@Initial conditions`,
`@Expected behaviour`, `@PICS`, `@PIXIT`, `@Test Purpose Description` parse
as annotation keys. `@when`, `@then`, `@check`, `@where` parse via the
`Identifier` branch since those four keywords are admitted there.

### `CheckIdentifier` — closed vocabulary

A two-keyword choice (`check` | `where`) used at the annotation key slot
that introduces a check-style annotation. Despite the `Identifier` suffix, this
is a closed keyword choice, not a general name rule.

### `KIdentifier` — TO Event names

`ID` plus the gate-action keywords `'sends'`, `'receives'`, `'triggers'`,
`'accepts'`, and `'in'`. Used for `to::Event` decl and refs only. Kept
TO-local because Event naming is part of the TO sub-language's natural-language
style.

### `GRIdentifier` — qualified gate reference

`Identifier ('::' Identifier)?` — qualified `component::gate` form. Used
only at cross-ref slots that resolve to `GateReference` (`Quiescence`'s gate
alternative, Message/ReceiveMessage source/target, ProcedureCall source,
etc.). Each segment uses `Identifier` so the auto-qualified form
`<CI.name>::<GI.name>` resolves correctly when component or gate names use
keyword admissions — both `ComponentInstance.name` and `GateInstance.name`
are on `Identifier`.

The single-segment form (no `::`) targets the local-alias slot on
`GateReference` (`name=ID`). Local aliases are intentionally narrow (raw
`ID`); cross-ref input that uses an `Identifier` keyword as the
single-segment form will parse but always fail to resolve, which is
harmless.
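The two forms can be modelled as a parse step followed by a lookup. In this sketch the keyword set is a hypothetical sample, and the two dictionaries stand in for the scope's view of gate instances and local aliases; nothing here reflects actual Xtext scoping APIs.

```python
import re
from typing import Dict, List, Optional, Tuple

ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical sample of keywords that Identifier does NOT admit. Any other
# identifier-shaped word is either a plain ID or an admitted keyword.
NOT_ADMITTED = frozenset({"gate", "size", "with"})


def is_identifier(word: str) -> bool:
    return ID_RE.fullmatch(word) is not None and word not in NOT_ADMITTED


def parse_gr(text: str) -> List[str]:
    """GRIdentifier: Identifier ('::' Identifier)?"""
    segments = text.split("::")
    if len(segments) > 2 or not all(is_identifier(s) for s in segments):
        raise ValueError(f"not a GRIdentifier: {text!r}")
    return segments


def resolve_gate(text: str,
                 gate_instances: Dict[Tuple[str, str], str],
                 local_aliases: Dict[str, str]) -> Optional[str]:
    """Two segments: auto-qualified <CI.name>::<GI.name>. One segment: alias."""
    segments = parse_gr(text)
    if len(segments) == 2:
        return gate_instances.get((segments[0], segments[1]))
    return local_aliases.get(segments[0])


# A component instance named 'event' with a gate named 'value': both names
# rely on Identifier keyword admissions.
gates = {("event", "value"): "GateReference#1"}
aliases = {"g": "GateReference#1"}  # aliases are raw ID only

print(resolve_gate("event::value", gates, aliases))  # GateReference#1
print(resolve_gate("g", gates, aliases))             # GateReference#1
print(resolve_gate("value", gates, aliases))         # None: parses, never resolves
```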

### `PackageName` — raw `ID`

The Package decl name and each segment of `QIdentifier`. Kept narrow
deliberately: package names appear as segments of qualified paths, and
qualified paths may meet expression context in future extensions. Keeping
the segment admission narrow avoids ambiguity at those future positions.

### `QIdentifier` — dotted package path

`PackageName ('.' PackageName)*`. Used only at `'from'
importedPackage=[Package|QIdentifier]`.
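Because `PackageName` is raw `ID`, a keyword can only appear as a package segment when escaped. A minimal sketch, assuming Xtext's default `ID` shape with the optional `^` escape and a hypothetical keyword sample; it also ignores any whitespace the parser might accept around `.`.

```python
import re
from typing import List

# Xtext's default ID shape, with the optional '^' escape.
ID_RE = re.compile(r"\^?[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical sample of grammar keywords. PackageName is raw ID, so none of
# them is accepted as a segment unless escaped with '^'.
KEYWORDS = frozenset({"name", "type", "value", "from", "Package"})


def parse_qidentifier(text: str) -> List[str]:
    """QIdentifier: PackageName ('.' PackageName)*, PackageName being raw ID."""
    segments = text.split(".")
    for segment in segments:
        if ID_RE.fullmatch(segment) is None or segment in KEYWORDS:
            raise ValueError(f"invalid package path segment: {segment!r}")
    # The '^' escape is not part of the name itself.
    return [segment.lstrip("^") for segment in segments]


print(parse_qidentifier("org.etsi.tdl"))  # ['org', 'etsi', 'tdl']
print(parse_qidentifier("org.^type"))     # ['org', 'type']
```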

### `NIdentifier` — numeric label

`'-'? INT ('.' INT)?`. Despite the `Identifier` suffix, it does not reference `ID`.
Appears in name slots where numeric labels are acceptable (clause numbers
in test-purpose descriptions, AnyValue / SpecialValue numeric names, TO
content names).
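The rule's accepted shapes can be written as a regular expression over the joined text. This sketch assumes `INT` is Xtext's default terminal (one or more decimal digits) and does not model any whitespace the parser might accept between the tokens of the rule.

```python
import re

# '-'? INT ('.' INT)? as a regular expression over the joined text.
NIDENTIFIER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def is_nidentifier(text: str) -> bool:
    return NIDENTIFIER.fullmatch(text) is not None


print(is_nidentifier("12"))     # True
print(is_nidentifier("-3.1"))   # True
print(is_nidentifier("1.2.3"))  # False: at most one '.' INT part
```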

## TO sub-language: `TOIdentifier`

Used in TO-only rules (`StructuredTestObjective`, `Variant`,
`TestPurposeDescription`, `EventSpecificationTemplate`, `Entity`, `PICS`,
the `Qualifier` family, `LiteralValueFragment`, `DataContent`,
`LiteralValueReference` and `ContentReference` cross-refs, `TODataElementUse`,
`TOParameterBinding`).

Admits only the four canonical keywords (`'name'`, `'type'`, `'value'`,
`'attribute'`). It is held narrow so that TDL-side widening of `Identifier`
does not leak into TO parsing. The TO sub-language has its own evolution path,
and its `Qualifier*` chains are sensitive to the keyword admission set for two
reasons: parsing safety, since unbounded keyword admission grows the LL(*)
lookahead, and fidelity, since admitting Qualifier-vocabulary keywords into
`Qualifier` would degrade typed-comment classification.

## Symmetry between decl and ref rules

A name a user can declare must also be referenceable. For each element
kind, the decl rule and the cross-ref rule for that element must be
keyword-compatible: the cross-ref rule must admit at least every keyword
the decl rule does, otherwise a declared name becomes unreferenceable.

The current splits respect this:

- Element kinds whose cross-refs reach expression positions (`DataType`,
  `ComponentInstance`, `GateInstance`, `Variable`, `TimeLabel`,
  `NamedElement`, `DataInstance`) keep their decl names on `Identifier`
  so the cross-ref can stay on `Identifier` too without expression-level
  conflicts.
- Element kinds whose cross-refs are always introducer-anchored
  (`TestObjective`, `TestConfiguration`, `TestDescription`, `Action`,
  `Function`, `ConstraintType`, `ComponentType`, `GateType`,
  `ProcedureSignature`, `Time`, `Timer`, `MappableDataElement`,
  `PackageableElement` via `Import`/`extending`) use `DeclName` on both
  sides.
- `Member`, `FormalParameter`, `ProcedureParameter`, and the abstract
  `Parameter` cross-refs (in `ParameterBinding`, `ParameterMapping`,
  `ValueAssignmentProcedure`, and the `'parameter' [FormalParameter|...]`
  slot) all use `TightName` on both sides.
- `GateReference` combines two cases in one rule. Auto-qualified
  `<CI.name>::<GI.name>` references parse via `GRIdentifier`'s two-segment
  form, where each segment is `Identifier` to match the underlying
  `ComponentInstance.name` and `GateInstance.name` rules. Local aliases
  declared as `name=ID` are matched by the single-segment form of
  `GRIdentifier`; an `Identifier` keyword in the single-segment form parses
  but never resolves (no decl admits keyword aliases), which is harmless.
- TO sub-language uses `TOIdentifier` on both sides.
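The symmetry requirement reduces to a subset test on keyword admissions: a decl rule's admitted keywords must all be admitted by the matching cross-ref rule. The sketch below uses heavily abbreviated admission sets that follow the nesting described in the rule sections, and a few element kinds from the list above; `unreferenceable` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Keyword admissions per rule, heavily abbreviated; plain IDs are accepted by
# every rule and are left out. The nesting Identifier < TightName < DeclName
# follows the rule sections above.
RULES: Dict[str, FrozenSet[str]] = {
    "ID": frozenset(),
    "Identifier": frozenset({"name", "type", "value"}),
    "TightName": frozenset({"name", "type", "value", "size", "from"}),
    "DeclName": frozenset({"name", "type", "value", "size", "from", "omit"}),
}

# (decl rule, cross-ref rule) for a few element kinds from the list above.
SLOTS: Dict[str, Tuple[str, str]] = {
    "Variable": ("Identifier", "Identifier"),
    "Member": ("TightName", "TightName"),
    "Action": ("DeclName", "DeclName"),
    "DataType (via MappableDataElement)": ("Identifier", "DeclName"),
    "GateReference alias": ("ID", "Identifier"),
}


def unreferenceable(decl_rule: str, ref_rule: str) -> List[str]:
    """Keywords usable as a declared name that the cross-ref cannot parse."""
    return sorted(RULES[decl_rule] - RULES[ref_rule])


for kind, (decl, ref) in SLOTS.items():
    print(kind, unreferenceable(decl, ref))  # [] for every slot above

# Putting an Action cross-ref on Identifier would break symmetry:
print(unreferenceable("DeclName", "Identifier"))  # ['from', 'omit', 'size']
```

The wider-ref cases (`DataType` through `MappableDataElement`, gate aliases) pass the check; they are the harmless direction of asymmetry discussed below.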

Some intentional asymmetries:

- The `MappableDataElement` cross-ref uses `DeclName` to cover its widest
  subclass (`Action`, on `DeclName`); `DataInstance` and `DataType` are
  narrower (`Identifier`) but still resolve through the wider rule.
- `GateReference.name` and `ExtendedGateReference.name` local-alias slots
  use raw `ID` rather than the wider `Identifier` admitted by the
  single-segment cross-ref form. The narrower decl is deliberate —
  aliases are not expected to use keyword names — and the wider cross-ref
  side just yields harmless unresolvable lookups for keyword input.

## Trade-offs at a glance

| Rule | Reach | Admits keywords | Use it for |
|---|---|---|---|
| `Identifier` | Free (expression-reachable) | Most conservative | Names referenced in expressions |
| `TightName` | Tight (anchored on at least one side) | Identifier + anchored-safe keywords (sibling-first, Qualifier vocab, `gate`) | Member / parameter-binding tight slots |
| `DeclName` | Introducer-anchored | + nearly all non-follower keywords | Top-level decls and their anchored cross-refs |
| `TOIdentifier` | TO sub-language only | Frozen narrow | All TO name positions |
| `AIdentifier` | Annotation key/name | Identifier + TP-block multi-word keywords | Annotation key/name slots |
| `KIdentifier` | TO Event names | ID + gate-action keywords | `to::Event` decl/ref |
| `CheckIdentifier` | Annotation key for check/where | Closed two-keyword choice | Check-annotation key only |
| `GRIdentifier` | Gate-reference cross-ref | Identifier per segment | `component::gate` cross-ref |
| `PackageName` | Package decl + QIdentifier segment | Raw ID | Package names and qualified-path segments |
| `QIdentifier` | Qualified package path | Raw ID per segment | `'from' [Package\|QIdentifier]` only |
| `NIdentifier` | Numeric-as-name | Numeric literals only | Name slots accepting numeric labels |