
Utf8ByteMap

Namespace: Hex1b.Documents

Assembly: Hex1b.dll

Bidirectional mapping between raw byte offsets and character indices. Built from actual document bytes (not re-encoded text), so the mapping is accurate even when the bytes contain invalid UTF-8 sequences.

```csharp
public sealed class Utf8ByteMap
```

Inheritance

Object → Utf8ByteMap

Constructors

Utf8ByteMap(ReadOnlySpan<byte>)

Builds a byte↔char map by decoding the raw bytes as UTF-8. Tracks how many source bytes each decoded character consumed.

Parameters:

bytes (ReadOnlySpan<byte>): the raw document bytes to decode.

```csharp
public Utf8ByteMap(ReadOnlySpan<byte> bytes)
```
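The idea of tracking how many source bytes each decoded character consumed can be illustrated with a minimal, self-contained sketch. It is not the Hex1b implementation: `ByteLengthSketch` and `DecodeLengths` are hypothetical names, and the sketch counts one entry per Unicode scalar value using `System.Text.Rune.DecodeFromUtf8`, whereas `Utf8ByteMap` may count characters differently (for example, as UTF-16 code units).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only; not part of Hex1b.
public static class ByteLengthSketch
{
    // Decodes UTF-8 one scalar value at a time and records how many source
    // bytes each decoded value consumed. An invalid sequence decodes to
    // U+FFFD but still reports the number of raw bytes it occupied.
    public static List<int> DecodeLengths(ReadOnlySpan<byte> bytes)
    {
        var lengths = new List<int>();
        while (!bytes.IsEmpty)
        {
            // For non-empty input, consumed is always at least 1,
            // so the loop makes progress even on invalid data.
            Rune.DecodeFromUtf8(bytes, out _, out int consumed);
            lengths.Add(consumed);
            bytes = bytes.Slice(consumed);
        }
        return lengths;
    }
}
```

For the input bytes `61 C3 A9 FF E2 82 AC` (`a`, `é`, a stray `0xFF`, `€`), the recorded lengths are 1, 2, 1, and 3. The invalid byte keeps its own one-byte slot, which is what lets a byte offset be mapped back to the character that covers it.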

Utf8ByteMap(string)

Builds a byte↔char map from a string by encoding to UTF-8 first. Use this overload only when the string was decoded from valid UTF-8 (no byte-level edits). Prefer the ReadOnlySpan<byte> overload when raw document bytes are available.

Parameters:

text (String): the string to encode to UTF-8 and map.

```csharp
public Utf8ByteMap(string text)
```
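The caveat about valid UTF-8 can be seen with standard .NET encoding calls alone. The sketch below (the `ReencodeSketch` and `RoundTrip` names are hypothetical) decodes raw bytes to a string and encodes that string again, which is roughly what a string-based map has to work from.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only; not part of Hex1b.
public static class ReencodeSketch
{
    // Decodes raw bytes to a string, then encodes that string back to UTF-8.
    // Invalid input bytes become U+FFFD during decoding, and U+FFFD encodes
    // as the three bytes EF BF BD, so the output can differ from the input.
    public static byte[] RoundTrip(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetBytes(text);
    }
}
```

Calling `RoundTrip` on `61 FF 62` yields `61 EF BF BD 62`: three source bytes become five. Offsets computed from the re-encoded text would no longer line up with the original document, which is why the span overload is preferable when the raw bytes are available.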

Properties

CharCount

Total number of characters produced by decoding.

Returns: Int32

```csharp
public int CharCount { get; }
```

TotalBytes

Total number of bytes in the source.

Returns: Int32

```csharp
public int TotalBytes { get; }
```

Methods

ByteToChar(int)

Maps a byte offset to the character that contains it. Returns the character index and the byte's position within that character's source byte range.

Parameters:

byteOffset (Int32): the byte offset to map.

Returns: ValueTuple<Int32, Int32>

```csharp
public (int charIndex, int byteWithinChar) ByteToChar(int byteOffset)
```
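One way such a lookup could work is a binary search over the starting byte offset of each character. The following is a sketch under that assumption, not Hex1b's actual implementation; `ByteLookupSketch`, `Lookup`, and the `starts` array are hypothetical, and the sketch does no range validation.

```csharp
using System;

// Hypothetical helper, for illustration only; not part of Hex1b.
public static class ByteLookupSketch
{
    // starts[i] is the byte offset where character i begins. The array is
    // sorted ascending, starts[0] == 0, and byteOffset is assumed to lie
    // within the source bytes.
    public static (int charIndex, int byteWithinChar) Lookup(int[] starts, int byteOffset)
    {
        int i = Array.BinarySearch(starts, byteOffset);
        if (i < 0)
        {
            // Not a character start: ~i is the index of the next larger
            // start, so the containing character is the one before it.
            i = ~i - 1;
        }
        return (i, byteOffset - starts[i]);
    }
}
```

For `héllo` encoded as UTF-8, the starts are `{ 0, 1, 3, 4, 5 }`. Looking up byte offset 2 returns `(1, 1)`, the second byte of `é`, and byte offset 3 returns `(2, 0)`, the first byte of the first `l`.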

CharByteLength(int)

Returns the number of source bytes that produced the character at charIndex.

Parameters:

charIndex (Int32): the index of the character to measure.

Returns: Int32

```csharp
public int CharByteLength(int charIndex)
```

CharToByteStart(int)

Returns the starting byte offset for the character at charIndex.

Parameters:

charIndex (Int32): the index of the character to locate.

Returns: Int32

```csharp
public int CharToByteStart(int charIndex)
```
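The three methods can be combined to go from an arbitrary byte offset to the full byte range of the character that contains it. The sketch below uses only the members documented on this page; the `Utf8ByteMapUsage` class and `ByteRangeAt` helper are hypothetical, and the code assumes the Hex1b assembly is referenced.

```csharp
using System;
using System.Text;
using Hex1b.Documents;

// Hypothetical helper, for illustration only; not part of Hex1b.
public static class Utf8ByteMapUsage
{
    // Returns the starting byte offset and byte length of the character
    // that contains byteOffset.
    public static (int start, int length) ByteRangeAt(Utf8ByteMap map, int byteOffset)
    {
        // Find the character containing the byte; the position within
        // the character is not needed here.
        var (charIndex, _) = map.ByteToChar(byteOffset);

        // Map the character back to the bytes it was decoded from.
        int start = map.CharToByteStart(charIndex);
        int length = map.CharByteLength(charIndex);
        return (start, length);
    }
}
```

As a usage example, `new Utf8ByteMap(Encoding.UTF8.GetBytes("héllo"))` builds a map over six bytes in which `é` occupies byte offsets 1 and 2. Given the descriptions above, `ByteRangeAt(map, 2)` would be expected to report a start of 1 and a length of 2, which is the range an editor would select or delete to act on that character as a whole.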

Released under the MIT License.