Utf8ByteMap
Namespace: Hex1b.Documents
Assembly: Hex1b.dll
Bidirectional mapping between raw byte offsets and character indices. Built from actual document bytes (not re-encoded text), so the mapping is accurate even when the bytes contain invalid UTF-8 sequences.
public sealed class Utf8ByteMap
Inheritance
Object → Utf8ByteMap
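The summary above notes that the map is built from the actual bytes rather than from re-encoded text. The following minimal sketch uses only System.Text (not Hex1b) to show why that matters: decoding an invalid byte and re-encoding the result changes the byte count, so offsets computed from re-encoded text would drift. The class and method names here (ReencodeDrift, ReencodedLength) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text;

public static class ReencodeDrift
{
    // Decodes the bytes as UTF-8, then reports how many bytes the
    // decoded string occupies when encoded back to UTF-8.
    public static int ReencodedLength(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // 'a', an invalid UTF-8 byte, 'b'
        byte[] raw = { 0x61, 0xFF, 0x62 };

        // The invalid byte decodes to U+FFFD, which re-encodes as 3 bytes.
        Console.WriteLine($"original bytes:   {raw.Length}");            // 3
        Console.WriteLine($"re-encoded bytes: {ReencodedLength(raw)}");  // 5
    }
}
```

Because the single invalid byte becomes a three-byte replacement character after a round trip, every byte offset after it would be off by two if the map were built from the re-encoded string.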
Constructors
Utf8ByteMap(ReadOnlySpan<byte>)
Builds a byte↔char map by decoding the raw bytes as UTF-8. Tracks how many source bytes each decoded character consumed.
Parameters:
bytes (ReadOnlySpan<Byte>): The raw document bytes to decode as UTF-8.
public Utf8ByteMap(ReadOnlySpan<byte> bytes)
Utf8ByteMap(string)
Builds a byte↔char map from a string by encoding to UTF-8 first. Use this overload only when the string was decoded from valid UTF-8 (no byte-level edits). Prefer the ReadOnlySpan<byte> overload when raw document bytes are available.
Parameters:
text (String): The string to encode to UTF-8 before building the map.
public Utf8ByteMap(string text)
Properties
CharCount
Total number of characters produced by decoding.
Returns: Int32
public int CharCount { get; }
TotalBytes
Total number of bytes in the source.
Returns: Int32
public int TotalBytes { get; }
Methods
ByteToChar(int)
Maps a byte offset to the character that contains it. Returns the character index and the byte's position within that character's source byte range.
Parameters:
byteOffset (Int32): The byte offset in the source bytes.
Returns: ValueTuple<Int32, Int32>
public (int charIndex, int byteWithinChar) ByteToChar(int byteOffset)
CharByteLength(int)
Returns the number of source bytes that produced the character at charIndex.
Parameters:
charIndex (Int32): The index of the decoded character.
Returns: Int32
public int CharByteLength(int charIndex)
CharToByteStart(int)
Returns the starting byte offset for the character at charIndex.
Parameters:
charIndex (Int32): The index of the decoded character.
Returns: Int32
public int CharToByteStart(int charIndex)
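Example
A short usage sketch of the members documented above, using only the signatures listed on this page. It assumes the Hex1b assembly is referenced and that byte offsets and character indices are zero-based; this page does not state either explicitly. The class name Utf8ByteMapDemo is hypothetical. The output is printed rather than asserted, since the exact values depend on the library's decoding.

```csharp
using System;
using Hex1b.Documents;

public static class Utf8ByteMapDemo
{
    public static void Main()
    {
        // "aé" as raw UTF-8: 'a' is one byte (0x61), 'é' is two (0xC3 0xA9).
        byte[] raw = { 0x61, 0xC3, 0xA9 };

        // byte[] converts implicitly to ReadOnlySpan<byte>.
        var map = new Utf8ByteMap(raw);

        Console.WriteLine($"TotalBytes={map.TotalBytes}, CharCount={map.CharCount}");

        // Character -> byte direction (assumes zero-based character indices).
        for (int c = 0; c < map.CharCount; c++)
        {
            Console.WriteLine(
                $"char {c}: starts at byte {map.CharToByteStart(c)}, " +
                $"spans {map.CharByteLength(c)} byte(s)");
        }

        // Byte -> character direction (assumes zero-based byte offsets).
        for (int b = 0; b < map.TotalBytes; b++)
        {
            var (charIndex, byteWithinChar) = map.ByteToChar(b);
            Console.WriteLine(
                $"byte {b}: char {charIndex}, offset {byteWithinChar} within that char");
        }
    }
}
```

If the descriptions above hold, CharToByteStart(charIndex) + byteWithinChar should reproduce the byte offset passed to ByteToChar, which makes a convenient sanity check when converting cursor positions between byte and character views.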