Some IPv6 Address Examples

If someone is going to parse IPv6 addresses, can you give a good spread of examples covering the edge cases and things to worry about? A list of IPv6 addresses that is fairly short but covers the kinds of things a parser needs to support?


IPv6 Address Parsing Test Cases

Basic Format Variations

2001:0db8:85a3:0000:0000:8a2e:0370:7334     # Full format
2001:db8:85a3:0:0:8a2e:370:7334             # Leading zeros omitted
2001:db8:85a3::8a2e:370:7334                # Zero compression

Zero Compression Edge Cases

::1                                         # Loopback (all zeros except last)
::                                          # All zeros address
2001:db8::                                  # Trailing zeros compressed
::2001:db8                                  # Leading zeros compressed
2001::1                                     # Middle zeros compressed
fe80::1%lo0                                 # With zone identifier

Special Addresses

::ffff:192.0.2.1                            # IPv4-mapped IPv6
::ffff:0:192.0.2.1                          # IPv4-translated (SIIT, ::ffff:0:0:0/96)
64:ff9b::192.0.2.33                         # IPv4-embedded IPv6 (NAT64 well-known prefix)
2002:cb00:7100::                            # 6to4 prefix
ff02::1                                     # Multicast
fe80::1                                     # Link-local
fc00::1                                     # Unique local

Bracket and Port Notation

[2001:db8::1]:80                            # With port
[::1]:8080                                  # Loopback with port
[fe80::1%eth0]:22                           # Zone ID with port
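
Bracket, zone, and port handling is easiest to peel off before the address itself gets parsed. Here is a minimal sketch in Python; `split_host_port` is a hypothetical helper name, and it assumes the zone is whatever follows `%` with no validation of the interface name:

```python
def split_host_port(text):
    """Split '[addr%zone]:port' into (address, zone, port).

    zone and port are None when absent. Raises ValueError on malformed input.
    The address part is returned as text and is NOT validated here.
    """
    port = None
    if text.startswith("["):
        host, sep, rest = text[1:].partition("]")
        if not sep:
            raise ValueError("missing closing bracket")
        if rest:
            digits = rest[1:]
            if not rest.startswith(":") or not (digits.isascii() and digits.isdigit()):
                raise ValueError("expected ':port' after ']'")
            port = int(digits)
            if port > 65535:
                raise ValueError("port out of range")
    else:
        # Without brackets every colon belongs to the address, so no port.
        host = text

    address, sep, zone = host.partition("%")
    if sep and not zone:
        raise ValueError("empty zone identifier")
    return address, (zone or None), port


print(split_host_port("[fe80::1%eth0]:22"))  # ('fe80::1', 'eth0', 22)
```

The unbracketed branch deliberately never looks for a port, since `2001:db8::1:80` can't be told apart from an address whose last group is `80`.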

Maximum Length Cases

ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff     # All F's (maximum)
2001:0db8:85a3:0000:0000:8a2e:0370:7334     # Full expanded form

Tricky Parsing Cases

2001:db8:0:0:1:0:0:1                        # Two zero runs (only one can be compressed)
2001:0:0:0:0:0:0:1                          # Could be 2001::1
a:b:c:d:e:f:1:2                             # All single hex digits
::1:2:3:4:5:6:7                             # Compression at start with 7 groups
1:2:3:4:5:6:7::                             # Compression at end with 7 groups
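
As a cross-check on these tricky cases, Python's standard `ipaddress` module can be used to see what one existing parser accepts and how it re-compresses the result. This is only a reference point, not a statement of what any other implementation does; `canonical` is a hypothetical wrapper name:

```python
import ipaddress


def canonical(text):
    """Parse text as IPv6 and return the compressed form ipaddress emits."""
    return ipaddress.IPv6Address(text).compressed


# Two equal-length zero runs: only the first one is collapsed.
print(canonical("2001:db8:0:0:1:0:0:1"))   # 2001:db8::1:0:0:1
# One long zero run collapses completely.
print(canonical("2001:0:0:0:0:0:0:1"))     # 2001::1
# '::' standing for a single group is accepted on input,
# but a lone zero group is not re-compressed on output.
print(canonical("::1:2:3:4:5:6:7"))        # 0:1:2:3:4:5:6:7
print(canonical("1:2:3:4:5:6:7::"))        # 1:2:3:4:5:6:7:0
```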

Case Sensitivity

2001:DB8:85A3::8A2E:370:7334                # Uppercase
2001:db8:85a3::8a2e:370:7334                # Lowercase
2001:Db8:85A3::8a2E:370:7334                # Mixed case

Invalid Cases (Should Reject)

2001:db8::1::2                              # Double compression (invalid)
2001:db8:85a3:0:0:8a2e:370:7334:1           # Too many groups (9 instead of 8)
2001:db8:85a3:8a2e:370:7334                 # Too few groups (6 instead of 8)
2001:db8:85ag::1                            # Invalid hex character 'g'
2001:db8:12345::1                           # Group too long (>4 hex digits)

Key Parsing Rules to Implement

  1. Zero Compression: Only one :: allowed per address
  2. Group Count: Must have exactly 8 groups when expanded
  3. Hex Validation: Each group is 1-4 hexadecimal digits
  4. Case Insensitive: Accept both upper and lowercase
  5. Leading Zeros: Optional in each group
  6. Zone Identifiers: Handle % suffix for link-local addresses
  7. Brackets: Required when specifying ports
  8. Embedded IPv4: Accept dotted-quad notation for the last 32 bits (e.g. ::ffff:192.0.2.1)
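
The rules above can be sketched as a small hand-written parser. This is a minimal Python illustration under a few assumptions: the zone, brackets, and port have already been stripped off, and the names `parse_groups` and `ipv4_to_groups` are made up for this example. It returns the eight 16-bit groups as integers, or raises `ValueError`:

```python
import string

HEX = set(string.hexdigits)


def ipv4_to_groups(quad):
    """Turn a dotted quad like '192.0.2.1' into two hex group strings."""
    octets = quad.split(".")
    if len(octets) != 4 or not all(
        o.isascii() and o.isdigit() and int(o) < 256 for o in octets
    ):
        raise ValueError(f"bad IPv4 part {quad!r}")
    a, b, c, d = map(int, octets)
    return [format(a << 8 | b, "x"), format(c << 8 | d, "x")]


def parse_groups(text):
    """Parse a bare IPv6 address (no zone, brackets, or port) into 8 ints."""
    if text.count("::") > 1:
        raise ValueError("only one '::' allowed")
    head, sep, tail = text.partition("::")
    head_parts = head.split(":") if head else []
    tail_parts = tail.split(":") if tail else []

    # A dotted quad is only legal as the very last part of the address,
    # where it stands in for the final two groups.
    final = tail_parts if sep else head_parts
    if final and "." in final[-1]:
        final[-1:] = ipv4_to_groups(final[-1])

    for part in head_parts + tail_parts:
        if not 1 <= len(part) <= 4 or not set(part) <= HEX:
            raise ValueError(f"bad group {part!r}")

    present = len(head_parts) + len(tail_parts)
    if sep:
        if present > 7:
            raise ValueError("'::' must stand for at least one group")
        zeros = [0] * (8 - present)
    else:
        if present != 8:
            raise ValueError(f"expected 8 groups, got {present}")
        zeros = []

    # int(..., 16) handles upper, lower, and mixed case alike.
    return [int(p, 16) for p in head_parts] + zeros + [int(p, 16) for p in tail_parts]


print(parse_groups("::ffff:192.0.2.1"))  # [0, 0, 0, 0, 0, 65535, 49152, 513]
```

Note that this sketch is lenient about leading zeros inside the IPv4 octets, which a stricter parser might want to reject.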

Every now and again I wonder if Ren-C's lexing is robust enough to represent things like IPv6. So I wanted to take a look at some examples.

Right now there is a rule that you can't have internal SPACE in CHAIN!, PATH!, or TUPLE!. So no a//b or a::b or a..b. This rule is enforced in order to let http://example.com/stuff be recognized as a URL!, and not a PATH! like:

to path! [http: _ example.com stuff]

If we relaxed this rule for CHAIN!, then things like this would be closer to LOAD-able syntax:

But it runs afoul of the fact that we don't support things that start with digits and have letters after them.

Then there's wackiness like:

While it's interesting to see us getting closer to LOAD-ability for these things, I'm thinking we need to draw the line and use strings. People have praised Rebol's crazy literals in the past, but there's just too much competition in the lexical space and it makes a mess.

So I'm fully abandoning any idea that IPv6 addresses scan as some weird CHAIN! (just as I had fully abandoned the idea that URL!s scan as some weird PATH!).

We're moving toward this, which isn't terrible for when you want to do some dynamic creation:

address: {ipv6! join "fe80::" "1%lo0"}

And it occurs to me the opening up of -{...}- strings is a perfect opportunity to let them become "construction strings". This would be where you get a hook into LOAD itself, where you run on fully unbound data... and let whatever "recognizers" you choose duke it out:

address: -{fe80::1%lo0}-
cash: -{$10.20}-

So I'll be transitioning things like existing MONEY! code to use this. (It's better than using backticks, which was proposed before.)
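
To make the "recognizers duke it out" idea concrete, here is a rough sketch of the dispatch in Python rather than Ren-C. Nothing here reflects an actual Ren-C API: `RECOGNIZERS`, `construct`, and both recognizer functions are hypothetical names, and the policy that exactly one recognizer must claim the string is just one assumption about how the contest could be settled:

```python
import ipaddress
import re
from decimal import Decimal


def recognize_ipv6(text):
    """Claim strings that look like an IPv6 address with an optional %zone."""
    addr, _, zone = text.partition("%")
    try:
        return ("ipv6", ipaddress.IPv6Address(addr), zone or None)
    except ValueError:
        return None


def recognize_money(text):
    """Claim strings like '$10.20'."""
    if not re.fullmatch(r"\$\d+(\.\d+)?", text):
        return None
    return ("money", Decimal(text[1:]))


# Hypothetical: the set of recognizers the user has chosen to enable.
RECOGNIZERS = [recognize_ipv6, recognize_money]


def construct(text):
    """Offer the raw string to every recognizer; exactly one must claim it."""
    matches = [m for m in (rec(text) for rec in RECOGNIZERS) if m is not None]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} recognizers matched {text!r}")
    return matches[0]


print(construct("fe80::1%lo0"))
print(construct("$10.20"))
```

Requiring a single winner keeps ambiguity an error instead of a silent priority order, which seems closer in spirit to letting the recognizers compete.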
