`\d` in a regex isn't always equal to `[0-9]`

Published on Oct 31, 2023 in JavaScript and Regular expressions

In JavaScript, \d and [0-9] are equivalent: both match only the ASCII digits 0–9 (Arabic numerals). In some other languages, however, \d also matches digits from other scripts.

There are 680 Unicode characters in the "Number, Decimal Digit" category. For example:

Name	Characters
Digits (Arabic numerals)	0123456789
Arabic-Indic digits	٠١٢٣٤٥٦٧٨٩
Extended Arabic-Indic digits	۰۱۲۳۴۵۶۷۸۹
NKo digits	߀߁߂߃߄߅߆߇߈߉
Devanagari digits	०१२३४५६७८९
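To get a feel for that category, here is a minimal Python sketch that enumerates it with the standard unicodedata module. The helper name decimal_digits is my own, and the count it prints depends on the Unicode version bundled with your Python, so it may not be exactly 680.

```python
import sys
import unicodedata


def decimal_digits():
    """Return every character whose Unicode general category is Nd."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd"
    ]


digits = decimal_digits()

# The total varies with the Unicode version this interpreter ships with.
print("Unicode", unicodedata.unidata_version, "has", len(digits), "Nd characters")

# Each of these is "digit three" in a different script from the table above:
# Arabic-Indic, Extended Arabic-Indic, NKo, Devanagari.
threes = "\u0663\u06f3\u07c3\u0969"
print([unicodedata.digit(ch) for ch in threes])  # [3, 3, 3, 3]
```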

Testing on regex101.com:

\d in JavaScript matches 10 of them (Arabic numerals only)
\d in C# matches 370 of them
\d in Python matches 540 of them
\d in Rust matches all 680

(I haven't tested whether \d in those three languages also matches characters outside those 680.)
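The difference is easy to reproduce locally. A small sketch in Python 3, where str patterns are Unicode-aware by default and the re.ASCII flag narrows \d back to [0-9]. The sample string is made up for illustration.

```python
import re

# ASCII digits, then Arabic-Indic 0-2, then Devanagari 0-2.
sample = "0123456789 \u0660\u0661\u0662 \u0966\u0967\u0968"

# Default behaviour for str patterns: \d matches any Unicode decimal digit.
unicode_d = re.findall(r"\d+", sample)

# With re.ASCII, \d is restricted to [0-9].
ascii_d = re.findall(r"\d+", sample, re.ASCII)

# An explicit character class only ever matches the listed range.
bracket = re.findall(r"[0-9]+", sample)

print(len(unicode_d))  # 3 (one run per script)
print(ascii_d)         # ['0123456789']
print(bracket)         # ['0123456789']
```

So in Python you can keep writing \d if you pass re.ASCII (or the inline (?a) flag), though [0-9] makes the intent visible in the pattern itself.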

[0-9] matches only Arabic numerals in those four languages, but even [0-9] can't always be trusted (a few typos fixed in the quotation):

It is generally believed that [0-9] matches only the ASCII digits 0123456789. That is painfully false in some instances, for example on Linux systems (as of June 2020) in a locale other than "C":

Assume:
str='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
Try grep to discover that it allows most of them:
$ echo "$str" | grep -o '[0-9]\+'
0123456789
٠١٢٣٤٥٦٧٨
۰۱۲۳۴۵۶۷۸
߀߁߂߃߄߅߆߇߈
०१२३४५६७८
sed has some troubles. Should remove only 0123456789 but removes almost all digits. That means that it accepts most digits but not some nines (???):
$ echo "$str" | sed 's/[0-9]\{1,\}//g'
 ٩ ۹ ߉ ९
Even expr suffers from the same issues as sed:
expr "$str" : '\([0-9 ]*\)' # also matching spaces
0123456789 ٠١٢٣٤٥٦٧٨
And also ed:
printf '%s\n' 's/[0-9]/x/g' '1,p' Q | ed -v <(echo "$str")
105
xxxxxxxxxx xxxxxxxxx٩ xxxxxxxxx۹ xxxxxxxxx߉ xxxxxxxxx९

Huh. Curious.

I guess I'd better remember to avoid \d and prefer [0-9] or even [0123456789] when using languages other than JavaScript.
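Here is why this matters in practice, as a hedged Python sketch: int() itself accepts digits from other scripts, so a \d check in front of it would not narrow the input. The function name parse_ascii_int is hypothetical, just one way to write the stricter check.

```python
import re

# Explicit ASCII range, so the pattern does not depend on regex flags.
ASCII_DIGITS = re.compile(r"[0-9]+")


def parse_ascii_int(text):
    """Return int(text) if text consists solely of ASCII digits, else None."""
    if ASCII_DIGITS.fullmatch(text):
        return int(text)
    return None


arabic_indic_42 = "\u0664\u0662"  # the digits four and two in Arabic-Indic

print(int(arabic_indic_42))              # 42: int() accepts Unicode digits
print(parse_ascii_int("42"))             # 42
print(parse_ascii_int(arabic_indic_42))  # None
```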

When writing JavaScript, I'll continue using \d, since it's just as clear as [0-9] (d = digit) and shorter.
