IRG Working Set 2024v4.0

Source: Henry CHAN
Date: Generated on 2026-04-19

Unification

Showing 6 comments.

Sn | Image/Source | Comment Type | Description
04665
齊 210.10.2
SAT-09352
TS 20 · IDS 𱍸
Unification
U+9F4E
Unification to 齎 (U+9F4E)?

If we consider ⿳亠⿲刀了𱍸口 to be unifiable with 齊 (with a new UCV added), then this would be a unification between ⿱齊貝 and ⿵齊貝 (for which a new UCV can also be added).
02316
爿 90.6.1
SAT-09705
TS 10 · IDS 𤕫
Unification
U+75BC
Unify to 疼 (U+75BC); add a new UCV of 疒 and 𤕫 level 2.

This seems to be an extremely common variation between 𤕫 and 疒.

Among the existing encoded characters, 5 are variants of characters that do not contain "疒":

U+26896 𦢖 = 膺 (NOT 疒)
U+27B6D 𧭭 = 譍 (NOT 疒)
U+28FF3 𨿳 = 䧹 (NOT 疒)
U+2A1FF 𪇿 = 鷹 (NOT 疒)
U+2E34E 𮍎 = 臧 (NOT 疒)

Meanwhile, all the other characters are variants of "疒" characters:
U+2457A 𤕺 = 疾
U+308F1 𰣱 = 痱
U+30901 𰤁 = 癅
U+30905 𰤅 = 癘
U+3090A 𰤊 = 癬
U+32B48 𲭈 = 痬
02316 = 疼
02318 = 痠
02321 = 𤸃
02322 = 瘠
02323 = 瘻
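
The code point and character pairings in the lists above can be spot-checked mechanically. The following is a minimal Python sketch and not part of the review tool; the helper name `matches_code_point` is hypothetical, and only a few pairs are copied from the comment for illustration.

```python
def matches_code_point(label: str, char: str) -> bool:
    """Return True if `label` (e.g. "U+75BC") names the code point of `char`."""
    if not label.upper().startswith("U+") or len(char) != 1:
        return False
    try:
        return int(label[2:], 16) == ord(char)
    except ValueError:
        # The part after "U+" was not valid hexadecimal.
        return False


# Pairs copied from the comment above; extend with the remaining entries as needed.
pairs = [
    ("U+75BC", "疼"),
    ("U+26896", "𦢖"),
    ("U+2457A", "𤕺"),
]

for label, char in pairs:
    status = "ok" if matches_code_point(label, char) else "MISMATCH"
    # Print only the label so the output stays ASCII-safe on any console.
    print(f"{label}: {status}")
```

A check like this only confirms that a label and a glyph refer to the same code point; it says nothing about whether the variant relationship itself is correct.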
02718
石 112.12.1
SAT-09838
TS 17 · IDS 𤘆
Unification
This variation seems to be somewhat systematic:


Unification
The difference is also not that large when you consider a bigger set of historically seen variants for this component:



01098
宀 40.11.1
T13-3E27
TS 14 · IDS
Unification
U+21ABD
On the issue of unification to 𡪽 U+21ABD:

The following 5 characters are encoded containing the 䂓 component:

𡪽 U+21ABD - Extension B, "canonical" form being suggested for separate encoding
𢵅 U+22D45 - Extension B, "canonical" form encoded as 摫 U+646B
𥨖 U+25A16 - Extension B, "canonical" form encoded as 窺 U+7ABA
𧡯 U+2786F - Extension B, error form of 瞡, canonical form "⿰月䂓" not encoded
𭬉 U+2DB09 - Extension F, "canonical" form encoded as 槻 U+69FB

𡪽 U+21ABD itself is an error form of 窺 U+7ABA.

If we apply the principle that "a 'canonical' form must be encoded separately from an existing variant form", then the pseudo-canonical form for U+2786F would also need to be encoded separately.

This argument of "a 'canonical' form must be encoded separately from an existing variant form as a glyph change will destabilize existing data sources" has been pushed by China multiple times and it has been applied in some previous working sets. At least one character has been disunified because the variant form was found in the GKX, as GKX was considered authoritative.

As far as I recall, this is the first time it is being applied to a character of GHZ origin. Historically, China has opted to modify the glyphs of some characters of GHZ origin, especially when the canonical glyph was found in GHZR.

As GHZ is also considered an authoritative dictionary by IRG, the same argument could be made. If IRG accepts the idea that "a 'canonical' form must be encoded separately from an existing variant form", we should limit it to characters where the existing encoded character comes from specific authoritative sources.

What is regarded as a canonical form may also change depending on who is looking at it. So if this principle is accepted, IRG should keep a record of the selected form each time the principle is invoked, so that future characters can be normalized and duplicate forms are not encoded.

--

On the issue of the general unifiability of 規 and 䂓:

This was added from the discussion of 00543.

A number of variant forms have already been registered on zi.tools:


(first character and third character)



Other examples from the MOE Variant Dictionary:



I prefer to keep this UCV.
02446
玉 96.11.2
TB-7021
TS 15 · IDS 𡈼
UCV
Add a new UCV, ⿰⿱山王攵, ⿰⿱山壬攵, ⿰⿱山𡈼攵, ⿰⿱山主攵, ⿰⿳山一王攵, ⿰⿳山一壬攵, ⿰⿳山一𡈼攵, level 1.

Add a new UCV, ⿰⿱山王攴, ⿰⿱山壬攴, ⿰⿱山𡈼攴, ⿰⿱山主攴, ⿰⿳山一王攴, ⿰⿳山一壬攴, ⿰⿳山一𡈼攴, level 1.

UCV 攵 and 攴 level 1 already exist.
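
The two proposed UCVs above list the same seven left-hand forms and differ only in the right-hand component. A minimal Python sketch of that structure follows; the names `LEFT_FORMS` and `ucv_members` are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Left-hand alternatives taken from the proposal above, in the order listed.
LEFT_FORMS: Dict[str, List[str]] = {
    "⿱山": ["王", "壬", "𡈼", "主"],  # forms of the shape ⿱山X
    "⿳山一": ["王", "壬", "𡈼"],      # forms of the shape ⿳山一X (no 主 form is listed)
}


def ucv_members(right: str) -> List[str]:
    """Build the IDS strings for every listed left-hand form plus `right`."""
    members = []
    for prefix, bottoms in LEFT_FORMS.items():
        for bottom in bottoms:
            members.append(f"⿰{prefix}{bottom}{right}")
    return members


first_ucv = ucv_members("攵")   # the first proposed UCV
second_ucv = ucv_members("攴")  # the second proposed UCV

# Each proposal contains seven members.
print(len(first_ucv), len(second_ucv))
```

Since a level 1 UCV for 攵 and 攴 already exists, generating both lists from one table makes it easy to confirm that the two proposals stay in step.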

Attributes

Showing 1 comment.

Sn | Image/Source | Comment Type | Description
02611
目 109.8.4
VN-F200F
TS 13 · IDS
Residual Stroke Count
[ Unresolved from v2.0 ]
SC=9, TS=14.

务 should be counted as ⿱攵力 here per Kangxi conventions.
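
The arithmetic behind the proposed values can be sketched as follows. This is a hedged illustration: it assumes the listed SC of 8 counted the top of 务 as a 3-stroke component, whereas the comment counts it as 攵 (4 strokes). The variable names are hypothetical.

```python
RADICAL_STROKES = 5      # 目 (radical 109) has 5 strokes
LISTED_SC = 8            # residual stroke count from "109.8.4"
LISTED_TS = 13           # total stroke count as listed

# Assumption: the listed count treats the top of 务 as a 3-stroke component;
# the comment counts it as 攵, which has 4 strokes.
STROKES_AS_LISTED = 3
STROKES_AS_PROPOSED = 4

proposed_sc = LISTED_SC + (STROKES_AS_PROPOSED - STROKES_AS_LISTED)
proposed_ts = RADICAL_STROKES + proposed_sc

print(proposed_sc, proposed_ts)  # prints "9 14"
```

Under that assumption the one-stroke difference accounts for both changes: SC moves from 8 to 9 and TS from 13 to 14.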

Glyph Design & Normalization

Showing 1 comment.

Sn | Image/Source | Comment Type | Description
00277
艸 140.9.3
刀 18.10.1
GZ-1852301
TS 12 · IDS
Normalization
[ Unresolved from v3.0 ]
Support normalization to ⿱艹剑, as the traditional form 𧁴 ⿱艹劍 is already encoded at U+27074.

Other

Showing 4 comments.

Sn | Image/Source | Comment Type | Description
03168
肉 130.10.2
GCCPP-00019
TS 14 · IDS
Comment
[ Unresolved from v1.0 ]
The Traditional Variant needs to be checked, as it is currently under the moon radical (月) instead of the expected meat radical (肉).
00264
刀 18.6.4
GXM-00528
TS 8 · IDS
Comment
[ Unresolved from v1.0 ]
Semantic Variant of 𣶒?
02050
毛 82.11.2
SAT-09392
TS 15 · IDS
Comment
[ Unresolved from v1.0 ]
See also: https://hc.jsecs.org/irg/ws2017/app/?id=01867
Another form of this character is ⿰睪毛.
02097
水 85.8.4
UK-30451
TS 11 · IDS
Comment
Kushim mentioned that similar forms were unified in the IVD for U+52DD:

Data for Unihan

Showing 1 comment.

Sn | Image/Source | Comment Type | Description
04529
鳥 196.3.1
士 33.11.3
VN-F1E88
TS 14 · IDS
Unihan data
kSpoofingVariant 䲧 (U+4CA7)