
IRG Working Set 2021v5.0

Source: Lee COLLINS
Date: Generated on 2024-12-14

The Image/Source column is displayed as it was in WS2021 v5.0. The character may have a different status in the latest working set.

Unification

Sn | Image/Source | Comment Type | Description
03698
衣 145.4.1
SAT-06753
TS 10 · IDS 𧘇
Oppose Unification
While I agree that, in this case, the character is a variant of U+8870 / U+2E571, it is also used as a variant of 襄, both as a component (in U+2A48C and U+2C340) and standalone. I would argue for separation.
Oppose Unification
I don't currently have evidence of the standalone usage of this as U+8944 襄, but given the examples of component usage as 襄, it's certainly possible that we would find this as a standalone form.
00251
儿 10.18.3
SAT-06900
TS 20 · IDS
Oppose Unification
Unifying 兒 with 鬼 would be too confusing.
01721
日 72.16.3
V0-3962
TS 20 · IDS
Oppose Unification
In more than 30 years of experience encoding Nôm, we have found exactly five pairs of characters that distinguish 𦥷 and 興. With the exception of V0-3962 and VN-232F1, all are already encoded, e.g. U+2C2D9 and U+2444D. I don't see this as a big win; it would just add another unification rule that is difficult to explain.
04034
酉 164.10.2
V0-4562
TS 17 · IDS
Oppose Unification
As Eiso points out, these two characters were addressed by the IRG during the discussion of IRGN2429. They differ in shape, are not cognate, and should not be unified.
01611
攴 66.10.2
VN-F024D
TS 14 · IDS 𭓇
Oppose Unification
We should not unify simplified forms.
00036
丿 4.11.5
VN-F0BE9
TS 12 · IDS 𠂊丿𰀁
Oppose Unification
We oppose unification of simplified forms.
00880
土 32.11.4
VN-F1779
TS 14 · IDS 𱷥
Oppose Unification
For Extension H, the IRG previously debated unifying U+31E22 𱷥 with 籠 and U+31DE5 𱷥 with variants of 龍. Unification was rejected. There is no new information here, and I see no reason to change that decision.
02082
水 85.11.4
VN-F19BF
TS 14 · IDS 𱷥
Oppose Unification
For Extension H, the IRG previously debated unifying U+31E22 𱷥 with 籠 and U+31DE5 𱷥 with variants of 龍. Unification was rejected. There is no new information here, and I see no reason to change that decision.
02595
田 102.11.4
VN-F1C41
TS 16 · IDS 𱷥
Oppose Unification
U+31DE5 is less common than 竜. For Extension H, the IRG previously debated unifying U+31E22 𱷥 with 籠 and U+31DE5 𱷥 with variants of 龍. Unification was rejected. There is no new information here, and I see no reason to change that decision.


Attributes

Sn | Image/Source | Comment Type | Description
03597
虫 142.11.1
GKJ-00409
TS 17 · IDS
IDS
More concise IDS with U+31858: ⿱𱡘虫
00341
力 19.11.1
SAT-06152
TS 13 · IDS
IDS
More concise IDS with U+31858: ⿱𱡘力
04242
門 169.14.5
V0-4925
TS 22 · IDS
Residual Stroke Count
My understanding is that for IRG work we count 及 as 4 strokes (different from Unihan data). 盍 is 10, so SC = 14. This is consistent with the simplified form: U+28E0D
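As a cross-check on the arithmetic above: if the field 169.14.5 is read as radical number, residual strokes and first-stroke type (an assumption about the working-set notation), then adding the 8 strokes of radical 169 門 to a residual of 14 should reproduce the listed total of TS 22. The Python sketch below uses only the component counts given in the comment (及 = 4 under the IRG convention, 盍 = 10); `parse_radical_field` and `RADICAL_STROKES` are hypothetical names for illustration, not part of any IRG tool.

```python
# Hypothetical helpers for checking IRG-style radical/stroke data.
# Assumption: the radical field has the form "radical.residual.first_stroke",
# e.g. "169.14.5"; a trailing prime (169′) marks a simplified radical.

RADICAL_STROKES = {169: 8}  # 門 has 8 strokes (only the entry needed here)


def parse_radical_field(field: str) -> tuple[int, bool, int, int]:
    """Split e.g. '169.14.5' into (radical, simplified?, residual, first stroke)."""
    radical, residual, first_stroke = field.split(".")
    simplified = radical.endswith(("′", "'"))
    return int(radical.rstrip("′'")), simplified, int(residual), int(first_stroke)


# Component stroke counts as stated in the comment (IRG convention for 及).
components = {"及": 4, "盍": 10}

radical, simplified, residual, first_stroke = parse_radical_field("169.14.5")
computed_residual = sum(components.values())          # 4 + 10
total = RADICAL_STROKES[radical] + computed_residual   # 8 + 14

print(f"residual {computed_residual} (field says {residual}), total {total}")
# prints: residual 14 (field says 14), total 22
```

The computed total of 22 agrees with the TS value recorded for V0-4925, which supports counting 及 as 4 strokes here; with the 3-stroke count the total would come out as 21.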
00259
米 119.7.5
VN-F001E
TS 13 · IDS
Radical
Among the encoded Vietnamese characters with 娄 on the left side, where 娄 is the phonetic, the radical is based on the semantic element on the right side, with the exception of U+21890, which uses radical #38. Since we have now changed the basis for determining the radical, we might want to go back and change the previously encoded characters at some point.
03735
見 147.2.4
VN-F07B9
TS 9 · IDS 𬼀
IDS
⿽見𬼀 is probably the best IDS now that ⿽ is available, since the structure is neither ⿱ nor ⿰.
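An IDS is written in prefix notation: each Ideographic Description Character takes a fixed number of components, so ⿽見𬼀, like ⿱𱡘虫, is one operator applied to two components. Below is a minimal Python sketch of a parser, assuming the IDCs are U+2FF0 to U+2FFF, with ⿲ and ⿳ taking three operands, ⿾ and ⿿ taking one, and all others taking two. The function names are hypothetical, and entity references such as &S8-01; are treated as single components.

```python
# Minimal IDS parser sketch. Assumption: IDCs are U+2FF0..U+2FFF, where
# ⿲ and ⿳ take three operands, ⿾ and ⿿ take one, and all others take two.
# Entity references such as "&S2-01;" are treated as single components.

def idc_arity(ch: str) -> int:
    """Return the operand count for an IDC, or 0 for an ordinary component."""
    cp = ord(ch)
    if not 0x2FF0 <= cp <= 0x2FFF:
        return 0
    if cp in (0x2FF2, 0x2FF3):
        return 3
    if cp in (0x2FFE, 0x2FFF):
        return 1
    return 2


def tokenize(ids: str) -> list[str]:
    """Split an IDS into code points, keeping &...; entities whole."""
    tokens, i = [], 0
    while i < len(ids):
        if ids[i] == "&":
            end = ids.index(";", i)  # raises ValueError if the entity is unterminated
            tokens.append(ids[i:end + 1])
            i = end + 1
        else:
            tokens.append(ids[i])
            i += 1
    return tokens


def parse_ids(ids: str):
    """Parse an IDS into nested tuples of the form (operator, operand, ...)."""
    tokens = tokenize(ids)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"incomplete IDS: {ids!r}")
        tok = tokens[pos]
        pos += 1
        n = idc_arity(tok) if len(tok) == 1 else 0
        if n == 0:
            return tok  # a leaf component
        return (tok, *(parse() for _ in range(n)))

    tree = parse()
    if pos != len(tokens):
        raise ValueError(f"trailing components in IDS: {ids!r}")
    return tree


print(parse_ids("⿽見𬼀"))      # ('⿽', '見', '𬼀')
print(parse_ids("⿱⿹⺄夕一"))  # ('⿱', ('⿹', '⺄', '夕'), '一')
```

A parse tree like this makes "more concise" measurable: an IDS that reaches the same character with fewer operators and leaves, as with ⿱𱡘虫, is the shallower tree.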
00244
儿 10.11.2
VN-F0B0E
TS 13 · IDS
Radical
Radical 10 is based on the semantic element (光 > "open"), but the phonetic 豸 (radical 153, SC 6) could also be used and would be more intuitive.
04240
門 169.11.4
VN-F1690
TS 19 · IDS
Radical
Agree with Ken
04252
门 169′.14.1
VN-F1CB7
TS 17 · IDS 𫔭
Radical
门 is the semantic element (the character means "to open"), so it should at least be kept as a secondary radical.


Evidence

Sn | Image/Source | Comment Type | Description
00117
亠 8.10.3
GKJ-00984
TS 12 · IDS
Evidence
I would be careful about assuming that this is an error form. In Vietnamese, this form has a specific meaning; for example, U+2015C 𠅜 (⿱亠例) is thought to be an abbreviation of ⿱麻例 (ma + lệ), where the reduced form 亠 for "ma" indicates an initial *ml- in spoken Vietnamese when the character was first used. We need more information.
00115
亠 8.9.4
SAT-04260
TS 11 · IDS
Unclear evidence response
Agree with #12638: this seems to be a simplification of the form 褱, similar to the way 坏 is used for 壞. Modern Chinese editions typically use simplified forms for classical texts, but that does not invalidate the form or the edition.
00048
乙 5.4.3
UK-20570
TS 5 · IDS
Misidentified glyph
This character and the one next to it are arguably not transcriptions of Sanskrit but attempts to reproduce forms of the Siddham characters "ni" and "svā". Variants of this script can be found throughout East Asia. We should consider whether to encode the complete alphabet and its combinations in a separate block.
Evidence
Agree with Andrew's comment in #14761
02305
牛 93.1.5
UK-20572
TS 5 · IDS
Misidentified glyph
This is arguably not a transcription of Sanskrit, but an attempt to reproduce a form of the Siddham character "ha". Variants of this script can be found throughout East Asia. Here is an example from Vietnam:



It might be better to collect the whole alphabet and combined forms as a separate script rather than encode piecemeal in the Ideographic blocks.
04746
鳥 196.10.2
V1-6C7B
TS 21 · IDS
Evidence
It is important to note that the Vietnamese form is not a misprint. 茶 (Sino-Vietnamese "trà") is clearly the intended phonetic for "chà". 荼 (SV "đồ") is more commonly used for "dưa", "giưa", etc.
00300
刀 18.4.2
VN-F0053
TS 6 · IDS
Evidence
The Vietnamese means "to split lengthwise". It's unlikely that this is cognate with the Chinese usage, but since the shapes are identical we can treat both usages as a single character. This is similar to other separately created characters, such as 畑: Vietnamese "đèn" = lamp, Japanese はたけ = dry field.
01509
手 64.11.2
VN-F021D
TS 14 · IDS
Evidence
The dictionary Giúp Đọc just explains the structure: "thủ" = 手 plus the phonetic "sùng" 崇. The final "ung" of 崇 is an exact match, so I guess the question is about the initial "s-". There are certainly many cases where the phonetic is not an exact match, but the use of "s-" here to represent a velar does seem to be an outlier. Possibly this is a borrowing from an older stage of Chinese that had a velar (at least according to the Baxter-Sagart reconstruction). For example, "sen" (lotus) is thought to be derived from 蓮, which they reconstruct as *k.[r]ˤe[n]. Similarly, the word for river, "sông", is thought to be derived from an earlier Austroasiatic word similar to Mon "kruŋ". Perhaps in some circumstances the velar was retained.
00005
一 1.6.5
VN-F0BEA
TS 7 · IDS &S8-01;
Evidence
In the Vietnamese source, this is a stand-alone character, read "rông" (from "rồng" = "dragon"), meaning "unrestrained, dissolute". The new evidence shows that it can also be used as a component.
01457
手 64.6.4
VN-F18AA
TS 10 · IDS
Evidence
Prof Hồng is fairly consistent in distinguishing 扌 and 手 in his analyses. We will try to contact him to determine which is correct.
01713
日 72.13.4
VN-F191D
TS 17 · IDS
Evidence
I agree with the comment. Structurally, 淫 (dâm) is a better phonetic than 滛 (phonetic "dao"), but as shown at https://hvdic.thivien.net/whv/滛, many Vietnamese dictionaries consider the two to be equivalent. Here is another example where the phonetic 滛 has a reading based on 淫:

Evidence
Prof. Hồng, the author of the source, explains that the character is written with 滛. His intent in the structural analysis is to show that the intended phonetic was 淫. So, the font is correct.
03983
辵 162.11.3
VN-F19B0
TS 15 · IDS
Evidence
According to Prof Hồng, the compiler of the dictionary, both forms are in use and the analysis is a short way to indicate that 速 can be abbreviated to 束.


Glyph Design & Normalization

Sn | Image/Source | Comment Type | Description
00048
乙 5.4.3
UK-20570
TS 5 · IDS
Glyph design
It might be premature to change the shape. ⿱⿹⺄夕一 and ⿹⺄𢆰 possibly represent different Sanskrit syllables. The original evidence only explains ⿱⿹⺄夕一 as 尼. However, 尼 is used to write both dental "ni" (e.g. 釋迦牟尼 śākyamuni) and retroflex "ṇi" (摩尼 maṇi). If these are attempts to represent some Brahmic script (梵書), the difference in shape may reflect the different sounds. The evidence for ⿹⺄𢆰 shows it in the context of a dhāraṇī, where it is clearly the "ṇi" of "maṇi". It would be safer to see ⿱⿹⺄夕一 in context to determine whether the two are the same or different.
00681
口 30.15.1
V0-3364
TS 18 · IDS 𤮃
Normalization
The difference is subtle and matches the design of most of our glyphs with the 瓦 component.
00717
冫 15.17.3
VN-F00EC
TS 19 · IDS
Glyph design
We can consider an update
Glyph design
I checked other glyphs that have the same element, 善. Almost all of the characters with 善 on the right use the same design as VN-F00EC, and many with 善 on the left do too. Since this includes many encoded characters, changing it is not a trivial effort and would require a proposal and review.
00742
口 30.18.2
VN-F170F
TS 21 · IDS 𧀒
Normalization
This could possibly be 攵, but I don't have access to the source text cited by Prof. Hồng. Most of the glyphs with the component 嫩 have 攵, but U+21128 / V2-7259 "non" 𡄨 has 女. However, as you can see in the second character of line 1369 of Kiều, the glyph is clearly 攵.



It would be reasonable to normalize VN-F170F and V2-7259 to 攵.
Glyph design
We have updated the font to reflect the proposed change.
03814
貝 154.2.3
VN-F1B83
TS 9 · IDS
Glyph design
This form is similar to many others, such as V2-6E38 𠂫. The form that looks more like 乃 would also be acceptable, but there is no point in doing this unless we change the 50-odd glyphs, mostly encoded, to use it.
00023
丨 2.7.1
VN-F20B5
TS 9 · IDS &S2-01;𭟯
Glyph design
While some manuscripts show a dot, most have a longer stroke for this element, so in our normalized form it looks more like 丨 than a dot. Here is another character with the same element on the left:


Other

Sn | Image/Source | Comment Type | Description
00420
赤 155.4.1
TC-5644
TS 11 · IDS
Other
Similar to TC-596D, which is also similar to U+8D67 赧
02611
疒 104.6.4
VN-F1DE1
TS 11 · IDS
Other
VN-F1DE1 is a Nôm Tày character meaning "stale, spoiled"; it is read "nẩu", from the phonetic 𱜢 < 鬧 (nháo). While the left side is similar, as the radical suggests, there is no semantic relationship to U+2DC59, a variant of U+3D11 㴑 (which in turn is a variant of 逆) meaning "against, opposite". The Shuowen cited here uses U+2DC59 to explain the meaning of 洄 as "against the current". This is very different from VN-F1DE1.


Data for Unihan

Sn | Image/Source | Comment Type | Description
00136
人 9.5.2
SAT-06123
TS 7 · IDS 𰀊
Semantic variant
俜 (U+4FDC)
00413
厶 28.8.5
二 7.8.5
SAT-06514
TS 10 · IDS 𠓞
Semantic variant
According to the evidence, this is a variant of U+9F4A 齊
00160
人 9.8.3
SAT-06519
TS 10 · IDS 𭾜
Semantic variant
偱 (U+5071)
00422
又 29.9.5
TC-5336
TS 11 · IDS
Semantic variant
U+53E0 叠
00418
又 29.6.4
UK-20294
TS 8 · IDS
Simp variant
U+2BA4E 𫩎
00275
冫 15.3.3
UTC-03235
TS 5 · IDS
Unihan data
The Japanese reading shown is さむし, which is the classical Japanese form. In modern Japanese this would be さむい, which might be preferred for the value of kJapanese.