In principle I think ⿱宀总 could be unified with 𥦗 (U+25997), and represented using IVS. However, I think there is merit in Wang Xieyang's argument that ⿱宀总 and ⿱穴𢗀 are intrinsically different characters because they have different radicals. Ken's argument (#14618) also makes me inclined to take a pragmatic view of the encoding of this character. Therefore I now support encoding, and as sufficient evidence has now been supplied, GDM-00313 should be moved back to the M-set.
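To make the IVS alternative concrete, here is a minimal sketch of what such a representation would look like at the code point level. The selector VS17 (U+E0100) is only an assumption chosen for illustration; I am not suggesting that any such sequence is registered in the IVD.

```python
# Hypothetical IVS: base ideograph U+25997 followed by a variation selector.
# VS17 (U+E0100) is an illustrative choice only, not a registered sequence.
BASE = "\U00025997"
VS17 = "\U000E0100"  # first code point of the Variation Selectors Supplement

def make_ivs(base: str, selector: str) -> str:
    """Return base + selector after checking the selector is VS17..VS256."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 0xE0100 <= ord(selector) <= 0xE01EF:
        raise ValueError("selector must be in U+E0100..U+E01EF")
    return base + selector

ivs = make_ivs(BASE, VS17)
print(" ".join(f"U+{ord(c):04X}" for c in ivs))  # prints "U+25997 U+E0100"
```

Under unification the plain-text base would stay U+25997, and only the selector would distinguish the ⿱宀总 form. Separate encoding avoids that dependency.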
Oppose unification to 𣝼 (U+2377C) as there is no liding rule, and a new UCV for ⿳自㓁𠔽 ~ ⿱鳥𠔿 ~ ⿱鳥囚 is not reasonable. Sufficient evidence has been provided that this is a stable variant which should be encoded.
Oppose unification. The inclusion of ⿱殸⬚ in UCV 312d seems unreasonable as it is not an example of "differences in relative length of strokes" (j-2). UCV 312d should only cover ⿱𣪊⬚ and ⿹𣪊⬚, and ⿱殸⬚ should be removed from the rule.
Other editions of 《禪要經》 give "衰酢", therefore SAT-06753 should be a variant of U+8870 衰 or U+2E571 𮕱 (a variant of 衰). Therefore suggest ad hoc unification to either U+8870 or U+2E571.
Oppose unification as the right side of UK-20573 bears no resemblance to 曶. Unified glyph forms should be easily recognizable as variant forms of the same character.
There are three evidences showing ⿱臨玉 but only one showing 𤪋, therefore ⿱臨玉 seems likely to be the correct form in this case. We can accept either ad hoc unification of UK-20941 with 𤪋 (U+24A8B), or a new UCV if there are other examples of 臨~𰯲 glyph variation.
The purpose of the second radical is to aid discoverability, but it is unlikely that anyone would expect to find this character under 言 radical, so agree that the second radical is not required in this case.
Keep as Radical 74.0 (月) as the association with '腦' (meat radical) is only one modern author's interpretation, and not necessarily reliable. My own interpretation is that the character represents 'the moon through the window'.
Secondary radical seems unnecessary as this character is the simplified form of the common character U+993D 餽, so the primary radical should be obvious.
PnP section 2.2.1 d. (5) c): "If the technically correct (aka semantic) radical for an ideograph hampers its discoverability, or is region-dependent, the primary radical shall be assigned as though made by an ideograph expert who is neither a specialist in the history of the Han script nor familiar with ideograph etymology. The technically correct radical can be assigned as a second radical."
Therefore use Radical 82.0 (毛) as the primary radical for this character, and add 162.0 (辵) as a second radical if considered necessary.
The text in the evidence seems to be related to the quotation given in the entry for 《續方言》 in 《四庫全書總目提要》 which has "𧐒𧐎" instead of "䲇⿰魚覓". Therefore ⿰魚覓 can be considered to be a variant of U+2740E 𧐎. I think it is OK to move it back to the M-set.
Probably a variant of 蜇, not a modern typo. It would be helpful to see the original source for the text given in the evidence, but I think it is OK to encode on the given evidence.
It seems very probable that ⿰鱼𬶨 is a mistake for ⿰鱼暨 (simplified form of U+29F59 𩽙) in the two modern evidences given (someone just added a fish radical to the left of 𬶨 without noticing that the fish radical form of the character has 暨 on the right). If ⿰鱼𬶨 is correct we would expect ⿰魚鱀 to exist as well, but it does not.
The two evidences provided are both general texts which just mention Baiji dolphin in passing, so they cannot be considered to be authoritative sources. Please try to find additional evidence for either ⿰鱼𬶨 or ⿰鱼暨 from a zoological source that specifically discusses the Baiji dolphin. If there is no additional evidence then the character should be withdrawn. If additional evidence shows ⿰鱼暨 then suggest to change IDS and glyph to ⿰鱼暨.
What does "箕裘方~起" mean? It looks like ⿰爵鳥 should be a corruption of some reasonably common character, but it is not clear to me what character it should be. An internet search does not come up with any other examples of this phrase.
The glyph form is suspicious as ⿱𥫗鼎 does not exist as an independent character or as a component in any other character. I strongly suspect that ⿰金⿱𥫗鼎 is a mistake for U+28BB0 𨮰 zhá as it has the same reading (士戛 is also the fanqie for 鍘 zhá) and same meaning (a type of knife used to cut hay).
Therefore I suggest to postpone pending additional evidence, or withdraw.
《世説新語校箋》(中華書局,1984年) p. 75 also gives U+93A9 鎩 which should not simplify to ⿰钅刹. Without any more evidence for ⿰钅刹, I suggest that GKJ-00531 is withdrawn.
The list of eight rodents given in Evidence 2 (鼸鼶鼮鼣鼭鼤䶅䶈) corresponds to the list of eight rodents given in 説略, where the second character (鼶) is written as ⿺鼠虎. Evidence 2 notes that the original form of 鼶 is ⿰鼠秃 which makes little sense as it is not close phonetically or graphically. Based on the new evidence, the original form of 鼶 is written as ⿺鼠虎, and ⿰鼠秃 is a mistake for ⿺鼠虎.
I suggest to change IDS and glyph to ⿰鼠虎 to match the new evidence.
The evidence quotes a poem by the Song dynasty poet Su Zhe 蘇轍. There are two versions of the poem, one with U+9E96 麖 jīng and one with U+9E8F 麏 jūn.
蘇轍《𡗝中詩》: 江流日益深,民語漸以變。遙想彼中人,狀類麖/麏鹿竄。
In the evidence ⿸鹿吉 should be an error for 麏. It cannot be considered a variant as 吉 is entering tone, whereas 麖 and 麏 are both level tone, so it would not fit the tonal pattern of the poem.
Therefore suggest to postpone for additional evidence.
The text given in the evidence seems to be a mistake. Kangxi Dictionary p. 1276 states: "説文本作⿰麃邑"
And Shuowen Jiezi (Zhonghua Shuju 1963, p. 132) does indeed have an entry for U+287BB 𨞻, which is of course written as ⿰麃邑 in seal script.
So ⿰鹿邑 is a mistake for ⿰麃邑 which is the archaic form for U+287BB 𨞻. As all characters with rhs 阝 can be said to be written with 邑 in the Shuowen dictionary, I do not think that it is a good idea to separately encode any more variant forms of characters with 邑 for 阝. Therefore I suggest a new UCV for 阝~邑, and withdraw GKJ-00693.
Evidence 1 quotes a poem by the Song dynasty poet Su Zhe 蘇轍 which has U+9E96 麖 jīng or U+9E8F 麏 jūn in this position (see GKJ-00679), so ⿸鹿吾 is probably a mistake for U+9E8F 麏. The other three evidences are OK.
None of the evidences show the glyph form very clearly, so it is not certain that the component is 免. I think that they are all intended to be forms of U+9E91 麑. Therefore suggest to postpone for better evidence.
In response to Tao Yang, I guess it is a question of whether we should encode known error forms noted in critical editions of texts, or whether cited error forms such as this should be represented as PUA characters. (I personally am happy to encode error forms if they are cited in printed editions.)
The evidence is not sufficient as the source is not very reliable, and has many glyph errors. The evidence quotes a long-lost book (食經 by 崔禹錫), so it is unclear what the actual source for the character is. As there is no G-source for U+2CD5C 𬵜, it is not obvious that the original source referred to U+2CD5C 𬵜. I therefore suggest to postpone pending additional evidence that GKJ-00768 is the simplified form of U+2CD5C 𬵜.
This is obviously a corrupt form of some common character as the text is using it as a size comparison that readers are expected to understand. I strongly suspect it is just a corrupt form of U+8C4C 豌 (variant form ⿰豆⿱宀外 with 宀 lost). Therefore suggest withdraw or pending.
Agree with Comment #13582 that ⿰骨彡 is probably a corrupt form of 骸. 《三農紀》 is not a very reliable source, with many glyph errors, so I suggest to postpone pending additional evidence.
It looks like this character should be a corruption of some common character, but it is not obvious to me what it should be. Therefore I suggest to postpone for additional evidence.
I very much think that the character with 虫 bug (!) must be a mistaken form of U+8D12 贒, which is most likely to be the character the living person actually uses. Can you please show the original handwritten evidence provided by the living person who uses this character so that we can see if they really do want to be a 'bug' 虫 or 'loyal' 忠.
Looks like the same character (unifiable per UCV 401) used in the description of Brazil on this 1602 world map (from https://twitter.com/egasmb/status/1650216295198072835):
The Han script sometimes incorporates characters and symbols from other writing systems, for example 𣥬𤔞𧳤𠐂 are corrupt forms of the Tangut characters 𘜶𗵐𘏨𘔭 recorded in a Song dynasty numismatic work; U+303C3 𰏃 is a corrupt form of a Khitan character; U+2CF01 𬼁 is derived from the dram sign ʒ (U+0292); U+2CF04 𬼄 is derived from the ounce sign ℥ (U+2125); and WS2021 UTC-03225 is derived from the pound sign ℔ (U+2114). In the sources for UK-20570 etc. corrupt forms of certain syllables of the Siddham script which are used in mantras have been treated as Han characters, so it is appropriate to encode them as Han ideographs. Note that the Siddham script is already encoded since Unicode 7.0, so encoding these particular Han-ified Siddham syllables does not affect the use of Unicode Siddham.
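The non-Han code points cited above can be checked against the Unicode character database. This is a small sketch using Python's standard unicodedata module; the labels in the dictionary are mine and simply repeat the wording used in the comment.

```python
import unicodedata

# Code points cited in the comment; labels repeat the comment's own wording.
CITED = {
    "dram sign (as cited)": 0x0292,
    "ounce sign": 0x2125,
    "pound sign": 0x2114,
}

# Look up the formal Unicode character name for each cited code point.
names = {label: unicodedata.name(chr(cp)) for label, cp in CITED.items()}

for label, cp in CITED.items():
    print(f"U+{cp:04X} {chr(cp)} {names[label]}  <- {label}")
```

Note that the formal name of U+0292 is LATIN SMALL LETTER EZH, and that of U+2114 is L B BAR SYMBOL, so the "dram sign" and "pound sign" labels describe usage rather than the character names.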
New evidence
Via Sven Osterkamp (@schrift_sprache) on twitter, these are the nine characters in 《字孳補》 (the source used for the glyph forms in the UK submission), which together form the Mantra of Ratnasikhine Tathagata (寶髻如來護生咒), i.e. 唵縛嚩悉波羅摩尼莎訶 which corresponds to "oṁ va svara maṇi svāhā" in Sanskrit. The 7th character corresponds to Sanskrit ṇi, which matches the evidence given by Huang Junliang in #13137, thus I think both glyph forms represent the same Sanskrit transcription character, so it should be OK to change the glyph form for UK-20570 based on the new evidence.
New evidence
Via Edward W. (@edwardW2) on twitter, this is an example of the Mantra of Ratnasikhine Tathagata (寶髻如來護生咒) from 《通天曉》(1841 ed.), here noted as corresponding to "唵縛嚩悉波羅摩尼莎訶" (i.e. "oṁ va svara maṇi svāhā"). The glyph form for the 7th character matches the form shown in the evidence from Huang Junliang (also the glyph forms for the 1st, 3rd, and 9th characters match the correct forms suggested by evidence from Huang Junliang).
The Han script sometimes incorporates characters and symbols from other writing systems, for example 𣥬𤔞𧳤𠐂 are corrupt forms of the Tangut characters 𘜶𗵐𘏨𘔭 recorded in a Song dynasty numismatic work; U+303C3 𰏃 is a corrupt form of a Khitan character; U+2CF01 𬼁 is derived from the dram sign ʒ (U+0292); U+2CF04 𬼄 is derived from the ounce sign ℥ (U+2125); and WS2021 UTC-03225 is derived from the pound sign ℔ (U+2114). In the sources for UK-20572 etc. corrupt forms of certain syllables of the Siddham script which are used in mantras have been treated as Han characters, so it is appropriate to encode them as Han ideographs. Note that the Siddham script is already encoded since Unicode 7.0, so encoding these particular Han-ified Siddham syllables does not affect the use of Unicode Siddham.
"囉𪡈{⿰口聃}" is the transcription of the name of the Canton-based British translator Robert Thom (1807–1846). The first character is here written as ⿰口𣆀 with a missing stroke, but ⿰口聃 should be the correct form of the character.
Re #10956, the glyph shown in the evidence appears to be an imperfect form of ⿴囗峦, with a damaged final stroke. However, the final horizontal stroke is still visible as a thin line below 山, so ⿴囗峦 should be correct, and no change to the glyph is required.
Font glyph component 乃 is strange, and does not match the evidence. Please confirm whether this is the preferred Vietnam form of 乃, and if not then correct the font glyph.
I would be interested in knowing how common U+25997 𥦗 is. The source is GHZ-42731.08, but 《汉语大字典》(第二版) p. 2922 does not give any sources, and merely states 同“窓(窗)”.
As L F Cheng says (#14310), maybe the dictionary form is a normalization of ⿱宀总. If 𥦗 is not a common form, then China can consider changing the glyph for U+25997 to ⿱宀总.
⿱不貴 is not encoded. My strong opinion is that every Chinese simplified character that we encode should have a corresponding traditional form, and if it does not then we should simply add the corresponding traditional form to the same working set.
Thank you Eiso for the helpful discussion. It makes sense that ⿰爵鳥 is a variant of 鵲, but it is odd that there are no other attestations of this character that I can find.
Strongly support not encoding simple ligatures of two characters as cjkui (i.e. ligatures where two characters are simply squeezed together into a single character space for aesthetic or practical purposes) where there is no semantic difference between the ligatured and unligatured forms.
Yes, it is odd. I cannot find any evidence for an old or variant form of 顯 which looks like this. My suspicion is that it is simply a corrupt form of 費.
I suggest not to encode UTC-03224 as a cjkui as it is a simple ligature of the two characters 敕 and 令 with no difference in meaning or reading. I think it is best to represent simple ligatures such as this (where the two ligatured characters maintain their original glyph forms) as ZWJ-ligatures (or using some other mechanism as discussed in L2/23-073).
Note that there are several different encoded forms of 敕 (勅勑𠡠敕𱡘𢽟) which could all potentially form ligatures with 令.
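As a rough sketch of what the ZWJ-ligature approach would imply, the following lists the candidate sequences formed by each encoded form of 敕 followed by ZERO WIDTH JOINER and 令. This assumes the ZWJ mechanism purely for illustration; it is one of the options under discussion, not an agreed or standardized representation, and the function name is hypothetical.

```python
ZWJ = "\u200D"          # ZERO WIDTH JOINER
CHI_FORMS = "勅勑𠡠敕𱡘𢽟"  # encoded forms of 敕 listed in the comment
LING = "令"

def zwj_candidates(first_forms: str, second: str) -> list[str]:
    """Build one <first, ZWJ, second> sequence per form of the first character."""
    return [form + ZWJ + second for form in first_forms]

candidates = zwj_candidates(CHI_FORMS, LING)
for seq in candidates:
    # Show each candidate as a list of code points.
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

The point of the sketch is that a sequence-based mechanism covers every form of 敕 without any new code points, whereas encoding UTC-03224 as an ideograph would invite requests for each of the other combinations.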
Eiso (#14608) is absolutely correct that 蓋 and 盖 are a T/S pair of characters, but are not a T/S pair for components. 蓋/盖 are defined in Table 1 of the General Tables of Simplified Characters (简化字总表), for T/S pairs which are explicitly not applicable to components "不作简化偏旁用的简化字".
However, the kSimplifiedVariant and kTraditionalVariant fields defined in Unihan_Variants.txt do not correspond to the officially-defined set of simplified/traditional characters, but include all sorts of non-standard and de facto simplifications, as well as treating old style and new style orthographic forms (新旧字形) as if they were traditional/simplified pairs (e.g. 呂/吕 are defined as a T/S pair in Unihan, even though they are not a 繁體字/简体字 pair but a 舊字形/新字形 pair).
It is no surprise that Unihan treats 蓋 and 盖 as T/S components for the character pairs 㯼/𣙥, 䡷/𰺡, 壒/𭏦, 礚/𥕤, 鑉/𫠁:
Therefore, based on the current loose definitions of kSimplifiedVariant and kTraditionalVariant in Unihan, it is reasonable to add GZ-1742502 as a simplified variant of U+2103D 𡀽 (also U+6FED 濭 and U+31A1A 𱨚).
However, I think that the current Unihan definitions of simplified and traditional variants are very unsatisfactory, and need to be thoroughly revised. My opinion would be to remove all old/new orthographic forms (such as 呂/吕), and define new keys for them (e.g. kOldForm and kNewForm). I would also remove kSimplifiedVariant from all characters which are not explicitly or implicitly defined as simplified characters in the latest official table of simplifications, and replace them with some other key.
IRG Working Set 2021v5.0
Source: Andrew WEST
Date: Generated on 2026-01-19
Unification
Maybe unifiable with 亝 (U+4E9D)?
There are three evidences showing ⿱臨玉 but only one showing 𤪋, therefore ⿱臨玉 seems likely to be the correct form in this case. We can accept either ad hoc unification of UK-20941 with 𤪋 (U+24A8B), or a new UCV if there are other examples of 臨~𰯲 glyph variation.
Attributes
Therefore use Radical 82.0 (毛) as the primary radical for this character, and add 162.0 (辵) as a second radical if considered necessary.
Evidence
⿰身犬 is possibly a variant/mistake for ⿰身大 which also has the reading māng according to Kushim Jiang. Additional evidence would be helpful.
The two evidences provided are both general texts which just mention Baiji dolphin in passing, so they cannot be considered to be authoritative sources. Please try to find additional evidence for either ⿰鱼𬶨 or ⿰鱼暨 from a zoological source that specifically discusses the Baiji dolphin. If there is no additional evidence then the character should be withdrawn. If additional evidence shows ⿰鱼暨 then suggest to change IDS and glyph to ⿰鱼暨.
The word ⿰畫/画鳥鶘 seems to be a variant of 鵡鶘.
Are the two pieces sufficient to consider this a stable error?
一切經音義 (T2128) 卷34:
Therefore I suggest to postpone pending additional evidence, or withdraw.
where the second character (鼶) is written as ⿺鼠虎. Evidence 2 notes that the original form of 鼶 is ⿰鼠秃 which makes little sense as it is not close phonetically or graphically. Based on the new evidence, the original form of 鼶 is written as ⿺鼠虎, and ⿰鼠秃 is a mistake for ⿺鼠虎.
I suggest to change IDS and glyph to ⿰鼠虎 to match the new evidence.
Evidence 1 and 4 should be mistakes for U+2A2A8 𪊨 as 《説文解字》 gives 麂 as a variant of 𪊨.
Evidence 2 is a mistake for U+2A2A8 𪊨 (see same text in 兩漢博聞).
Evidence 3 is a mistake for U+9E90 麐 as 《宋史》卷218 gives the name "希麐".
Therefore, ⿸鹿日 is a mistake for U+232F4 𣋴.
蘇轍《𡗝中詩》: 江流日益深,民語漸以變。遙想彼中人,狀類麖/麏鹿竄。
In the evidence ⿸鹿吉 should be an error for 麏. It cannot be considered a variant as 吉 is entering tone, whereas 麖 and 麏 are both level tone, so it would not fit the tonal pattern of the poem.
Therefore suggest to postpone for additional evidence.
Based on this evidence, ⿸鹿心 should be a mistake for U+9E83 麃.
And Shuowen Jiezi (Zhonghua Shuju 1963, p. 132) does indeed have an entry for U+287BB 𨞻, which is of course written as ⿰麃邑 in seal script.
So ⿰鹿邑 is a mistake for ⿰麃邑 which is the archaic form for U+287BB 𨞻. As all characters with rhs 阝 can be said to be written with 邑 in the Shuowen dictionary, I do not think that it is a good idea to separately encode any more variant forms of characters with 邑 for 阝. Therefore I suggest a new UCV for 阝~邑, and withdraw GKJ-00693.
Therefore I suggest to withdraw GKJ-00999.
Also provides additional evidence for UK-20785, UK-20786, and UK-20787.
A new UCV for 闊~濶 would be helpful.
《江西通志》(清雍正刊本)卷 55:
《江南通志》(清乾隆元年刊本)卷177:
I suggest China consider a horizontal extension for ⿱𦥯玉.
I think "{⿰口丹}𠺮哪" is the name of a fake British ship in a fake report.
"囉𪡈{⿰口聃}" is the transcription of the name of the Canton-based British translator Robert Thom (1807–1846). The first character is here written as ⿰口𣆀 with a missing stroke, but ⿰口聃 should be the correct form of the character.
Glyph Design & Normalization
Other
As L F Cheng says (#14310), maybe the dictionary form is a normalization of ⿱宀总. If 𥦗 is not a common form, then China can consider changing the glyph for U+25997 to ⿱宀总.
Note that there are several different encoded forms of 敕 (勅勑𠡠敕𱡘𢽟) which could all potentially form ligatures with 令.
Data for Unihan
However, the kSimplifiedVariant and kTraditionalVariant fields defined in Unihan_Variants.txt do not correspond to the officially-defined set of simplified/traditional characters, but include all sorts of non-standard and de facto simplifications, as well as treating old style and new style orthographic forms (新旧字形) as if they were traditional/simplified pairs (e.g. 呂/吕 are defined as a T/S pair in Unihan, even though they are not a 繁體字/简体字 pair but a 舊字形/新字形 pair).
It is no surprise that Unihan treats 蓋 and 盖 as T/S components for the character pairs 㯼/𣙥, 䡷/𰺡, 壒/𭏦, 礚/𥕤, 鑉/𫠁:
U+3BFC kSimplifiedVariant U+23665
U+23665 kTraditionalVariant U+3BFC
U+4877 kSimplifiedVariant U+30EA1
U+30EA1 kTraditionalVariant U+4877
U+58D2 kSimplifiedVariant U+2D3E6
U+2D3E6 kTraditionalVariant U+58D2
U+791A kSimplifiedVariant U+25564
U+25564 kTraditionalVariant U+791A
U+9449 kSimplifiedVariant U+2B801
U+2B801 kTraditionalVariant U+9449
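The entries above can be checked mechanically. The following is a minimal sketch that parses lines in this form and reports any kSimplifiedVariant or kTraditionalVariant entry lacking its reverse mapping. The function names are hypothetical, and the parser only assumes whitespace-separated fields (the actual Unihan_Variants.txt file is tab-separated and may carry several values per line).

```python
# Entries copied from the comment above. The real Unihan_Variants.txt is
# tab-separated; str.split() accepts both tabs and spaces.
DATA = """\
U+3BFC kSimplifiedVariant U+23665
U+23665 kTraditionalVariant U+3BFC
U+4877 kSimplifiedVariant U+30EA1
U+30EA1 kTraditionalVariant U+4877
U+58D2 kSimplifiedVariant U+2D3E6
U+2D3E6 kTraditionalVariant U+58D2
U+791A kSimplifiedVariant U+25564
U+25564 kTraditionalVariant U+791A
U+9449 kSimplifiedVariant U+2B801
U+2B801 kTraditionalVariant U+9449
"""

def parse(text):
    """Collect (source, target) pairs for the two keys of interest."""
    pairs = {"kSimplifiedVariant": set(), "kTraditionalVariant": set()}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        source, key, *targets = line.split()
        if key in pairs:
            pairs[key].update((source, target) for target in targets)
    return pairs

def missing_reverse(pairs):
    """Return entries whose reverse mapping under the other key is absent."""
    simp, trad = pairs["kSimplifiedVariant"], pairs["kTraditionalVariant"]
    missing = [(s, "kSimplifiedVariant", t) for s, t in simp if (t, s) not in trad]
    missing += [(s, "kTraditionalVariant", t) for s, t in trad if (t, s) not in simp]
    return sorted(missing)

def to_char(code_point):
    """Convert a 'U+XXXX' string to the character it names."""
    return chr(int(code_point[2:], 16))

pairs = parse(DATA)
for trad_cp, simp_cp in sorted(pairs["kSimplifiedVariant"]):
    print(f"{to_char(trad_cp)}/{to_char(simp_cp)}  {trad_cp} -> {simp_cp}")
print("missing reverse entries:", missing_reverse(pairs))
```

For the ten lines quoted here every mapping has its reverse, so the last line reports an empty list. The same check could be run over any new pair proposed for addition.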
Therefore, based on the current loose definitions of kSimplifiedVariant and kTraditionalVariant in Unihan, it is reasonable to add GZ-1742502 as a simplified variant of U+2103D 𡀽 (also U+6FED 濭 and U+31A1A 𱨚).
However, I think that the current Unihan definitions of simplified and traditional variants are very unsatisfactory, and need to be thoroughly revised. My opinion would be to remove all old/new orthographic forms (such as 呂/吕), and define new keys for them (e.g. kOldForm and kNewForm). I would also remove kSimplifiedVariant from all characters which are not explicitly or implicitly defined as simplified characters in the latest official table of simplifications, and replace them with some other key.
Submitter Request
Consider adding a level 2 UCV for 田 and 由.