I suspect that the proposed character (⿰虫㥯) should be U+2E524 𮔤 which is written as ⿰䖝𢚩 in the entry for 石𧉧 in the 閩書 evidence provided by Huang Junliang. Therefore unify to 𮔤 (U+2E524) unless further evidence is forthcoming.
The proposed form occurs only in the head character of the entry and seems to be an error for 䏣, as the entry text itself uses 䏣 (肉中蟲謂之䏣蝇). Therefore this evidence is insufficient for encoding. Suggest unifying to 䏣.
This character appears to be an error for 舓 (this error form is also found in 蒙古字韻). As it is a common mistake to miswrite 易 as 昜 and vice versa, perhaps we should have a UCV rule for 易 and 昜 when cognate.
Given that any character with a grass radical could be written with 艸 instead, I think it would be a good idea to define a UCV for 艸 = 艹 where cognate.
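To make the proposal concrete, here is a minimal Python sketch of how a UCV rule such as 艸 = 艹 (or the 易/昜 rule suggested above) could be applied mechanically when comparing two Ideographic Description Sequences. The rule table, the function names and the sample IDS are hypothetical. The sketch only works on IDS strings that are already decomposed into components, and it cannot decide the "when cognate" condition, which remains an editorial judgement.

```python
# Hypothetical UCV table for illustration: each set lists components that
# would be treated as unifiable variants. Only the pairs suggested in the
# comments above (艸 = 艹, 易 ~ 昜) are included.
UCV_CLASSES = [
    {"艸", "艹"},
    {"易", "昜"},
]

# Map every member of a class to a single representative. The smallest
# code point is chosen purely so that the mapping is deterministic.
CANONICAL = {}
for cls in UCV_CLASSES:
    rep = min(cls)
    for member in cls:
        CANONICAL[member] = rep


def normalize_ids(ids: str) -> str:
    """Replace each component of an IDS by its UCV representative.

    Description characters such as ⿰ and ⿱, and components with no UCV
    entry, are passed through unchanged.
    """
    return "".join(CANONICAL.get(ch, ch) for ch in ids)


def unifiable_by_ucv(ids_a: str, ids_b: str) -> bool:
    """True if the two IDS become identical after UCV substitution.

    This is a purely graphical test; it does not check whether the two
    characters are cognate, which still needs a human decision.
    """
    return normalize_ids(ids_a) == normalize_ids(ids_b)


if __name__ == "__main__":
    print(unifiable_by_ucv("⿱艸化", "⿱艹化"))  # True
    print(unifiable_by_ucv("⿰舌昜", "⿰舌易"))  # True
    print(unifiable_by_ucv("⿱艹化", "⿱艹匕"))  # False
```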
U+2C7CF 𬟏 is the modern transcription form of an oracle bone script character that means "autumn", and the original oracle bone script character it represents is thought to be a drawing of a grasshopper (which is active in autumn). The grass radical shown in the G-source glyph for U+2C7CF should correctly be 卝 as it actually represents the grasshopper's antennae.
On the other hand, UK-20437 is used in the evidence provided as an alternate way of writing the character 鱉 'softshell turtle' in the specific word 紫鱉, an ancient name for a type of plant. In this case it is obvious that the character is composed of a grass radical above a turtle.
Thus UK-20437 and U+2C7CF are non-cognate, and have significantly different glyph forms, and so cannot be unified.
Agree to unify UK-20682 / SN03761 to this character. The glyph shown for UK-20679 appears to be the correct form (cf. new evidence for UK-20704 and UK-20766).
In other texts, ⿰口𫆀 is written as 㖿 in the same sequences of syllables (example shown below). Therefore, unify to 㖿 (U+35BF) and create a new UCV for 𫆀 = 耶.
Here is the SAT evidence (SAT No. 1822 倶舍論疏 T1822_.41.0477c03 Note ⓴)
Based on this, the two characters appear to be non-cognate, therefore should not be unified according to the Non-cognate Rule ("Ideographs with different glyph shapes that are unrelated in historical derivation (non-cognate characters) are not unified no matter how similar their glyph shapes may be").
UK-20925 has reading hù according to 《中华千家姓氏录》, so is not cognate with 𪪘 (U+2AA98) (reading yì). Therefore we oppose unification of UK-20925 with U+2AA98.
Evidence is unclear. I cannot make out whether the right side is 𭖔 or not, and as 𭖔 is not a component used in any other character it is not obvious that it should be 𭖔 here. Please provide additional evidence showing clearer images of the character.
I really don't think this is sufficient evidence for encoding. I would like to see evidence of actual use in printed text, demonstrating that there really is a need to encode this character. If IRG considers that local-use vulgar simplifications can be encoded on the basis of unverified second-hand evidence such as this, then there are hundreds and hundreds of Part 2 second stage simplified characters for which there is a stronger case for encoding.
The new evidence shows an error form. The quoted poem actually has "白鳥翯翯". My personal rule is not to accept for encoding error forms only attested in a single edition (cf. my comment to UK-20004). I would only accept this character for encoding if additional evidence could be supplied that demonstrates that it is not an error form or that it is an error form that has been transmitted widely enough to be considered a "stable error".
Seems to be an error form for 蜩 (cf. https://baike.baidu.hk/item/%E8%9C%A9%E8%9F%A7/4697910). Do not encode without additional evidence that the character is correct or is a stable error.
I would prefer not to encode an obvious graphic variant of U+8776 on the basis of this single piece of evidence. Are there other sources which show this glyph form?
The evidence appears to show that these three characters are vulgar forms of some other character, but the extract does not show what it is. Can the full page for the evidence be provided so we can better understand the meaning of these characters?
Also, why are ⿰至及 and ⿰至支 not also proposed for encoding? It seems pointless to encode ⿰至戾 but not the two other characters in the same extract.
A quick search on the internet confirms that "黄芪" is also known as "芰草", therefore it is extremely likely that the proposed character is an error form for 芰. In this light, the evidence shown is not sufficient for encoding, and the character should be withdrawn.
As far as I can tell, the original quote given in all other sources is 至於桑野,是謂晏食。至於衡陽,是謂隅中。What is the reason that 至 is here written as ⿰至竟? Is this a weird mistake, or is there some reason for writing it this way?
From context this appears to be an error form of 葉. 《三農紀》 seems to have many error forms, and so is not a reliable source. I would not like to see this character encoded on the basis of this evidence alone.
It seems that both 𤭌 and ⿰第瓦 are used in different sources, and as 弟 and 第 are not unifiable, it is acceptable to encode ⿰第瓦 based on the original evidence and the additional evidence provided by Huang Junliang.
The original TCA evidence is not sufficient for encoding. The evidence provided by Lee Collins is sufficient for encoding if proposed by Vietnam, but should not be used to justify encoding by TCA.
The original evidence provided by TCA is not sufficient for encoding, and the additional evidence provided by Eiso Chan is not sufficient to support the TCA encoding of this character.
The original TCA evidence is not sufficient for encoding, and the additional evidence provided by Eiso Chan is not sufficient to support the encoding of this character by TCA.
The original evidence provided by TCA is not sufficient for encoding.
The additional evidence provided by Tao Yang is questionable as the correct title for chapter 4 of this novel is "蟠龍嶺群英相會". Therefore, it seems that the first two characters in this edition are corrupted, and the additional evidence should be rejected.
Agree with Huang Junliang that this should be a corrupted form of U+97AE. Therefore WITHDRAW pending additional evidence that this is not an error form or that it is a stable error.
Agree with comment by Huang Junliang that this is an error form for U+21F8B 𡾋. Also note discussion of possible UCV for 宀~山 for UK-20577. Therefore we WITHDRAW this character.
Agree with Ng Hou Man that this is an error form for U+8A86 誆. See the quotation ("因而吏书等役亦各乘机诓索,诛求万状,在在有之") given at http://rdbk1.ynlib.cn:6251/Qw/Paper/513986
The comment by Huang Junliang is sufficient to throw doubt on the correctness of the proposed character. The form ⿱𰏒心 shown in the original evidence could easily be a mistake for U+2263F 𢘿, so it would be best not to encode this character based on the single evidence provided. Therefore we WITHDRAW this character pending additional evidence.
All other sources that list synonyms for cicada give ... 蛥蚗, 螇螰, 蟪蛄 ... suggesting that ⿰虫産 is an error for 螰, although the two characters are not very similar.
Søren Egerod (易家乐): "The Lungtu Dialect: A Descriptive and Historical Study of a South Chinese Idiom" (《隆都方言》), Copenhagen: S. L. Møllers Bogtrykkeri, 1956, p. 87
After U+3023E (UK-01666) was encoded, it was brought to our attention that the evidence cited in Fig. 349 above incorrectly transcribes the character, and that character is actually written as ⿰口𪁪 (and of course 𪁪 mǎng is a much better transcription for Sanskrit māṃ than 𩾵).
It may be disruptive to change the glyph for U+3023E at this stage, which is why we proposed ⿰口𪁪 (UK-20393) as a new character. If IRG thinks it is more appropriate, we can withdraw UK-20393 and change the glyph for U+3023E to ⿰口𪁪. In any case, the simplified form ⿲口尨鸟 (UK-20394) is a new character, and should be encoded based on the evidence provided.
The available image from the 四庫全書 edition (https://ctext.org/library.pl?if=en&file=5647&page=8) is even less clear. However, biographies for these two brothers are given in 南齊書 (https://zh.wikisource.org/wiki/%E5%8D%97%E9%BD%8A%E6%9B%B8/%E5%8D%B739), where their names are written as 劉瓛 and 劉璡. In the cited text the jade radical is changed to the stone radical, so it is clear that the character in question must be ⿰石獻.
The late Qing manuscript transcription of the Juyongguan inscriptions held at the University of Manchester John Rylands Library transcribes the character as ⿰口𫊗 with the expected dot.
New evidence
In "Chü-Yung-Kuan: The Buddhist Arch of the Fourteenth Century A.D. at the Pass of the Great Wall Northwest of Peking" (Kyōto: 1957; Murata Jirō ed.) the character is inaccurately transcribed as U+3615 㘕, just missing the 罒 element.
Jerry You has a useful overview of the text of the 《亳州老君碑》 on his blog at http://blog.ccamc.org/?p=365 showing images of three versions of the text inscribed on stone, but it is difficult to find authoritative printed evidence of the text.
New evidence
董沛文 主编,王燕喜 编校:《老君碑留古字解 玉皇心印妙经直解》 (北京:宗教文化出版社,2013年) [ISBN 978-7-80254-705-6] p. 18:
The quality of the image is not great, but I think it is obvious that the character's structure is ⿱山地, and that the horizontal line seen in the character shown in columns 1 and 3 is not part of the character, but a printing artefact. The character shown in column 5 does not have this printing artefact.
Looks to me like the Buddhist term 三昧 with mouth radicals added. The middle component is clearly 日 and could not be anything else. Of course, it is desirable to know the reading and meaning of every proposed character, but that is not always possible, and in the case of mystical syllables used in Buddhist and Daoist texts there may be no specific meaning. The primary purpose of encoding the Daoist-usage characters proposed by the UK is to enable the representation of these particular texts in electronic format, and it is not necessary to know the meaning or reading of these characters to do this. It should be noted that many already-encoded CJK unified ideographs, as well as some characters in other extinct scripts such as Egyptian Hieroglyphs, Tangut, and Khitan Small Script, have unknown meaning and pronunciation. The primary criterion for encoding characters is evidence of usage, and we believe that the evidence we have supplied for the Daoist-usage characters is very solid.
《正一天師申文發奏科儀》 has the word "𡄼⿰口㘞" (cf. version at https://www.daoisms.org/article/sort026/info-21366_2.html where the two characters are transcribed as "(口虩)(口㘞)"). This "⿰口㘞" should be the same character as UK-20684.
In Source 1 the first character is not written carefully, but we believe it is a handwritten form of ⿸尸盖 not ⿸尸⿱至皿. In Source 2 the first two characters are not clear, but the third character is very clearly ⿸尸盖. We believe that the two clear occurrences of ⿸尸盖 in Source 1 and the one clear occurrence of ⿸尸盖 in Source 2 are sufficient evidence for encoding ⿸尸盖.
The evidence appears to show an extract from a code chart, which is insufficient evidence for encoding. Please provide evidence of the character in actual use in printed text.
Given the context, the expected glyph form should be ⿰月氩. In cases like this, where use in simplified Chinese text is to be expected, I would prefer to see both the traditional and simplified forms encoded.
The 2.0 glyph has been changed to ⿱微皿. This is wrong, and should be changed back to ⿱微血 (note that the additional evidence provided by Persikov Sergeevich has a glyph error; from context in both evidences the character should have a blood radical).
ISO/IEC 10646 and the Unicode Standard are not glyph registers. The glyph should be normalized following TCA conventions regardless of the use of one individual person.
Personal name usage is no reason not to normalize the glyph to conform to TCA conventions. I still believe that the grass radical should be normalized.
The character is evidently a variant of 范(範) in the word 範銅 'cast bronze'. Therefore, of the three variants shown in the additional evidence produced by Huang Junliang, ⿱氾土 would seem to be most correct. We therefore suggest changing the glyph and IDS to ⿱氾土.
We agree with Huang Junliang and the editors of the 上海古籍出版社 1999 edition of 藝文類聚 that the right side should be ⿱宀夕. Therefore no glyph change is required.
There is no reason to replicate the exact glyph form given in the source evidence, so we normalize the 'mouse' radical to the standard form of the character.
It is difficult to be certain from the rubbing and available photographs whether there is a dot or not in the inscription. In any case, normalization of the right side to 𫊗 is acceptable and preferable because this is the expected form of the character as a variant of U+3615 㘕 (cf. the transcription given in the first additional evidence).
The glyph form shown in the additional evidence provided by Tao Yang seems preferable to the glyph form shown in the original evidence. Therefore, consider changing the glyph to match the new evidence.
Elsewhere in this edition of 《梵音斗科》 the character 𤚥 (U+246A5) is written as ⿰牟𫩧 (as shown below). If we accept that ⿰牟𫩧 is a unifiable variant of U+246A5 then it makes sense to normalize ⿲口牟𫩧 to ⿰口𤚥 with a new UCV for 含 = 𫩧.
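The normalization proposed here combines three steps: expanding 𤚥 into its components, applying the proposed UCV 含 = 𫩧, and treating a nested left-right layout ⿰A⿰BC as equivalent to ⿲ABC. The Python sketch below illustrates this under stated assumptions: the decomposition 𤚥 = ⿰牟含 is inferred from this comment rather than taken from an IRG data file, only the twelve original description characters are handled, and flattening nested layouts is a simplifying assumption of the sketch, not an IRG rule. All names in it are hypothetical.

```python
# Description characters and the number of operands each takes
# (only the twelve original IDCs, U+2FF0..U+2FFB, are handled).
ARITY = {op: 2 for op in "⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻"}
ARITY.update({"⿲": 3, "⿳": 3})

# ⿰/⿲ are both left-to-right layouts; ⿱/⿳ are both top-to-bottom layouts.
FAMILY = {"⿰": "⿰", "⿲": "⿰", "⿱": "⿱", "⿳": "⿱"}

# Hypothetical data for this example only.
DECOMPOSE = {"𤚥": "⿰牟含"}  # U+246A5, as implied by the comment above
UCV = {"𫩧": "含"}          # the proposed UCV 含 = 𫩧


def parse(ids, pos=0):
    """Parse a prefix-notation IDS starting at pos; return (tree, next_pos)."""
    ch = ids[pos]
    if ch in ARITY:
        children = []
        pos += 1
        for _ in range(ARITY[ch]):
            child, pos = parse(ids, pos)
            children.append(child)
        return (ch, children), pos
    return ch, pos + 1


def normalize(node):
    """Return a canonical, comparable form of a parsed IDS tree."""
    if isinstance(node, str):
        leaf = UCV.get(node, node)           # apply UCV substitution
        if leaf in DECOMPOSE:                # expand a known character
            return normalize(parse(DECOMPOSE[leaf])[0])
        return leaf
    op, children = node
    kids = [normalize(child) for child in children]
    family = FAMILY.get(op)
    if family is None:                       # e.g. ⿸: keep the layout as-is
        return (op, tuple(kids))
    parts = []
    for kid in kids:
        if isinstance(kid, tuple) and kid[0] == family:
            parts.extend(kid[1])             # splice same-direction nesting
        else:
            parts.append(kid)
    return (family, tuple(parts))


def canonical(ids):
    """Parse and normalize a complete IDS string."""
    tree, end = parse(ids)
    if end != len(ids):
        raise ValueError(f"unexpected trailing material in {ids!r}")
    return normalize(tree)


if __name__ == "__main__":
    print(canonical("⿲口牟𫩧") == canonical("⿰口𤚥"))  # True
    print(canonical("⿰口𤚥"))  # ('⿰', ('口', '牟', '含'))
```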
According to the new evidence (which shows this character in several places), the character should be ⿰言⿵冂?, and as the bottom part is identical to UK-20679, this would seem to be the correct glyph form. Therefore suggest changing the glyph to match the new evidence.
The shape of the 羊 radical does not match the evidence, but it appears that Vietnam convention is to use the straight 羊 radical (e.g. U+7F9D 羝, U+2636B 𦍫, U+263AC 𦎬, U+2B155 𫅕, and WS2017-03523 V-F1AE0; but note that U+7FB6 羶 has a bent ⺶ radical).
It seems that Vietnam convention is to use the form of the 羽 radical with two sloping strokes (e.g. WS2021-03177 VN-F1CA9 and WS2017-03545 VN-F1775). Consider modifying the glyph to match this convention.
The best solution would be to add an extra character to WS2021, and encode both simplified and traditional forms in one go. If that is not acceptable, then China should change the glyph to the simplified form, matching their original evidence.
Yes, indeed, it is that character with a mouth radical. As you say, it may well be a mantra-final syllable similar to Buddhist hūṃ. I hope that we can find additional evidence explaining the usage and reading of ⿻丅口.
The reading of puh for this character is unexpected. This hymn is available on several websites (e.g. http://hymn.pct.org.tw/Hymn.aspx?PID=P2011080400003) where the unencoded character is represented as "puh", so the reading "puh" should be correct. But why has a character with a grass radical and 吐 phonetic been created to represent the reading "puh"? Is this a mistake?
IRG Working Set 2021v2.0
Source: Andrew WEST
Date: Generated on 2026-01-19
Unification
Single example of an idiosyncratic way of writing 窓, not sufficient evidence for encoding. Unify with 𥦗 by UCV #22.
Unify to 𮬮 (U+2EB2E)
Unify to 𰲘 (U+30C98) 'female tiger' which is the correct transcription form of the oracle bone script character.
Unify with 糣 (U+7CE3), and add a new UCV for 朁~替.
Unify to 𦵻 (U+26D7B) and extend UCV #388 to cover 𣈆 = 晉 = 𦵻.
Unify with 㻸 (U+3EF8), and add a new UCV for 朁~替. See also SAT-06893 and TE-2F54.
Unify to 𦂯 (U+260AF)
Unify with 𦅦 (U+26166), and add a new UCV for 朁~替. See also SAT-06893 and TD-6D41.
Unify to 𫲛 (U+2BC9B)
Agree to unification with 𥊑 (U+25291) if new UCV is defined for 曼 = 𭦟 = 𭦗.
Unify to 㘞 (U+361E) as this is the form of the character shown several times in the source shown below:
《文帝全書》36:14B
Attributes
Evidence
Based on the new evidence, I still believe that unification with U+25997 𥦗 is appropriate.
The character here is also a simplification for U+571D 圝.
Therefore we WITHDRAW this character.
This evidence shows U+226E3, therefore agree to unification.
先天斗母奏告玄科
In this text the left side has been simplified to U+20ADA 𠫚, and so has the left side of UK-20686 below it.
Note that the typeset edition normalizes the right side from 𫩧 shown in the woodblock edition to 含.
In this text the left side has been simplified to U+20ADA 𠫚, and so has the left side of UK-20685 above it.
See also UTC-03221
This typeset text shows two variants of the character (6 × 有 and 9 × 有), as well as UTC-03214 (⿺辶⿳⿲日日日⿲日日日⿲日日日)
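Long nested sequences like the IDS quoted for UTC-03214 are easy to mistype. As a hedged illustration, the Python sketch below reads an IDS in prefix order, checks that every description character receives the right number of operands, and counts the leaf components; for the sequence above it should report one 辶 and nine 日. It assumes only the twelve original description characters (U+2FF0..U+2FFB), and the function name `check_ids` is hypothetical.

```python
from collections import Counter

# Number of operands taken by each description character.
ARITY = {op: 2 for op in "⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻"}
ARITY.update({"⿲": 3, "⿳": 3})


def check_ids(ids: str) -> Counter:
    """Validate a prefix-notation IDS and count its leaf components.

    Raises ValueError if the operator arities do not match the number of
    operands supplied. Returns a Counter of the leaf components.
    """
    pending = 1                 # operand slots still to be filled
    leaves = Counter()
    for i, ch in enumerate(ids):
        if pending == 0:
            raise ValueError(f"extra material at position {i}: {ids[i:]!r}")
        pending -= 1            # this character fills one slot
        if ch in ARITY:
            pending += ARITY[ch]  # ...and opens slots for its operands
        else:
            leaves[ch] += 1
    if pending != 0:
        raise ValueError(f"incomplete IDS: {pending} operand(s) missing")
    return leaves


counts = check_ids("⿺辶⿳⿲日日日⿲日日日⿲日日日")
print(counts)  # Counter({'日': 9, '辶': 1})
```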
Glyph Design & Normalization
Other
Data for Unihan