php utf-8 "ord" type of function - enjoy


jmichae3
12-12-2012, 07:01 AM
You were probably wondering why there is no ord() function that works with UTF-8, Unicode, and other string encodings. PHP doesn't have one, and I don't have a complete solution yet, only this workaround for UTF-8.

One thing I discovered about this code:
- the page must be encoded as UTF-8 without BOM (for example in Notepad++: Encoding, Convert to UTF-8 without BOM).
- you must also include a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
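To make the two requirements above a bit more concrete, here is a minimal sketch. The helper name strip_utf8_bom is made up for illustration (it is not a PHP built-in), and the header() call assumes you control the response headers and nothing has been output yet. A BOM sitting in front of the opening PHP tag is sent to the browser as ordinary output, which is one reason to save the file without it.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom($text) {
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        // Cast guards against older PHP versions returning false here.
        return (string) substr($text, 3);
    }
    return $text;
}

// Declare the encoding in the HTTP header as well as in the meta tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Example: text read from a file that was saved with a BOM.
$clean = strip_utf8_bom("\xEF\xBB\xBFhello");
```

After this runs, $clean holds the text without the three BOM bytes, so byte offset 0 is the first real character.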
Here is a better-coded, UTF-8-only version that actually works (and by the way, it works with ASCII too):

//returns ordinal value (code point) of character in string $str at $index
//and increments $index past current utf-8 character.
//based on the table at http://doc.cat-v.org/bell_labs/utf-8_history
//note: that table defines forms up to 6 bytes; current UTF-8 (RFC 3629)
//stops at 4 bytes, so the 5- and 6-byte branches only matter for legacy data.
//the caller must make sure $index < strlen($str).
function utf8_ord_next_char($str, &$index) {
    $len = strlen($str);
    if ($index+1 <= $len
        && 0x00 == (0x80 & ord($str[$index + 0]))  //0xxxxxxx
    ) {
        $result =
            (ord($str[$index+0])&0x7f) ;
        $index += 1;
        return $result;
    } else if ($index+2 <= $len
        && 0xc0 == (0xe0 & ord($str[$index + 0]))  //110xxxxx
        && 0x80 == (0xc0 & ord($str[$index + 1]))  //10xxxxxx
    ) {
        $result =
            ((ord($str[$index+0])&0x1f) <<  6) |
             (ord($str[$index+1])&0x3f) ;
        $index += 2;
        return $result;
    } else if ($index+3 <= $len
        && 0xe0 == (0xf0 & ord($str[$index + 0]))  //1110xxxx
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
    ) {
        $result =
            ((ord($str[$index+0])&0x0f) << 12) |
            ((ord($str[$index+1])&0x3f) <<  6) |
             (ord($str[$index+2])&0x3f) ;
        $index += 3;
        return $result;
    } else if ($index+4 <= $len
        && 0xf0 == (0xf8 & ord($str[$index + 0]))  //11110xxx
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
    ) {
        $result =
            ((ord($str[$index+0])&0x07) << 18) |
            ((ord($str[$index+1])&0x3f) << 12) |
            ((ord($str[$index+2])&0x3f) <<  6) |
             (ord($str[$index+3])&0x3f) ;
        $index += 4;
        return $result;
    } else if ($index+5 <= $len
        && 0xf8 == (0xfc & ord($str[$index + 0]))  //111110xx
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
        && 0x80 == (0xc0 & ord($str[$index + 4]))
    ) {
        $result =
            ((ord($str[$index+0])&0x03) << 24) |
            ((ord($str[$index+1])&0x3f) << 18) |
            ((ord($str[$index+2])&0x3f) << 12) |
            ((ord($str[$index+3])&0x3f) <<  6) |
             (ord($str[$index+4])&0x3f) ;
        $index += 5;
        return $result;
    } else if ($index+6 <= $len
        && 0xfc == (0xfe & ord($str[$index + 0]))  //1111110x
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
        && 0x80 == (0xc0 & ord($str[$index + 4]))
        && 0x80 == (0xc0 & ord($str[$index + 5]))
    ) {
        $result =
            ((ord($str[$index+0])&0x01) << 30) |
            ((ord($str[$index+1])&0x3f) << 24) |
            ((ord($str[$index+2])&0x3f) << 18) |
            ((ord($str[$index+3])&0x3f) << 12) |
            ((ord($str[$index+4])&0x3f) <<  6) |
             (ord($str[$index+5])&0x3f) ;
        $index += 6;
        return $result;
    }
    //invalid or truncated sequence: return the raw byte and skip it
    $result = ord($str[$index+0]);
    $index++;
    return $result;
}
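For anyone who wants to try the idea quickly, here is a compact sketch of the same table-driven decoding, limited to the 1 to 4 byte forms that RFC 3629 allows. The name utf8_next_codepoint is made up for this example and is not part of the bug report. It masks off the marker bits of the lead byte, then shifts in 6 payload bits from each continuation byte.

```php
<?php
// Hypothetical helper (not from the original post): decodes the UTF-8
// character starting at byte offset $i and advances $i past it.
// Assumes $i < strlen($s).
function utf8_next_codepoint($s, &$i) {
    $len = strlen($s);
    $b0 = ord($s[$i]);
    // Lead-byte patterns: mask, expected value, total bytes in the sequence.
    $forms = array(
        array(0x80, 0x00, 1),  // 0xxxxxxx
        array(0xe0, 0xc0, 2),  // 110xxxxx
        array(0xf0, 0xe0, 3),  // 1110xxxx
        array(0xf8, 0xf0, 4),  // 11110xxx
    );
    foreach ($forms as $form) {
        list($mask, $want, $n) = $form;
        if (($b0 & $mask) != $want || $i + $n > $len) {
            continue;
        }
        // Payload bits of the lead byte are the ones outside the mask.
        $cp = $b0 & (~$mask & 0xff);
        for ($k = 1; $k < $n; $k++) {
            $b = ord($s[$i + $k]);
            if (($b & 0xc0) != 0x80) {
                continue 2;  // not a continuation byte, so give up on this form
            }
            $cp = ($cp << 6) | ($b & 0x3f);
        }
        $i += $n;
        return $cp;
    }
    // Invalid or truncated sequence: return the raw byte and skip it.
    $i++;
    return $b0;
}

// Walk a string one character at a time.
$s = "a\xC3\xA9\xE2\x82\xAC";  // "a", U+00E9, U+20AC written as raw UTF-8 bytes
$i = 0;
$codepoints = array();
while ($i < strlen($s)) {
    $codepoints[] = utf8_next_codepoint($s, $i);
}
// $codepoints is now array(0x61, 0xE9, 0x20AC)
```

Worked by hand for the middle character: the bytes C3 A9 give (0x03 << 6) | 0x29 = 0xE9. Note that newer PHP versions (7.2 and later, with the mbstring extension) ship mb_ord(), which covers the single-character case natively.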



This is code I submitted in a bug report to php.net. I am hoping it gets into the PHP manual at least, but even more I am hoping for a cross-encoding solution for ord(). If enough people vote for this bug, something may be done about it.

https://bugs.php.net/bug.php?id=63732

jmichae3
12-15-2012, 09:54 AM
May I ask why I was downed? Was it the bug report?