Old 12-12-2012, 07:01 AM   #1
jmichae3
 
Join Date: Dec 2010
Posts: 366
Thumbs down php utf-8 "ord" type of function - enjoy

You were probably wondering why there is no ord() function that works with UTF-8, Unicode, and other string encodings. PHP doesn't have one. I don't have a complete solution yet, only this workaround for UTF-8.

Two things I discovered about this code:
- the page must be encoded as UTF-8 without BOM (for example in Notepad++: Encoding, Convert to UTF-8 without BOM).
- you must also include a meta tag:
Code:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
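To check the "without BOM" part from PHP rather than from the editor, something like the following sketch could be used. The function name has_utf8_bom is invented for this example and is not part of PHP; the three bytes EF BB BF are the UTF-8 encoding of the byte order mark.

```php
<?php
// Hypothetical helper (name made up for illustration):
// true if $data starts with the UTF-8 byte order mark EF BB BF.
function has_utf8_bom($data) {
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

var_dump(has_utf8_bom("\xEF\xBB\xBFhello")); // string with a leading BOM
var_dump(has_utf8_bom("hello"));             // string without one
```

Passing it the output of file_get_contents() for the page in question would tell you whether the editor left a BOM behind.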
Here is a better-coded, UTF-8-only version that actually works (and by the way, it handles ASCII too):
Code:
//returns ordinal value (code point) of the character in string $str at byte offset $index
//and increments $index past the current utf-8 character.
//based on the table at http://doc.cat-v.org/bell_labs/utf-8_history
//(that table goes up to 6 bytes; RFC 3629 later limited utf-8 to 4 bytes.)
//the caller must make sure $index < strlen($str).
//note: in PHP == binds tighter than &, so each mask test needs its own parentheses.
function utf8_ord_next_char($str, &$index) { 
    if ($index+1 <= strlen($str) 
        && 0x00 == (0x80 & ord($str[$index + 0]))
        ) {
        //1 byte: 0xxxxxxx (plain ascii)
        $result =  
            (ord($str[$index+0])&0x7f) ;
        $index += 1;
        return $result;
    } else if ($index+2 <= strlen($str) 
        && 0xc0 == (0xe0 & ord($str[$index + 0])) 
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        ) {
        //2 bytes: 110xxxxx 10xxxxxx
        $result =  
            ((ord($str[$index+0])&0x1f) << 6) + 
             (ord($str[$index+1])&0x3f) ;
        $index += 2;
        return $result;
    } else if ($index+3 <= strlen($str) 
        && 0xe0 == (0xf0 & ord($str[$index + 0]))
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        ) {
        //3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $result =  
            ((ord($str[$index+0])&0x0f) << 12) + 
            ((ord($str[$index+1])&0x3f) << 6) +
             (ord($str[$index+2])&0x3f) ;
        $index += 3;
        return $result;
    } else if ($index+4 <= strlen($str) 
        && 0xf0 == (0xf8 & ord($str[$index + 0]))
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
        ) {
        //4 bytes: 11110xxx followed by 3 continuation bytes
        $result =  
            ((ord($str[$index+0])&0x07) << 18) + 
            ((ord($str[$index+1])&0x3f) << 12) +
            ((ord($str[$index+2])&0x3f) << 6) +
             (ord($str[$index+3])&0x3f) ;
        $index += 4;
        return $result;
    } else if ($index+5 <= strlen($str) 
        && 0xf8 == (0xfc & ord($str[$index + 0]))
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
        && 0x80 == (0xc0 & ord($str[$index + 4]))
        ) {
        //5 bytes: 111110xx followed by 4 continuation bytes (original table only)
        $result =  
            ((ord($str[$index+0])&0x03) << 24) + 
            ((ord($str[$index+1])&0x3f) << 18) +
            ((ord($str[$index+2])&0x3f) << 12) +
            ((ord($str[$index+3])&0x3f) << 6) +
             (ord($str[$index+4])&0x3f) ;
        $index += 5;
        return $result;
    } else if ($index+6 <= strlen($str) 
        && 0xfc == (0xfe & ord($str[$index + 0]))
        && 0x80 == (0xc0 & ord($str[$index + 1]))
        && 0x80 == (0xc0 & ord($str[$index + 2]))
        && 0x80 == (0xc0 & ord($str[$index + 3]))
        && 0x80 == (0xc0 & ord($str[$index + 4]))
        && 0x80 == (0xc0 & ord($str[$index + 5]))
        ) {
        //6 bytes: 1111110x followed by 5 continuation bytes (original table only)
        $result =  
            ((ord($str[$index+0])&0x01) << 30) + 
            ((ord($str[$index+1])&0x3f) << 24) +
            ((ord($str[$index+2])&0x3f) << 18) +
            ((ord($str[$index+3])&0x3f) << 12) +
            ((ord($str[$index+4])&0x3f) << 6) +
             (ord($str[$index+5])&0x3f) ;
        $index += 6;
        return $result;
    }
    //invalid or truncated sequence: return the raw byte and step over it
    $result = ord($str[$index+0]);
    $index++;
    return $result;
}
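The bit arithmetic behind the function above can be sketched on a single character. This is only a worked example, not part of the submitted code: the sample bytes C3 A9 (the UTF-8 encoding of U+00E9) and the variable names are chosen for illustration. It also shows the parentheses the mask tests need, since == binds tighter than & in PHP.

```php
<?php
// Illustrative only: decode the 2-byte sequence C3 A9 (U+00E9) by hand.
$bytes = "\xC3\xA9";
$lead = ord($bytes[0]);   // 0xC3 = 1100 0011
$cont = ord($bytes[1]);   // 0xA9 = 1010 1001

// == has higher precedence than & in PHP, so the mask is parenthesised.
$isTwoByteLead  = (0xC0 == (0xE0 & $lead));   // lead byte looks like 110xxxxx
$isContinuation = (0x80 == (0xC0 & $cont));   // next byte looks like 10xxxxxx

// Keep the 5 payload bits of the lead byte, shift them left by 6,
// then append the 6 payload bits of the continuation byte.
$codePoint = (($lead & 0x1F) << 6) | ($cont & 0x3F);

printf("U+%04X\n", $codePoint);
```

Without the shift, the payload bits of the two bytes would simply be added together and the result would not be the code point.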
This is code I submitted in a bug report to php.net. I am hoping it gets into the PHP manual at least, but even more I am hoping for a cross-encoding solution for ord(). If enough people vote for this bug, something may be done about it.

https://bugs.php.net/bug.php?id=63732
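In the meantime, one possible cross-encoding workaround is to lean on the mbstring extension: convert a single character to UCS-4BE and unpack the four bytes as a big-endian integer. The following is only a sketch under the assumption that mbstring is loaded; the function name mb_ord_sketch is invented for this example. PHP 7.2 and later also provide a built-in mb_ord() in mbstring.

```php
<?php
// Hypothetical helper (name invented for illustration); requires the mbstring extension.
// Returns the code point of the first character of $str, where $str is encoded in $encoding,
// or false for empty input.
function mb_ord_sketch($str, $encoding = 'UTF-8') {
    // Take exactly one character, respecting the source encoding.
    $char = mb_substr($str, 0, 1, $encoding);
    if ($char === '' || $char === false) {
        return false;
    }
    // UCS-4BE stores every code point as 4 big-endian bytes.
    $ucs4 = mb_convert_encoding($char, 'UCS-4BE', $encoding);
    // 'N' = unsigned 32-bit big-endian integer; unpack() returns a 1-indexed array.
    $parts = unpack('N', $ucs4);
    return $parts[1];
}

printf("U+%04X\n", mb_ord_sketch("\xE2\x82\xAC"));        // euro sign as UTF-8 bytes
printf("U+%04X\n", mb_ord_sketch("\xE9", 'ISO-8859-1'));  // e-acute as ISO-8859-1
```

Unlike utf8_ord_next_char(), this sketch does not advance an index through the string; it only looks at the first character.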
__________________
------------
Jim Michaels
HTML Code:
improperly<strong>nested<em>elements</strong>cause</em>
browser confusion (I believe the term is 'tag soup')!

Last edited by jmichae3; 12-12-2012 at 07:12 AM.. Reason: bbs scrambled my code
Old 12-15-2012, 09:54 AM   #2
jmichae3
 
Join Date: Dec 2010
Posts: 366

may I ask why I was downed? was it the bug report?