When I use substr() I get a strange character at the end

$articleText = substr($articleText,0,500);

The output is 500 characters followed by a � at the end.

How can I fix this? Is it an encoding problem? My language is Greek.

substr() counts bytes, not characters.

Greek text is most likely stored in a multi-byte encoding such as UTF-8, where a single character can take more than one byte, so cutting at a byte offset can split a character in two.

Try mb_substr() instead: the mb_* functions were created specifically for multi-byte encodings.
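To make the byte/character difference concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and the mbstring extension is enabled; the sample string is made up for illustration.

```php
<?php
// Hypothetical sample: 6 Greek capital letters, 2 bytes each in UTF-8.
$text = 'ΕΛΛΑΔΑ';

echo strlen($text), "\n";             // 12 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 6  (characters)

// Cutting at an odd byte offset splits a 2-byte letter in half,
// which leaves an invalid UTF-8 sequence at the end of the string.
$broken = substr($text, 0, 5);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Cutting by characters keeps every letter whole.
$clean = mb_substr($text, 0, 5, 'UTF-8');
var_dump(mb_check_encoding($clean, 'UTF-8'));  // bool(true)
```

The invalid trailing byte in $broken is what a browser typically renders as �.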


Use mb_substr instead. Unlike substr, it handles multi-byte encodings, not only single-byte strings:

$articleText = mb_substr($articleText,0,500,'UTF-8');


Looks like you're slicing a multi-byte Unicode character in half there. Use mb_substr instead for Unicode-safe string slicing.


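An alternative for UTF-8 strings whose characters all fit in ISO-8859-1 (Latin-1): utf8_decode() converts the string to single-byte ISO-8859-1 before cutting the substring. Note that this does not work for Greek, because utf8_decode() replaces every character outside ISO-8859-1 with "?", and that both utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.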

$articleText = substr(utf8_decode($articleText),0,500);

To get $articleText back to UTF-8, one extra step is needed:

$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );


Use this function; it worked for me:

function substr_unicode($str, $s, $l = null) {
    // Split into UTF-8 characters (the "u" modifier), then slice by character index.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}
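For reference, a self-contained sketch of the same preg_split() idea with a usage example. The type hints (which need PHP 7.1 or later) and the check for a failed split are my additions, not part of the answer above, and the sample string is made up.

```php
<?php
// Sketch of the helper above; assumes $str is meant to be UTF-8.
function substr_unicode(string $str, int $s, ?int $l = null): string
{
    // The "u" modifier makes PCRE treat $str as UTF-8, so the empty
    // pattern splits between code points rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // preg_split() returns false on invalid UTF-8
    }
    // A null length makes array_slice() run to the end of the array.
    return implode('', array_slice($chars, $s, $l));
}

echo substr_unicode('ΕΛΛΑΔΑ', 0, 3), "\n"; // ΕΛΛ
echo substr_unicode('ΕΛΛΑΔΑ', 2), "\n";    // ΛΑΔΑ
```

This builds an array with one element per character, so for long texts mb_substr() is likely the lighter choice when mbstring is available.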



mb_substr() also works well for removing stray trailing line breaks, which I was having trouble with after parsing HTML code. The problem was NOT handled by:



 var_dump(preg_match('/^\n|\n$/', $variable));


str_replace (array('\r\n', '\n', '\r'), ' ', $text)

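Neither of these caught the line breaks. (In the str_replace call, the single-quoted '\r\n', '\n' and '\r' are literal backslash sequences rather than line breaks; PHP only interprets these escapes inside double quotes.)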
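One possible reason the str_replace call above finds nothing is the quoting. The sketch below compares single and double quotes on a made-up sample string; rtrim() is shown as one option for trailing breaks, not as what the answer used.

```php
<?php
$text = "first line\r\nsecond line\n"; // hypothetical sample input

// Single quotes: '\n' is a backslash followed by "n", so nothing matches.
$single = str_replace(array('\r\n', '\n', '\r'), ' ', $text);
var_dump($single === $text); // bool(true): the string is unchanged

// Double quotes: "\n" is a real line feed, so the breaks are replaced.
$double = str_replace(array("\r\n", "\n", "\r"), ' ', $text);
var_dump($double === 'first line second line '); // bool(true)

// rtrim() with an explicit character list strips only trailing
// carriage returns and line feeds, leaving inner breaks alone.
$trimmed = rtrim($text, "\r\n");
var_dump($trimmed === "first line\r\nsecond line"); // bool(true)
```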


You are cutting a Unicode character in the middle. Instead of substr(), try mb_substr() in PHP. Compare the two signatures:


substr ( string $string , int $start [, int $length ] )


mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
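The optional $encoding parameter in the second signature is the practical difference. A short sketch, assuming the mbstring extension is enabled and using a made-up UTF-8 sample string:

```php
<?php
$text = 'ΕΛΛΑΔΑ'; // hypothetical UTF-8 sample, 6 characters

// Passing the encoding explicitly does not depend on configuration.
$a = mb_substr($text, 0, 3, 'UTF-8');
echo $a, "\n"; // ΕΛΛ

// When $encoding is omitted, the internal encoding is used;
// it can be set once for the whole script.
mb_internal_encoding('UTF-8');
$b = mb_substr($text, 0, 3);
echo $b, "\n"; // ΕΛΛ

// A null length means "up to the end of the string".
$c = mb_substr($text, 4, null, 'UTF-8');
echo $c, "\n"; // ΔΑ
```

Passing 'UTF-8' on every call is more verbose but keeps the result independent of server settings.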

