Diacritics problem

zaboss

Perch
I have stumbled onto a strange problem, and I haven't been able to solve it for two days now.
I have a comment form in a Classic ASP CMS that I am building right now. If I enter Romanian characters with diacritics (such as "ş", "ţ", "î", "ă" and "â"), they are not handled correctly. I am using UTF-8 character encoding. From what I have tested so far, it is ASP that is mangling them. Here is what I have tested:

- Response.Write Request.Form("comment")
  Result: the characters are displayed correctly.
- Session("Comments") = Request.Form("comment")
  Response.Write Session("Comments")
  Result: the characters are not displayed correctly.
- Response.Write Request.Form
  Result (the raw form data):
Code:
Name=Name&Email=eu%40mysite.ro&Website=www.mysite.ro&Comment=Testing+diacritics%0D%0A%C5%9F%0D%0A%C5%A3%0D%0A%C3%A2%0D%0A%C4%83%0D%0A%C3%AE%0D%0A&strArtId=15&securityCode=fesTer&B1=Submit
The page currently just does a Response.Write until I finish checking. Any suggestions are most welcome.
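The raw form data above can be checked outside ASP. A minimal Python sketch, assuming the string is copied verbatim from the Response.Write Request.Form output: it percent-decodes the body once as UTF-8 and once as ISO-8859-2. Escapes such as %C5%9F are the UTF-8 bytes of "ş", so the browser did send valid UTF-8, and the letters only break when the server side reads those bytes with a different code page. The helper name decode_comment is hypothetical.

```python
from urllib.parse import parse_qs

# Raw POST body as echoed by Response.Write Request.Form in the post above.
BODY = (
    "Name=Name&Email=eu%40mysite.ro&Website=www.mysite.ro"
    "&Comment=Testing+diacritics%0D%0A%C5%9F%0D%0A%C5%A3%0D%0A%C3%A2"
    "%0D%0A%C4%83%0D%0A%C3%AE%0D%0A&strArtId=15&securityCode=fesTer&B1=Submit"
)

def decode_comment(body: str, encoding: str) -> str:
    """Hypothetical helper: decode the form body with `encoding`
    and return the Comment field."""
    # parse_qs turns '+' into a space and decodes the %XX escapes as bytes
    # in `encoding`; undecodable bytes become U+FFFD (errors="replace").
    return parse_qs(body, encoding=encoding)["Comment"][0]

if __name__ == "__main__":
    # Read as UTF-8 (what the browser actually sent): the diacritics survive.
    print(repr(decode_comment(BODY, "utf-8")))
    # Read as a single-byte Central European code page: every two-byte letter
    # turns into two unrelated characters, some of them invisible controls.
    print(repr(decode_comment(BODY, "iso-8859-2")))
```

If the UTF-8 reading looks right but the page still shows garbage, the problem is on the reading or storing side (for example the code page ASP uses for Request.Form or Session), not in what the browser posted.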
 
OK, I have made a step forward and found a solution that works on my computer: I set the codepage to Central European. Now it raises another problem, as it doesn't work on jodo.
The error I got is:
Active Server Pages error 'ASP 0245'

Mixed usage of Code Page values

/header.asp, line 2

The @CODEPAGE value specified differs from that of the including file's CODEPAGE or the file's saved format.
What puzzles me is that this line doesn't throw an error normally, only when I post something. The same issue was reported a few years ago here.
 
I had a look at your soft4web.ro site.

I don't really have the answer, but I think I can at least explain what's happening. Both Firefox and IE correctly interpret your meta tag and switch to utf-8, and I see the question marks. If I manually switch my browser to ISO-8859-1 or Windows-1252, the characters display correctly. At least I think they do; I don't understand the language, but the question marks change to something more plausible.

If these were static pages, I would say that your HTML editor is saving the files in something other than utf-8. Most editors (even Notepad) have an option to set the encoding when you save a file; see the Encoding drop-down in Notepad's Save As dialog.

In other words, you're telling the browser to "interpret this as utf-8", but your actual data is in a different encoding. I don't really understand the @CODEPAGE stuff, but it sounds like that error is saying something similar.
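The mismatch described above (page declared as utf-8, bytes in another encoding) can be reproduced in a few lines. This Python sketch assumes the sample text is stored as ISO-8859-2 bytes, one byte per letter; it only illustrates the decoding step and is not a model of how a browser works internally. Reading those bytes as UTF-8 fails on every accented letter, and the failures come out as the replacement character (the "question marks"), while reading them with the encoding they were written in gives the letters back. The helper names are hypothetical.

```python
# Romanian sample letters, written as escapes so the code points are exact:
# s-cedilla, t-cedilla, i-circumflex, a-breve, a-circumflex.
SAMPLE = "\u015f \u0163 \u00ee \u0103 \u00e2"

def bytes_as_saved(text: str) -> bytes:
    # Assumption: the page or article was saved in ISO-8859-2.
    return text.encode("iso-8859-2")

def read_as(data: bytes, encoding: str) -> str:
    # Browsers show U+FFFD for byte sequences that are invalid in the
    # declared encoding; errors="replace" does the same here.
    return data.decode(encoding, errors="replace")

if __name__ == "__main__":
    raw = bytes_as_saved(SAMPLE)
    print(read_as(raw, "utf-8"))       # declared encoding: replacement characters
    print(read_as(raw, "iso-8859-1"))  # Western guess: plausible but partly wrong letters
    print(read_as(raw, "iso-8859-2"))  # actual encoding: the original letters
```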

Sorry, that's a bit vague. I'm not sure what happens to the encoding of POST data from a form. I think you are using an Access database. Have you tried downloading the db and reading the data in Access? How does it look there? You need to narrow down whether it's a writing or a reading problem. I'm not sure where the database drivers fit into this, because somewhere the text gets converted from Unicode in Access to utf-8 for the browser.

The usual article people point to about this stuff is: http://www.joelonsoftware.com/articles/Unicode.html Read that and you'll see that utf-8 is really the way to go. Anything using code pages is old and limited. The whole character encoding thing is more complicated than it first appears.

I'll let you know if I have any more thoughts.

Cheers
Ross
 
Hi Ross,

you actually pointed out where the problem was.
The text on the front page is not a problem: the articles there are samples that were indeed created long ago in a page with ISO-8859-2 encoding, and I didn't bother converting the text to UTF-8.

The problem was that comments.asp, the file that processes the form input, had a different encoding as a file: not the HTML encoding but the file encoding, as you pointed out. All the files in the site were saved as ANSI, but somehow comments.asp had been saved as UTF-8, and that's why the code pages conflicted. After re-saving the file with the correct encoding, everything worked, and I dropped the codepage setting.
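Since the root cause was a single file saved in a different encoding from the rest, a quick audit can catch this before ASP does. The Python sketch below is a hypothetical helper, not part of ASP or IIS: it walks a site folder and labels each .asp file by its raw bytes. It assumes that a UTF-8 file saved by Notepad begins with the UTF-8 byte-order mark (EF BB BF), and it lists pure-ASCII files separately, since those read the same under either encoding.

```python
import codecs
from pathlib import Path

def classify(data: bytes) -> str:
    """Guess how a file was saved, from its raw bytes (heuristic)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    try:
        data.decode("ascii")
        return "ascii"           # identical in ANSI and UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8 (no BOM)"  # non-ASCII bytes that happen to be valid UTF-8
    except UnicodeDecodeError:
        return "ansi/other"      # e.g. Windows-1250 or ISO-8859-2 text

def scan(root: str, pattern: str = "*.asp") -> None:
    # Print one line per matching file: its guessed encoding, then its path.
    for path in sorted(Path(root).rglob(pattern)):
        print(f"{classify(path.read_bytes()):15} {path}")
```

For example, scan(r"C:\inetpub\wwwroot\mysite") (a placeholder path) would list every page; a lone "utf-8 (BOM)" file among "ansi/other" ones is a likely candidate for the ASP 0245 error.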

Now I don't understand why it worked on my computer, since based on this explanation it shouldn't have ;).
 
I'm reviving this thread because I've run into a new and strange problem. Look at this page and then at this one. It's the same database and almost the same script pulling data from it. Ignore the fact that the latter is UTF-8 encoded; it doesn't matter, because it still gets the characters wrong even with ISO-8859-2.
 