For example, the string "test" was stored as: "&lt;div class="user"&gt;test&lt;/div&gt;"
You may notice that the opening and closing angle brackets < and > were converted to &lt; and &gt;. To get the string back in its original form I used the HttpUtility.HtmlDecode() function. Decoding the above string once produced: <div class="user">test</div>, which was expected. But certain records looked like: &lt;div class="user"&gt;Gareth&amp;#8217;s advice was not followed.&lt;/div&gt;
Decoding them once was no good: the apostrophe had been encoded twice, so a single pass only turned &amp;#8217; into &#8217; and never produced the apostrophe. I decoded the string once again and got the expected output:
<div class="user">Gareth’s advice was not followed.</div>
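The double decode above can be generalised into a small loop that keeps decoding until the text stops changing. This is only a minimal sketch: DecodeFully and maxPasses are hypothetical names, not part of the original code, and it assumes the project references System.Web so that HttpUtility is available. Be aware that decoding until stable will also unescape text that was meant to stay escaped (for example a record that legitimately contains "&amp;lt;"), so it only suits data known to be over-encoded.

```vb
Imports System
Imports System.Web

Module DecodeSketch
    ' Hypothetical helper: run HtmlDecode repeatedly until a pass changes nothing.
    ' maxPasses is an assumed safety cap so that the loop always terminates.
    Function DecodeFully(ByVal input As String, Optional ByVal maxPasses As Integer = 5) As String
        Dim current As String = input
        For pass As Integer = 1 To maxPasses
            Dim decoded As String = HttpUtility.HtmlDecode(current)
            ' Stop as soon as a pass leaves the text unchanged
            If String.Equals(decoded, current, StringComparison.Ordinal) Then Exit For
            current = decoded
        Next
        Return current
    End Function

    Sub Main()
        Dim stored As String = "&lt;div class=""user""&gt;Gareth&amp;#8217;s advice was not followed.&lt;/div&gt;"
        ' Pass 1 restores the angle brackets, pass 2 restores the apostrophe
        Console.WriteLine(DecodeFully(stored))
    End Sub
End Module
```

For the record shown above, the first pass restores the angle brackets, the second restores the apostrophe, and the third finds nothing left to decode and ends the loop.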
Next I had to remove the HTML tags and extract the text in between them. Here's the VB code for this:
'Delete text between angled brackets
mStartPos = InStr(strContent, "<")
Do While mStartPos <> 0
    'Search for ">" from the "<" onwards, so a stray ">" earlier in the text cannot end the loop early
    mEndPos = InStr(mStartPos, strContent, ">")
    If mEndPos = 0 Then Exit Do 'Unclosed tag: leave the rest untouched
    'The tag itself, including both brackets
    mString = Mid(strContent, mStartPos, mEndPos - mStartPos + 1)
    'Remove every occurrence of this exact tag
    strContent = Replace(strContent, mString, "")
    mStartPos = InStr(strContent, "<")
Loop
'Strip leading carriage returns and line feeds
Do While Left(strContent, 1) = Chr(13) Or Left(strContent, 1) = Chr(10)
    strContent = Mid(strContent, 2)
Loop
txt = strContent
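As an alternative to the InStr loop, the same stripping can be sketched in VB.NET with a regular expression. This is a different technique from the code above, not a restatement of it: StripTags is a hypothetical helper name, and the pattern "<[^>]*>" is a simple assumption that treats anything between a "<" and the next ">" as a tag. That is adequate for small fragments like the ones here but is not a full HTML parser (it will misfire on a ">" inside an attribute value, for instance).

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsSketch
    ' Hypothetical helper: remove anything that looks like a tag,
    ' then trim leading carriage returns and line feeds.
    Function StripTags(ByVal html As String) As String
        Dim withoutTags As String = Regex.Replace(html, "<[^>]*>", String.Empty)
        Return withoutTags.TrimStart(ControlChars.Cr, ControlChars.Lf)
    End Function

    Sub Main()
        Console.WriteLine(StripTags("<div class=""user"">test</div>")) ' prints: test
    End Sub
End Module
```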