Archive for February, 2012

SharePoint: Enhanced ItemPicker

February 28th, 2012 Admin

    I was asked to develop a control based on the ItemPicker control that, in addition to letting the user choose an external data item through BDC, brings the item's data to the client side without a page postback. The intended sequence of actions was as follows:

  1. A user chooses a data item in the Picker Dialog;
  2. The identifier of the selected item is sent to the server through an Ajax-like technology;
  3. Using the received identifier, the server fetches the corresponding data through BDC and sends it back to the client. By “the corresponding data” I mean the values of either all fields available in the selected data item or only the fields defined in the control declaration;
  4. On the client side, the received data is parsed and displayed in the UI.

Additionally, the control should be free of bindings to SPField, because it must be able to reside within an independent aspx page located in the _layouts folder.

*Note: for a better understanding of the BDC infrastructure, please read the following blog posts: SharePoint: Brief introduction to Business Data Catalog and SharePoint: Understanding BusinessData Column.

So, I developed the required control and made it as reusable as possible. Let’s call the control MyItemPicker (it’s so unusual, isn’t it? :) ). For the sake of simplicity I decided to use ASP.NET client callbacks, applied through the ICallbackEventHandler interface. ASP.NET client callbacks can be considered a wrapper around the XMLHTTP object. MyItemPicker also contains and uses the standard ItemPicker.

OK, let’s start with the declaration of the control within a page:

<MYCC:MyItemPicker id="myItemPicker" runat="server" LobSystemInstanceName="Products"
EntityName="Product" PrimaryColumnName="Name" ClientCallback="MyClientCallback"
ClientCallbackError="MyClientCallbackError" CallbackBDCFieldFilter="Price,Producer" />

The significant properties here are:

  • LobSystemInstanceName is the name of the Lob System Instance through which the data to pick from is provided;
  • EntityName is the name of the entity type of the data items populating the picker;
  • PrimaryColumnName is the name of the data item field whose value is used as the display value;
  • ClientCallback is the name of a JavaScript function that has to be present within the page. On success, this function accepts and processes the server response containing the fetched data;
  • ClientCallbackError is the name of a JavaScript function that may be present within the page and is called when the server fails to fulfill the request. This property is optional;
  • CallbackBDCFieldFilter is a comma-separated string containing the names of the data item fields that should be included in the server response. For example, if a BDC Entity has four fields (ID, Name, Price and Producer), you might want only two of them, Price and Producer, on the client side. If the CallbackBDCFieldFilter property is empty or absent from the declaration, the server response contains the values of all available fields of the BDC Entity.
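To make the CallbackBDCFieldFilter rule more tangible, here is a minimal JavaScript sketch of the same selection logic. It is only an illustration: the applyFieldFilter helper and the ID and Name sample values are hypothetical, and the trimming of names is my assumption rather than something the control is stated to do. The real filtering happens on the server.

```javascript
// Hypothetical helper mirroring the CallbackBDCFieldFilter rule described above.
// `fields` is a plain object (field name -> value); `filter` is the comma-separated string.
function applyFieldFilter(fields, filter) {
    var wanted = (filter || "")
        .split(",")
        .map(function (name) { return name.trim().toLowerCase(); }) // trimming is an assumption
        .filter(function (name) { return name.length > 0; });

    var result = {};
    Object.keys(fields).forEach(function (name) {
        // An empty filter means "return every available field".
        if (wanted.length === 0 || wanted.indexOf(name.toLowerCase()) !== -1) {
            result[name] = fields[name];
        }
    });
    return result;
}

// Sample entity with the four fields from the example (ID and Name values are made up).
var product = { ID: "1", Name: "Mouse", Price: "10.00", Producer: "Microsoft Corporation" };
var filtered = applyFieldFilter(product, "Price,Producer"); // only Price and Producer
var everything = applyFieldFilter(product, "");             // all four fields
```

Field names are compared case-insensitively here, so "price,producer" would select the same two fields.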

Sample JavaScript functions of the kind that should be named in the ClientCallback and ClientCallbackError properties are shown below. Note the functions’ signatures.

<script type="text/javascript">

    function MyClientCallback(result, context) {

        alert("Result: " + result);

        if (result != null && typeof result != "undefined" && result.length != 0) {
            var res = eval("(" + result + ")");

            alert('Price: '    + res.Price);
            alert('Producer: ' + res.Producer);

            // update UI with received data
        }
    }

    function MyClientCallbackError(result, context) {
        alert('Error: ' + result);
    }

</script>

The server response looks like

{ 'Price' : '10.00',
  'Producer' : 'Microsoft Corporation' }

Thus, the response is formatted so that it can easily be turned into a JavaScript object by means of the eval function.
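If you prefer to keep the parsing in one place, the response handling could be wrapped in a small helper along these lines. This is only a sketch: parsePickerResponse is a hypothetical name, and it assumes the response comes from your own server and is therefore trusted, since the text is evaluated as JavaScript.

```javascript
// Hypothetical helper: turns the control's response text into an object.
// Returns null when the server sent nothing (for example, when the lookup failed).
function parsePickerResponse(result) {
    if (result == null || result.length === 0) {
        return null;
    }
    // The response is a JavaScript object literal with single-quoted strings,
    // not strict JSON, so JSON.parse would reject it; it has to be evaluated.
    return (new Function("return (" + result + ");"))();
}

// The sample response shown above.
var data = parsePickerResponse("{ 'Price' : '10.00',\n  'Producer' : 'Microsoft Corporation' }");
```

Inside MyClientCallback you could then call parsePickerResponse(result) and update the UI only when the returned value is not null.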

So, it’s time to examine the source code of the MyItemPicker itself.

MyItemPicker Source

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Portal.WebControls;
using System.Web.UI;
using Microsoft.Office.Server.ApplicationRegistry.Infrastructure;
using Microsoft.Office.Server.ApplicationRegistry.MetadataModel;
using Microsoft.Office.Server.ApplicationRegistry.Runtime;
using System.Data;

namespace MyControls
{
    public class MyItemPicker : Control, ICallbackEventHandler
    {
        #region fields & properties
        protected ItemPicker _picker                    = null;
        protected string     _callbackRequestedEntityId = string.Empty;

        public string LobSystemInstanceName  { get; set; }
        public string EntityName             { get; set; }
        public string PrimaryColumnName      { get; set; }

        public string ClientCallback         { get; set; }
        public string ClientCallbackError    { get; set; }

        public string CallbackBDCFieldFilter { get; set; }
        #endregion

        #region public methods
        // Implementation of the ICallbackEventHandler interface
        // Generates response that will be sent to client
        public string GetCallbackResult()
        {
            return GetJSResult();
        }
        // Implementation of the ICallbackEventHandler interface
        // Retrieves and preserves identifier of selected data item sent from client
        public void RaiseCallbackEvent(string eventArgument)
        {
            _callbackRequestedEntityId = eventArgument;
        }
        #endregion

        #region internal methods
        protected override void OnInit(EventArgs e)
        {
            base.OnInit(e);
            EnsureChildControls();
        }

        protected override void CreateChildControls()
        {
            base.CreateChildControls();

            if (_picker == null)
            {
                _picker = new ItemPicker();
                _picker.MultiSelect = false;
                _picker.ID = ID + "_ItemPicker";
                try
                {
                    this.SetExtendedDataOnPicker(_picker);
                }
                catch (Exception exception)
                {
                    _picker.ErrorMessage = exception.Message;
                    _picker.Enabled = false;
                }

                this.Controls.Add(_picker);
            }
        }

        /// <summary>
        /// Initializes the main item picker's properties
        /// </summary>
        protected virtual void SetExtendedDataOnPicker(ItemPicker picker)
        {
            ItemPickerExtendedData data = new ItemPickerExtendedData();

            BDCMetaRequest request = new BDCMetaRequest(LobSystemInstanceName, EntityName);
            data.SystemInstanceId  = request.FoundLobSystemInstance.Id;
            data.EntityId          = request.FoundEntity.Id;

            List<uint> list = new List<uint>();
            FieldCollection fields = request.FoundEntity.GetSpecificFinderView().Fields;
            foreach (Field field in fields)
                if (string.Equals(field.Name, PrimaryColumnName, StringComparison.OrdinalIgnoreCase))
                    data.PrimaryColumnId = field.TypeDescriptor.Id;
                else
                    list.Add(field.TypeDescriptor.Id);

            data.SecondaryColumnsIds = list.ToArray();
            picker.ExtendedData = data;
        }

        protected override void OnPreRender(EventArgs e)
        {
            base.OnPreRender(e);
            AddJSCallbackFunctions();
            AddAdditionalJSFunctions();
        }

        /// <summary>
        /// Generates and adds auxiliary JavaScript functions to the page
        /// </summary>
        protected void AddAdditionalJSFunctions()
        {
            if (_picker != null)
            {
                _picker.LoadPostData(null, null); // this line is required to force CreateChildControls() and to have HiddenEntityKey created
                Control upLevelDiv = FindControlRecursive(_picker, "upLevelDiv");
                if (upLevelDiv != null)
                {
                    string clearFuncName = "ClearItemPicker_" + ID;
                    string clearFunc =
                        "function " + clearFuncName + "() {" +
                            "var upLevelDiv = document.getElementById('" + upLevelDiv.ClientID + "');" +
                            "if (upLevelDiv != null) {" +
                                "upLevelDiv.innerHTML = '';" +
                                "updateControlValue('" + _picker.ClientID + "');" +
                                "}" +
                            "}";
                    Page.ClientScript.RegisterClientScriptBlock(GetType(), clearFuncName, clearFunc, true);
                }

                Control hiddenEntityDisplayTextControl = FindControlRecursive(_picker, "HiddenEntityDisplayText");
                if (hiddenEntityDisplayTextControl != null)
                {
                    string getDisplayTextFuncName = "GetDisplayText_" + ID;
                    string getDisplayTextFunc =
                        "function " + getDisplayTextFuncName + "() {" +
                            "var hiddenEntityDisplayTextControl = document.getElementById('" + hiddenEntityDisplayTextControl.ClientID + "');" +
                            "return hiddenEntityDisplayTextControl != null ? hiddenEntityDisplayTextControl.value : '';" +
                        "}";
                    Page.ClientScript.RegisterClientScriptBlock(GetType(), getDisplayTextFuncName, getDisplayTextFunc, true);
                }
            }
        }

        /// <summary>
        /// Generates and adds the picker's AfterCallbackClientScript to the page
        /// </summary>
        protected void AddJSCallbackFunctions()
        {
            if (_picker != null)
            {
                string callbackFunc = null;
                if (!string.IsNullOrEmpty(ClientCallback) && !string.IsNullOrEmpty(ClientCallbackError))
                    callbackFunc = Page.ClientScript.GetCallbackEventReference(this, "arg", ClientCallback, "context", ClientCallbackError, true);
                else
                {
                    if (!string.IsNullOrEmpty(ClientCallback))
                        callbackFunc = Page.ClientScript.GetCallbackEventReference(this, "arg", ClientCallback, "context", true);
                }
                if (!string.IsNullOrEmpty(callbackFunc))
                {
                    _picker.LoadPostData(null, null); // this line is required to force CreateChildControls() and to have HiddenEntityKey created

                    Control pickerEntityKeyHidden = FindControlRecursive(_picker, "HiddenEntityKey");
                    if (pickerEntityKeyHidden != null)
                    {
                        string clientFuncName = "GetBdcFieldValuesAsync_" + ID;
                        string clientFunc =
                            "function " + clientFuncName + "(context)" +
                            "{" +
                                "var pickerEntityKeyHidden = document.getElementById('" + pickerEntityKeyHidden.ClientID + "');" +
                                "if (pickerEntityKeyHidden != null)" +
                                "{" +
                                    "var arg = pickerEntityKeyHidden.value;" +
                                    callbackFunc + ";" +
                                "}" +
                            "}";
                        Page.ClientScript.RegisterClientScriptBlock(GetType(), clientFuncName, clientFunc, true);
                        _picker.AfterCallbackClientScript = clientFuncName + "('" + ID + "');";
                    }
                }
            }
        }

        /// <summary>
        /// Makes request to external data source and returns json-result
        /// </summary>
        protected string GetJSResult()
        {
            string res = string.Empty;

            try
            {
                if (!string.IsNullOrEmpty(_callbackRequestedEntityId))
                {
                    Dictionary<string, byte> bdcFieldFilter = GetBDCFieldFilter(CallbackBDCFieldFilter);

                    BDCRequestById request = new BDCRequestById(LobSystemInstanceName, EntityName, _callbackRequestedEntityId);

                    StringBuilder sb = new StringBuilder();
                    sb.Append("{");
                    foreach (Field field in request.FoundEntityInstance.ViewDefinition.Fields)
                    {
                        if (bdcFieldFilter.Count == 0 || bdcFieldFilter.ContainsKey(field.Name))
                        {
                            if (sb.Length > 1)
                                sb.Append(", ").AppendLine();
                            sb.Append("'").Append(field.Name).Append("' : ");
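                            // escape backslashes and single quotes so the value cannot break the JavaScript string literal
                            string value = Convert.ToString(request.FoundEntityInstance.GetFormatted(field)) ?? string.Empty;
                            sb.Append("'").Append(value.Replace("\\", "\\\\").Replace("'", "\\'")).Append("'");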
                        }
                    }
                    sb.Append("}");                    

                    res = sb.ToString();
                }
            }
            catch (Exception)
            {
                // write error to log; the empty string is returned, so the client receives no data
            }

            return res;
        }

        /// <summary>
        /// Parses the user defined list of bdc fields, the values of which should be retrieved
        /// </summary>
        protected static Dictionary<string, byte> GetBDCFieldFilter(string commaSeparatedBdcFields)
        {
            Dictionary<string, byte> res = new Dictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

            if (!string.IsNullOrEmpty(commaSeparatedBdcFields))
            {
                string[] bdcFields = commaSeparatedBdcFields.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                foreach (string field in bdcFields)
                    res[field.Trim()] = 0; // tolerate spaces around names and duplicate names
            }

            return res;
        }
        #endregion
    }
}

You probably noticed that the methods AddJSCallbackFunctions and AddAdditionalJSFunctions generate and add some JavaScript functions to the page. The exact names of these JavaScript functions depend on the id attribute defined in the control declaration. For example, if the control id is “myItemPicker”, the function names will be GetBdcFieldValuesAsync_myItemPicker, ClearItemPicker_myItemPicker and GetDisplayText_myItemPicker.

Let’s take a look at the functions. The main function is GetBdcFieldValuesAsync_myItemPicker, which extracts the encoded id of the selected item from the ItemPicker and then makes the Ajax-like client callback to the server. The other two functions are auxiliary: MyItemPicker does not use them directly, but they are very useful for building the interaction between the user and MyItemPicker. As their names imply, ClearItemPicker_myItemPicker clears the ItemPicker, and GetDisplayText_myItemPicker returns the text displayed to the user in the ItemPicker. The listing below shows the functions as they appear within the page:

<script type="text/javascript">

    function GetBdcFieldValuesAsync_myItemPicker(context) {
        var pickerEntityKeyHidden = document.getElementById('myItemPicker_ItemPicker_HiddenEntityKey');
        if (pickerEntityKeyHidden != null) {
            var arg = pickerEntityKeyHidden.value;
            WebForm_DoCallback('myItemPicker', arg, MyClientCallback, context, MyClientCallbackError, true);
        }
    }

    function ClearItemPicker_myItemPicker() {
        var upLevelDiv = document.getElementById('myItemPicker_ItemPicker_upLevelDiv');
        if (upLevelDiv != null) {
            upLevelDiv.innerHTML = '';
            updateControlValue('myItemPicker_ItemPicker');
        }
    }

    function GetDisplayText_myItemPicker() {
        var hiddenEntityDisplayTextControl = document.getElementById('myItemPicker_ItemPicker_HiddenEntityDisplayText');
        return hiddenEntityDisplayTextControl != null ? hiddenEntityDisplayTextControl.value : '';
    }
</script>
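As an illustration of how the two auxiliary functions might be used together, the sketch below reads the displayed text and then clears the picker, for instance from a “Reset” button handler. The resetPicker helper and its scope parameter are hypothetical. On a real page you would pass window, because that is where the generated functions are registered.

```javascript
// Hypothetical helper: reads the displayed text and then clears the picker.
// `pickerId` is the id from the control declaration; `scope` is the object
// that holds the generated functions (the window object on a real page).
function resetPicker(pickerId, scope) {
    var getText = scope["GetDisplayText_" + pickerId];
    var clear = scope["ClearItemPicker_" + pickerId];

    // The functions are generated only when the inner controls were found,
    // so check that they exist before calling them.
    var previousText = typeof getText === "function" ? getText() : "";
    if (typeof clear === "function") {
        clear();
    }
    return previousText;
}

// On a page: var text = resetPicker("myItemPicker", window);
```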

The SetExtendedDataOnPicker and GetJSResult methods of MyItemPicker employ the classes BDCMetaRequest and BDCRequestById that are described in my post SharePoint: How to get value from BDC.

The FindControlRecursive method is described in another post of mine.

C#: Html Stripper

February 13th, 2012 Admin

    Some time ago I developed a project that compared the visible content of two HTML pages and displayed all the differences found. The main goal of the project was to detect whether a page had been updated over time, so the comparison was usually between different versions of the same HTML page. To fetch the visible content and throw away the hidden content, I created a Regex-based HTML stripper embodied in the HtmlStripper class. HtmlStripper follows these three steps:

  1. remove all service and auxiliary HTML tags. In most cases that means removing the <head> tag with all its content, along with <style>, <script> and other such tags residing within <body>;
  2. remove all the remaining HTML tags;
  3. replace the escape sequences found in the HTML with their text equivalents. In practice that means that, for example,
    • the &nbsp; sequence should be replaced with the space symbol ‘ ’;
    • &lt; – with ‘<’;
    • &gt; – with ‘>’, and so on.

    There is a huge number of escape sequences (or HTML codes), so HtmlStripper handles only the ones that are, in my opinion, the most popular.
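Before the C# listing, the three steps can be sketched compactly in JavaScript. This is a simplified illustration, not a port of HtmlStripper: the stripHtml function, the tiny NAMED_ENTITIES table and the regular expressions are my own assumptions, and the tag pattern is deliberately naive (for example, it does not cope with a ‘>’ inside an attribute value).

```javascript
// Deliberately small entity table (assumption: a handful of entries is enough for a sketch).
var NAMED_ENTITIES = { nbsp: " ", lt: "<", gt: ">", amp: "&", quot: "\"" };

function stripHtml(html) {
    // Step 1: drop elements whose content is never shown to the user.
    var text = html.replace(/<(head|style|script|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
    // Step 2: drop comments and every remaining tag, keeping the text between them.
    text = text.replace(/<!--[\s\S]*?-->|<[^>]+>/g, "");
    // Step 3: replace escape sequences with the characters they stand for.
    return text.replace(/&(?:#(\d+)|#x([0-9a-f]+)|(\w+));?/gi, function (match, dec, hex, name) {
        if (dec) { return String.fromCharCode(parseInt(dec, 10)); }
        if (hex) { return String.fromCharCode(parseInt(hex, 16)); }
        // Unknown names are left untouched.
        return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name) ? NAMED_ENTITIES[name] : match;
    });
}

var visible = stripHtml(
    "<html><head><title>t</title></head><body><script>var a = 1;</script>" +
    "<p>1 &lt; 2&nbsp;&amp; 3 &#62; 2</p></body></html>");
// visible is "1 < 2 & 3 > 2"
```

The order of the steps matters: escape sequences are decoded last, so a decoded ‘<’ or ‘>’ is never mistaken for the start or end of a tag.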

So, the code of HtmlStripper is shown below:

HtmlStripper Source

using System;
using System.Text.RegularExpressions;

namespace Helpers
{
    public static class HtmlStripper
    {
        #region fields
        /// <summary>
        /// Allows to find the HTML tags hidden from view (style, script code and so on)
        /// </summary>
        private static Regex _findHtmlTagsWithInvisibleTextRegex = new Regex
            (@"<head[^>]*?>.*?</head> | <style[^>]*?>.*?</style> | <script[^>]*?.*?</script> |
               <object[^>]*?.*?</object> | <embed[^>]*?.*?</embed> | <applet[^>]*?.*?</applet> |
               <noframes[^>]*?.*?</noframes> | <noscript[^>]*?.*?</noscript> | <noembed[^>]*?.*?</noembed>",
                RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

        /// <summary>
        /// Allows to find all HTML tags
        /// </summary>
        private static Regex _findHtmlTagsRegex = new Regex
            (@"<(?:(!--) |(\?) |(?i:( TITLE  | SCRIPT | APPLET | OBJECT | STYLE )) | ([!/A-Za-z]))(?(4)(?:(?![\s=][""`'])[^>] |[\s=]`[^`]*`
             |[\s=]'[^']*'|[\s=]""[^""]*"")*|.*?)(?(1)(?<=--))(?(2)(?<=\?))(?(3)</(?i:\3)(?:\s[^>]*)?)>",
             RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

        private static string _escapeGroupName = "escSeq";
        /// <summary>
        /// Allows to find escape sequences that are used in HTML
        /// </summary>
        private static Regex _findHtmlEscapeSequencesRegex = new Regex
            (string.Format(@"[&] (([#](?<{0}>\d+)) | ([#](?<{0}>x[\dabcdef]+)) | (?<{0}>\w+));?", _escapeGroupName),
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
        #endregion

        public static string Strip(string htmlSrc)
        {
            string xTmpStr = _findHtmlTagsWithInvisibleTextRegex.Replace(htmlSrc, "");
            xTmpStr = _findHtmlTagsRegex.Replace(xTmpStr, "");
            return RemoveHtmlEscapeSequences(xTmpStr);
        }        

        private static string RemoveHtmlEscapeSequences(string htmlStr)
        {
            return _findHtmlEscapeSequencesRegex.Replace(htmlStr, delegate(Match match)
            {
                string replacement = GetHtmlEscapeSequenceSubstitution(match.Groups[_escapeGroupName].Value);
                return replacement == null ? match.Value : replacement;
            });
        }

        private static string GetHtmlEscapeSequenceSubstitution(string htmlEscape)
        {
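            htmlEscape = htmlEscape.TrimStart('0'); // sometimes number contains leading zeros

            // a sequence made only of zeros (e.g. "&#0;") has no substitution;
            // returning null keeps the original text and avoids Substring failing on an empty string below
            if (htmlEscape.Length == 0)
                return null;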

            bool? isNumber = null;

            // if it's a hex number, convert it into string containing decimal number
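            // the regex is case-insensitive, so the hex prefix may be either 'x' or 'X'
            if (htmlEscape.StartsWith("x", StringComparison.OrdinalIgnoreCase))
            {
                int num = Int32.Parse(htmlEscape.TrimStart('x', 'X'), System.Globalization.NumberStyles.HexNumber);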
                htmlEscape = num.ToString();
                isNumber = true;
            }

            // check if it's a number
            if(isNumber == null)
            {
                int tmpOut;
                isNumber = int.TryParse(htmlEscape, out tmpOut);
            }

            // find substitution for either the peeled number or the original string
            string res = FindSubstitution(htmlEscape);

            // if it's a number, any further attempts with different string variations are senseless
            if (!isNumber.Value && res == null)
            {
                // there are a few Html codes consisting of all capital letters,
                // try to find substitution for them
                htmlEscape = htmlEscape.ToUpperInvariant();
                res = FindSubstitution(htmlEscape);               

                if (res == null)
                {
                    // try to find substitution for string with the first capital letter
                    htmlEscape = htmlEscape.Substring(0, 1).ToUpperInvariant() + htmlEscape.Substring(1).ToLowerInvariant();
                    res = FindSubstitution(htmlEscape);
                }

                if (res == null)
                {
                    // try to find substitution for the lower string
                    htmlEscape = htmlEscape.ToLowerInvariant();
                    res = FindSubstitution(htmlEscape);
                }
            }

            return res;
        }

        private static string FindSubstitution(string htmlEscape)
        {
            switch (htmlEscape)
            {
                // All browsers support

                /* space */                             /* zero */
                case "32": return " ";                  case "48": return "0";
                /* exclamation point */                 /* one */
                case "33": return "!";                  case "49": return "1";
                /* double quotes */                     /* two */
                case "34": case "quot": return "\"";    case "50": return "2";
                /* number sign */                       /* three */
                case "35": return "#";                  case "51": return "3";
                /* dollar sign */                       /* four */
                case "36": return "$";                  case "52": return "4";
                /* percent sign */                      /* five */
                case "37": return "%";                  case "53": return "5";
                /* ampersand */                         /* six */
                case "38": case "amp": return "&";      case "54": return "6";
                /* single quote */                      /* seven */
                case "39": return "'";                  case "55": return "7";
                /* opening parenthesis */               /* eight */
                case "40": return "(";                  case "56": return "8";
                /* closing parenthesis */               /* nine */
                case "41": return ")";                  case "57": return "9";
                /* asterisk */                          /* colon */
                case "42": return "*";                  case "58": return ":";
                /* plus sign */                         /* semicolon */
                case "43": return "+";                  case "59": return ";";
                /* comma */                             /* less than sign */
                case "44": return ",";                  case "60": case "lt": return "<";
                /* minus sign - hyphen */               /* equal sign */
                case "45": return "-";                  case "61": return "=";
                /* period */                            /* greater than sign */
                case "46": return ".";                  case "62": case "gt": return ">";
                /* slash */                             /* question mark */
                case "47": return "/";                  case "63" : return "?";

                /* at symbol */             case "83": return "S";
                case "64": return "@";      case "84": return "T";
                case "65": return "A";      case "85": return "U";
                case "66": return "B";      case "86": return "V";
                case "67": return "C";      case "87": return "W";
                case "68": return "D";      case "88": return "X";
                case "69": return "E";      case "89": return "Y";
                case "70": return "F";      case "90": return "Z";
                case "71": return "G";      /* opening bracket */
                case "72": return "H";      case "91": return "[";
                case "73": return "I";      /* backslash */
                case "74": return "J";      case "92": return "\\";
                case "75": return "K";      /* closing bracket */
                case "76": return "L";      case "93": return "]";
                case "77": return "M";      /* caret - circumflex */
                case "78": return "N";      case "94": return "^";
                case "79": return "O";      /* underscore */
                case "80": return "P";      case "95": return "_";
                case "81": return "Q";      /* grave accent */
                case "82": return "R";      case "96": return "`";                                            

                case "97": return "a";      case "114": return "r";
                case "98": return "b";      case "115": return "s";
                case "99": return "c";      case "116": return "t";
                case "100": return "d";     case "117": return "u";
                case "101": return "e";     case "118": return "v";
                case "102": return "f";     case "119": return "w";
                case "103": return "g";     case "120": return "x";
                case "104": return "h";     case "121": return "y";
                case "105": return "i";     case "122": return "z";
                case "106": return "j";     /* opening brace */
                case "107": return "k";     case "123": return "{";
                case "108": return "l";     /* vertical bar */
                case "109": return "m";     case "124": return "|";
                case "110": return "n";     /* closing brace */
                case "111": return "o";     case "125": return "}";
                case "112": return "p";     /* equivalency sign - tilde */
                case "113": return "q";     case "126": return "~";

                /* non-breaking space */                    /* degree sign */
                case "160": case "nbsp" : return " ";       case "176": case "deg": return "°";
                /* inverted exclamation mark */             /* plus-or-minus sign */
                case "161": case "iexcl": return "¡";       case "177": case "plusmn": return "±";
                /* cent sign */                             /* superscript two - squared */
                case "162": case "cent" : return "¢";       case "178": case "sup2": return "²";
                /* pound sign */                            /* superscript three - cubed */
                case "163": case "pound": return "£";       case "179": case "sup3": return "³";
                /* currency sign */                         /* acute accent - spacing acute */
                case "164": case "curren": return "¤";      case "180": case "acute": return "´";
                /* yen sign */                              /* micro sign */
                case "165": case "yen": return "¥";         case "181": case "micro": return "µ";
                /* broken vertical bar */                   /* pilcrow sign - paragraph sign */
                case "166": case "brvbar": return "¦";      case "182": case "para": return "¶";
                /* section sign */                          /* middle dot - Georgian comma */
                case "167": case "sect": return "§";        case "183": case "middot": return "·";
                /* spacing diaeresis - umlaut */            /* spacing cedilla */
                case "168": case "uml": return "¨";         case "184": case "cedil": return "¸";
                /* copyright sign */                        /* superscript one */
                case "169": case "copy": return "©";        case "185": case "sup1": return "¹";
                /* feminine ordinal indicator */            /* masculine ordinal indicator */
                case "170": case "ordf": return "ª";        case "186": case "ordm": return "º";
                /* left double angle quotes */              /* right double angle quotes */
                case "171": case "laquo": return "«";       case "187": case "raquo": return "»";
                /* not sign */                              /* fraction one quarter */
                case "172": case "not": return "¬";         case "188": case "frac14": return "¼";
                /* soft hyphen */                           /* fraction one half */
                case "173": case "shy": return "­";          case "189": case "frac12": return "½";
                /* registered trade mark sign */            /* fraction three quarters */
                case "174": case "reg": return "®";         case "190": case "frac34": return "¾";
                /* spacing macron - overline */             /* inverted question mark */
                case "175": case "macr": return "¯";        case "191": case "iquest": return "¿";

                /* latin capital letter A with grave */         /* latin capital letter ETH */
                case "192": case "Agrave": return "À";          case "208": case "ETH": return "Ð";
                /* latin capital letter A with acute */         /* latin capital letter N with tilde */
                case "193": case "Aacute": return "Á";          case "209": case "Ntilde": return "Ñ";
                /* latin capital letter A with circumflex */    /* latin capital letter O with grave */
                case "194": case "Acirc": return "Â";           case "210": case "Ograve": return "Ò";
                /* latin capital letter A with tilde */         /* latin capital letter O with acute */
                case "195": case "Atilde": return "Ã";          case "211": case "Oacute": return "Ó";
                /* latin capital letter A with diaeresis */     /* latin capital letter O with circumflex */
                case "196": case "Auml": return "Ä";            case "212": case "Ocirc": return "Ô";
                /* latin capital letter A with ring above */    /* latin capital letter O with tilde */
                case "197": case "Aring": return "Å";           case "213": case "Otilde": return "Õ";
                /* latin capital letter AE */                   /* latin capital letter O with diaeresis */
                case "198": case "AElig": return "Æ";           case "214": case "Ouml": return "Ö";
                /* latin capital letter C with cedilla */       /* multiplication sign */
                case "199": case "Ccedil": return "Ç";          case "215": case "times": return "×";
                /* latin capital letter E with grave */         /* latin capital letter O with slash */
                case "200": case "Egrave": return "È";          case "216": case "Oslash": return "Ø";
                /* latin capital letter E with acute */         /* latin capital letter U with grave */
                case "201": case "Eacute": return "É";          case "217": case "Ugrave": return "Ù";
                /* latin capital letter E with circumflex */    /* latin capital letter U with acute */
                case "202": case "Ecirc": return "Ê";           case "218": case "Uacute": return "Ú";
                /* latin capital letter E with diaeresis */     /* latin capital letter U with circumflex */
                case "203": case "Euml": return "Ë";            case "219": case "Ucirc": return "Û";
                /* latin capital letter I with grave */         /* latin capital letter U with diaeresis */
                case "204": case "Igrave": return "Ì";          case "220": case "Uuml": return "Ü";
                /* latin capital letter I with acute */         /* latin capital letter Y with acute */
                case "205": case "Iacute": return "Í";          case "221": case "Yacute": return "Ý";
                /* latin capital letter I with circumflex */    /* latin capital letter THORN */
                case "206": case "Icirc": return "Î";           case "222": case "THORN": return "Þ";
                /* latin capital letter I with diaeresis */     /* latin small letter sharp s - ess-zed */
                case "207": case "Iuml": return "Ï";            case "223": case "szlig": return "ß";

                /* latin small letter a with grave */           /* latin small letter eth */
                case "224": case "agrave": return "à";          case "240": case "eth": return "ð";
                /* latin small letter a with acute */           /* latin small letter n with tilde */
                case "225": case "aacute": return "á";          case "241": case "ntilde": return "ñ";
                /* latin small letter a with circumflex */      /* latin small letter o with grave */
                case "226": case "acirc": return "â";           case "242": case "ograve": return "ò";
                /* latin small letter a with tilde */           /* latin small letter o with acute */
                case "227": case "atilde": return "ã";          case "243": case "oacute": return "ó";
                /* latin small letter a with diaeresis */       /* latin small letter o with circumflex */
                case "228": case "auml": return "ä";            case "244": case "ocirc": return "ô";
                /* latin small letter a with ring above */      /* latin small letter o with tilde */
                case "229": case "aring": return "å";           case "245": case "otilde": return "õ";
                /* latin small letter ae */                     /* latin small letter o with diaeresis */
                case "230": case "aelig": return "æ";           case "246": case "ouml": return "ö";
                /* latin small letter c with cedilla */         /* division sign */
                case "231": case "ccedil": return "ç";          case "247": case "divide": return "÷";
                /* latin small letter e with grave */           /* latin small letter o with slash */
                case "232": case "egrave": return "è";          case "248": case "oslash": return "ø";
                /* latin small letter e with acute */           /* latin small letter u with grave */
                case "233": case "eacute": return "é";          case "249": case "ugrave": return "ù";
                /* latin small letter e with circumflex */      /* latin small letter u with acute */
                case "234": case "ecirc": return "ê";           case "250": case "uacute": return "ú";
                /* latin small letter e with diaeresis */       /* latin small letter u with circumflex */
                case "235": case "euml": return "ë";            case "251": case "ucirc": return "û";
                /* latin small letter i with grave */           /* latin small letter u with diaeresis */
                case "236": case "igrave": return "ì";          case "252": case "uuml": return "ü";
                /* latin small letter i with acute */           /* latin small letter y with acute */
                case "237": case "iacute": return "í";          case "253": case "yacute": return "ý";
                /* latin small letter i with circumflex */      /* latin small letter thorn */
                case "238": case "icirc": return "î";           case "254": case "thorn": return "þ";
                /* latin small letter i with diaeresis */       /* latin small letter y with diaeresis */
                case "239": case "iuml": return "ï";            case "255": case "yuml": return "ÿ";

                // Browser support: Internet Explorer > 4, Netscape > 4   

                /* latin capital letter OE */                   /* single low-9 quotation mark */
                case "338": return "Œ";                         case "8218": return "‚";
                /* latin small letter oe */                     /* left double quotation mark */
                case "339": return "œ";                         case "8220": return "“";
                /* latin capital letter S with caron */         /* right double quotation mark */
                case "352": return "Š";                         case "8221": return "”";
                /* latin small letter s with caron */           /* double low-9 quotation mark */
                case "353": return "š";                         case "8222": return "„";
                /* latin capital letter Y with diaeresis */     /* dagger */
                case "376": return "Ÿ";                         case "8224": return "†";
                /* latin small f with hook - function */        /* double dagger */
                case "402": return "ƒ";                         case "8225": return "‡";
                /* en dash */                                   /* bullet */
                case "8211": return "–";                        case "8226": return "•";
                /* em dash */                                   /* horizontal ellipsis */
                case "8212": return "—";                        case "8230": return "…";
                /* left single quotation mark */                /* per thousand sign */
                case "8216": return "‘";                        case "8240": return "‰";
                /* right single quotation mark */               /* euro sign */
                case "8217": return "’";                        case "8364": case "euro": return "€";
                                                                /* trade mark sign */
                                                                case "8482": return "™";
            }

            return null;
        }
    }
}
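
The switch above lists every supported code explicitly. As a side note, the number in a numeric reference such as &#8364; is simply the Unicode code point of the character, so numeric codes can also be decoded generically. Below is a minimal sketch of that idea; the DecodeNumericEntity helper and the NumericEntityDemo class are hypothetical names used for illustration only and are not part of HtmlStripper:

```csharp
using System;
using System.Globalization;

static class NumericEntityDemo
{
    // Hypothetical helper (not part of HtmlStripper): converts the decimal
    // code of a numeric character reference to the corresponding string.
    // Returns null when the code is not a valid Unicode code point.
    public static string DecodeNumericEntity(string code)
    {
        int codePoint;

        // NumberStyles.None accepts decimal digits only (no sign, no spaces)
        if (!int.TryParse(code, NumberStyles.None, CultureInfo.InvariantCulture, out codePoint))
            return null;

        // Reject zero, values above the Unicode range and the surrogate block,
        // which char.ConvertFromUtf32 would refuse
        if (codePoint <= 0 || codePoint > 0x10FFFF)
            return null;
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            return null;

        return char.ConvertFromUtf32(codePoint);
    }

    static void Main()
    {
        Console.WriteLine(DecodeNumericEntity("8364")); // €
        Console.WriteLine(DecodeNumericEntity("338"));  // Œ
        Console.WriteLine(DecodeNumericEntity("euro") == null); // True: named entities still need a table
    }
}
```

Named entities such as euro or Agrave have no such shortcut, so a lookup like the switch above (or a dictionary) is still needed for them.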

Applying HtmlStripper to a simple piece of HTML like this

<div>text in the first div&nbsp;
     <div>&lt;text in the nested div&gt;</div>
     &nbsp;&#8364;text in the first div again
</div>

gives the following result:

text in the first div
    <text in the nested div>
     €text in the first div again
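
The entity-decoding part of this result can be cross-checked against the decoder that ships with .NET 4.0 and later, System.Net.WebUtility.HtmlDecode. The sketch below is only a sanity check on the entities used in the example: WebUtility.HtmlDecode decodes entities but does not remove tags, and the EntityCrossCheck class name is made up for illustration.

```csharp
using System;
using System.Net;

static class EntityCrossCheck
{
    static void Main()
    {
        // &#8364; and &euro; both denote the euro sign
        Console.WriteLine(WebUtility.HtmlDecode("&#8364;") == "€");  // True
        Console.WriteLine(WebUtility.HtmlDecode("&euro;") == "€");   // True

        // &lt; and &gt; become literal angle brackets after decoding
        Console.WriteLine(WebUtility.HtmlDecode("&lt;text in the nested div&gt;"));
        // <text in the nested div>

        // &nbsp; is decoded to the non-breaking space U+00A0,
        // which is a different character from a regular space
        Console.WriteLine(WebUtility.HtmlDecode("&nbsp;") == "\u00A0"); // True
    }
}
```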

The next code sample demonstrates how to use HtmlStripper in a real-world scenario: it downloads a web page, strips the markup, and saves the remaining text to a file.

using System;
using System.IO;
using System.Net;
using System.Text;
using Helpers;

namespace HtmlStripperConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            string responseString;

            // Request the raw HTML of the page to strip
            WebRequest request = WebRequest.Create("http://dotnetfollower.com/wordpress/2011/12/sharepoint-understanding-businessdata-column-bdc-field/");

            // Dispose the response, the stream and the reader once the page has been read
            using (WebResponse response = request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8)) // the page is assumed to be UTF-8
            {
                responseString = reader.ReadToEnd();
            }

            // Remove the markup and save the remaining plain text
            string strippedText = HtmlStripper.Strip(responseString);
            File.WriteAllText(@"c:\text.txt", strippedText);
        }
    }
}
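
The sample reads the response as UTF-8, which suits the page it downloads. For arbitrary pages, the encoding could instead be taken from the charset reported by the server, available through the HttpWebResponse.CharacterSet property. Here is a minimal sketch under that assumption; ResolveEncoding and CharsetDemo are hypothetical names, and the fallback to UTF-8 is a design choice made for the example rather than anything HtmlStripper requires:

```csharp
using System;
using System.Text;

static class CharsetDemo
{
    // Hypothetical helper: maps a charset name (for example, the value of
    // HttpWebResponse.CharacterSet) to an Encoding, falling back to UTF-8
    // when the name is missing or not recognized.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;

        try
        {
            // Some servers wrap the charset name in quotes
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name
            return Encoding.UTF8;
        }
    }

    static void Main()
    {
        Console.WriteLine(ResolveEncoding("iso-8859-1").WebName);      // iso-8859-1
        Console.WriteLine(ResolveEncoding(null).WebName);              // utf-8
        Console.WriteLine(ResolveEncoding("no-such-charset").WebName); // utf-8
    }
}
```

In the console application above, the result of such a helper would be passed to the StreamReader constructor in place of Encoding.UTF8, after casting the response to HttpWebResponse to read its CharacterSet.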
Categories: C#, Regex