SharePoint: Manually Upgrade Business Data Catalog Application Definitions to Business Data Connectivity Models

March 13th, 2012 No comments

    Trying to import a legacy Application Definition File of SharePoint 2007 into Business Data Connectivity Service of SharePoint 2010, you apparently got at least one of the errors shown below:

  • Application definition import failed. The following error occurred: The root element of a valid Metadata package must be ‘Model’ in namespace ‘http://schemas.microsoft.com/windows/2007/BusinessDataCatalog’. The root in the given package is ‘LobSystem’ in namespace ‘http://schemas.microsoft.com/office/2006/03/BusinessDataCatalog’. Error was encountered at or just before Line: ‘2’ and Position: ‘2’;
  • Application definition import failed. The following error occurred: BDC Model does not correctly match the schema. The required attribute ‘Namespace’ is missing. Error was encountered at or just before Line: ’20’ and Position: ’10’;
  • Application definition import failed. The following error occurred: ReturnTypeDescriptor of MethodInstance with Name ‘ProductSpecificFinderInstance’ on Entity (External Content Type) with Name ‘Product’ in Namespace ‘ExternalProductDB’ should not be a Collection TypeDescriptor for MethodInstances of Type ‘SpecificFinder’. Parameter name: rawValues.ReturnTypeDescriptorId Error was encountered at or just before Line: ‘171’ and Position: ’18’;
  • and so on

As it’s known, in SharePoint 2010, Business Data Catalog (BDC) was replaced with Business Data Connectivity with the same abbreviation. One of the changed things is the format of xml-based Application Definition Files. If you make an in-place upgrade of a live SharePoint 2007 application to SharePoint 2010, bdc metadata will be automatically upgraded as well and will become usable with the Business Data Connectivity. But if the in-place upgrade isn’t an option for you, you can upgrade your xml-based Application Definition Files manually. The manual algorithm step by step is described here – How to: Manually Upgrade Business Data Catalog Application Definitions to Business Data Connectivity Models.

For one of our applications we settled on the manual upgrade of its metadata files. But If I call myself a programmer, I have to try to automate the algorithm, especially taking into account 20+ files required to upgrade. So, I’ve developed a simple application for alteration of the legacy xml-based Application Definition Files to make them compatible with SharePoint 2010. However I’d like to notice that the given converter doesn’t follow entirely the procedure described by Microsoft, but performs only steps allowing our particular metadata files to be successfully imported into the Business Data Connectivity Service. For example, our files don’t comprise actions and associations, thus the application does nothing at all with <Action> and <Association> elements. So, consider this converter as a start point of developing the new one satisfying your own conditions and requirements.

Below I enumerated the necessary and sufficient changes to be applied to our particular metadata files so that it enables us to make them compatible with SharePoint 2010. Exactly these very steps and a few less important I’ve implemented in the converter.

  • the root element in the Application Definition File must be a <Model>;
  • the <Model> must contain <LobSystems> element, which in turn must wrap the former root node – <LobSystem>;
  • the <LobSystem> element mustn’t contain the Version-attribute;
  • the <Entity> element must contain the new attributes – Namespace and Version;
  • the <Identifier> element mustn’t contain an assembly name in its TypeName-attribute; For example, TypeName=”System.String, mscorlib” has to turn into TypeName=”System.String”;
  • the <MethodInstance> element with Type-attribute value of SpecificFinder should include the Default-attribute with value of true;
  • if the <TypeDescriptor> element has the IsCollection attribute set to true, the MethodInstance return TypeDescriptor should be updated to be an element of the collection. In practice, that means the ReturnTypeDescriptorPath-attribute with an appropriate value should be added to <MethodInstance> element, and the obsolete ReturnTypeDescriptorLevel and ReturnTypeDescriptorName attributes should be deleted;

Note that when modifying a <Entity> element, the values of the Namespace and Version attributes are copied respectively from the values of Name and Version attributes of the <LobSystem> element, which wraps the <Entity> element. The same approach is used while the in-place upgrade takes place.

After the changes are applied, and if they are sufficient for your metadata, the result files can be imported into Business Data Connectivity Service. During the import process, you may get the warnings. Consider fixing them in the future, but at the present stage you can simply ignore them. The most popular warnings are listed below:

  • This Model contains LobSystem (External System) of Type ‘WebService’ which is deprecated and may be removed in future releases.
  • The MethodInstance of type ‘Finder’ with Name ‘FindProducts’ does not have a Limit Filter.
  • The TypeDescriptor ‘From’ on Parameter ‘GetProductsFiltered’ on Method ‘GetItems’ of Entity (External Content Type) ‘Product’ with Namespace ‘ExternalProductDB’ has a DateTime value, but there is no Interpretation describing what the External System expects and returns as the DateTimeKind. The DateTime value is assumed to be UTC.
  • Note: I made the converter-application fix the warnings regarding the DateTime type and UTC, so they won’t bother you.

    The converter is very straightforward to use. Using the button ‘+’, simply add to the left section the files to be upgraded. Then click the button ‘>>’, and you’ll get the upgraded ones in the right section. Double click on file name opens an overview form to browse the input or result xml. Physically, the result files are located in c:\output folder. The application doesn’t use any SharePoint-related libraries.

    Upgrade 2007 BDC model to SP2010

    You can download the application from this page or by using the direct link. The Visual Studio 2010 solution and appropriate executable file are in the archive.

SharePoint: Migration of custom upload page derived from UploadPage to SP 2010

March 5th, 2012 No comments

    In our SharePoint 2007 application, there was a custom upload page for a document library. The custom upload page was derived from the Microsoft.SharePoint.ApplicationPages.UploadPage defined in the Microsoft.SharePoint.ApplicationPages.dll. Within application web pages, all links to the upload page were direct, while the page itself was located in the _layouts folder. After migration to the SharePoint 2010, I’ve found out that every request to the custom upload page is transferred to the Uploadex.aspx (located in the _layouts folder as well): the URL in browser corresponds to our upload page, but the content corresponds to the Uploadex.aspx. After a short investigation by means of .Net Reflector I found the reason in the UploadPage base class defined in Microsoft.SharePoint.ApplicationPages.dll, Version=14.0.0.0. Let’s take a look at the OnPreInit method of the UploadPage:

protected override void OnPreInit(EventArgs e)
{
    if (!this.customPage)
    {
        string customUploadPage = base.Web.CustomUploadPage;
        if (!string.IsNullOrEmpty(customUploadPage))
        {
            try
            {
                base.Server.Transfer(customUploadPage);
            }
            catch (Exception)
            {
            }
        }
    }
    base.OnPreInit(e);
}

*Note: this code was added in SharePoint 2010 and wasn’t presented in SharePoint 2007

As we can see, the SharePoint 2010 itself allows to define the CustomUploadPage for a website. If it’s defined, the UploadPage automatically transfers every request to it. Apparently, after migration the CustomUploadPage was somehow set to _layouts/Uploadex.aspx. Another interesting moment here is the customPage boolean field, which acts as marker indicating whether the current page is the custom upload page or not.

Probably, the acceptable solution for our issue would be to set the CustomUploadPage property of the SPWeb object to the URL of our own custom upload page. But, firstly, the problem here is that our upload page derived from the UploadPage doesn’t know that it’s a custom one as the customPage field is set to false by default. Thus, without any code modification, we will have a kind of loop here, because our custom page due to the UploadPage base class will be transferring the request to itself infinitely. Secondly, I don’t want to rely on the CustomUploadPage property, which can be unexpectedly changed. I just want when I click on the direct link to our upload page, this page would be opened without any sudden redirections and transfers.

So, the best solution is to set the customPage field to true within the constructor of our custom upload page. No transfer happens in this case, and we don’t need to deal with the CustomUploadPage at all. In our case It looks like:

public class MyUploadPage : UploadPage
{
    // ...
    public MyUploadPage()
    {
        customPage = true;
    }
    // ...
}

That’s all!

SharePoint: Enhanced ItemPicker

February 28th, 2012 No comments

    I was asked to develop a control based on the ItemPicker control, which, in addition to ability of choosing an external data item through BDC, brings it to client side without page postback. There was the following supposed sequence of actions:

  1. An user chooses a data item in the Picker Dialog;
  2. The identifier of the selected item is sent to the server through an Ajax-like technology;
  3. Using the received identifier, the server fetches the proper data out through BDC and sends it back to the client. By “the proper data” I mean the values of either all fields available in the selected data item or only fields defined in control declaration;
  4. On the client side, the received data is parsed and displayed in UI;

Additionally, the control should be free of bindings to SPField as it should be capable to reside within an independent aspx-page locating in the _layout folder.

*Note: for better understanding of BDC infrastructure, please read the following blog posts: SharePoint: Brief introduction to Business Data Catalog and SharePoint: Understanding BusinessData Column.

So, I developed the required control and made it as reusable as possible. Let’s call the control MyItemPicker (it’s so unusual, isn’t? :)). For sake of simplicity I decided to use the ASP.Net client callbacks applied through the ICallbackEventHandler interface. The ASP.Net client callbacks can be considered as a wrapper on XMLHTTP object. Also the MyItemPicker comprises and uses the standard ItemPicker.

Ok, let’s start with declaration of the control within page:

<MYCC:MyItemPicker id="myItemPicker" runat="server" LobSystemInstanceName="Products" 
EntityName="Product" PrimaryColumnName="Name" ClientCallback="MyClientCallback" 
ClientCallbackError="MyClientCallbackError" CallbackBDCFieldFilter="Price,Producer" />

The significant properties here are

  • LobSystemInstanceName is the name of the Lob System Instance, through which data is provided to pick;
  • EntityName is the type name of data items populating the picker;
  • PrimaryColumnName is the name of the data item field, the value of which is used as a display value;
  • ClientCallback is the name of the JavaScript function, which has to be present within the page. In case of success, the given function accepts and processes the server response containing fetched data;
  • ClientCallbackError is the name of the JavaScript function, which can be within the page and is called, when server fails to fulfill request. This property is optional;
  • CallbackBDCFieldFilter is the comma-separated string containing names of data item fields that should be included in server response. For example, if a BDC Entity has four fields – ID, Name, Price and Producer, you might want to have on client side only two of them – Price and Producer. If the CallbackBDCFieldFilter property is empty or not presented in the declaration, server response contains the values of all available fields of BDC Entity;

The sample of the JavaScript functions, which should be indicated in the ClientCallback and ClientCallbackError properties, is shown below. Note the functions’ signatures.

<script type="text/javascript">

    function MyClientCallback(result, context) {

        alert("Result: " + result);
        
        if (result != null && typeof result != "undefined" && result.length != 0) {
            var res = eval("(" + result + ")");

            alert('Price: '    + res.Price);
            alert('Producer: ' + res.Producer);
            
            // update UI with received data
        }        
    }

    function MyClientCallbackError(result, context) {
        alert('Error: ' + result);            
    }

</script>

The server response looks like

{ 'Price' : '10.00', 
  'Producer' : 'Microsoft Corporation' }

Thus, the response is formatted in the manner to be easily turned into JavaScript object by means of the eval function.

So, it’s time to examine the source code of the MyItemPicker itself.

MyItemPicker Source

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Portal.WebControls;
using System.Web.UI;
using Microsoft.Office.Server.ApplicationRegistry.Infrastructure;
using Microsoft.Office.Server.ApplicationRegistry.MetadataModel;
using Microsoft.Office.Server.ApplicationRegistry.Runtime;
using System.Data;

namespace MyControls
{
    public class MyItemPicker : Control, ICallbackEventHandler
    {
        #region fields & properties
        protected ItemPicker _picker                    = null;
        protected string     _callbackRequestedEntityId = string.Empty;

        public string LobSystemInstanceName  { get; set; }
        public string EntityName             { get; set; }
        public string PrimaryColumnName      { get; set; }

        public string ClientCallback         { get; set; }
        public string ClientCallbackError    { get; set; }

        public string CallbackBDCFieldFilter { get; set; }
        #endregion

        #region public methods
        // Implementation of the ICallbackEventHandler interface
        // Generates response that will be sent to client
        public string GetCallbackResult()
        {
            return GetJSResult();
        }
        // Implementation of the ICallbackEventHandler interface
        // Retrieves and preserves identifier of selected data item sent from client
        public void RaiseCallbackEvent(string eventArgument)
        {
            _callbackRequestedEntityId = eventArgument;
        }
        #endregion

        #region internal methods
        protected override void OnInit(EventArgs e)
        {
            base.OnInit(e);
            EnsureChildControls();            
        }

        protected override void CreateChildControls()
        {            
            base.CreateChildControls();

            if (_picker == null)
            {
                _picker = new ItemPicker();                
                _picker.MultiSelect = false;
                _picker.ID = ID + "_ItemPicker";
                try
                {
                    this.SetExtendedDataOnPicker(_picker);
                }
                catch (Exception exception)
                {
                    _picker.ErrorMessage = exception.Message;
                    _picker.Enabled = false;
                }

                this.Controls.Add(_picker);
            }            
        }

	/// <summary>
        /// Initilizes main item picker's properties
        /// </summary>        
        protected virtual void SetExtendedDataOnPicker(ItemPicker picker)
        {            
            ItemPickerExtendedData data = new ItemPickerExtendedData();

            BDCMetaRequest request = new BDCMetaRequest(LobSystemInstanceName, EntityName);
            data.SystemInstanceId  = request.FoundLobSystemInstance.Id;
            data.EntityId          = request.FoundEntity.Id;

            List<uint> list = new List<uint>();
            FieldCollection fields = request.FoundEntity.GetSpecificFinderView().Fields;
            foreach (Field field in fields)
                if (string.Equals(field.Name, PrimaryColumnName, StringComparison.OrdinalIgnoreCase))                
                    data.PrimaryColumnId = field.TypeDescriptor.Id;
                else
                    list.Add(field.TypeDescriptor.Id);

            data.SecondaryColumnsIds = list.ToArray();
            picker.ExtendedData = data;
        }

        protected override void OnPreRender(EventArgs e)
        {
            base.OnPreRender(e);
            AddJSCallbackFunctions();
            AddAdditionalJSFunctions();
        }

        /// <summary>
        /// Generates and adds auxiliary JavaScript functions to the page
        /// </summary> 
        protected void AddAdditionalJSFunctions()
        {
            if (_picker != null)
            {
                _picker.LoadPostData(null, null); // this line is required to force CreateChildControls() and to have HiddenEntityKey created
                Control upLevelDiv = FindControlRecursive(_picker, "upLevelDiv");
                if (upLevelDiv != null)
                {
                    string clearFuncName = "ClearItemPicker_" + ID;
                    string clearFunc =
                        "function " + clearFuncName + "() {" +
                            "var upLevelDiv = document.getElementById('" + upLevelDiv.ClientID + "');" +
                            "if (upLevelDiv != null) {" +
                                "upLevelDiv.innerHTML = '';" +
                                "updateControlValue('" + _picker.ClientID + "');" +
                                "}" +
                            "}";
                    Page.ClientScript.RegisterClientScriptBlock(GetType(), clearFuncName, clearFunc, true);
                }
                
                Control hiddenEntityDisplayTextControl = FindControlRecursive(_picker, "HiddenEntityDisplayText");
                if (hiddenEntityDisplayTextControl != null)
                {
                    string getDisplayTextFuncName = "GetDisplayText_" + ID;
                    string getDisplayTextFunc =
                        "function " + getDisplayTextFuncName + "() {" +
                            "var hiddenEntityDisplayTextControl = document.getElementById('" + hiddenEntityDisplayTextControl.ClientID + "');" +
                            "return hiddenEntityDisplayTextControl != null ? hiddenEntityDisplayTextControl.value : '';" +
                        "}";
                    Page.ClientScript.RegisterClientScriptBlock(GetType(), getDisplayTextFuncName, getDisplayTextFunc, true);
                }
            }            
        }

        /// <summary>
        /// Generates and adds the picker's AfterCallbackClientScript to the page
        /// </summary> 
        protected void AddJSCallbackFunctions()
        {
            if (_picker != null)
            {
                string callbackFunc = null;
                if (!string.IsNullOrEmpty(ClientCallback) && !string.IsNullOrEmpty(ClientCallbackError))
                    callbackFunc = Page.ClientScript.GetCallbackEventReference(this, "arg", ClientCallback, "context", ClientCallbackError, true);
                else
                {
                    if (!string.IsNullOrEmpty(ClientCallback))
                        callbackFunc = Page.ClientScript.GetCallbackEventReference(this, "arg", ClientCallback, "context", true);
                }
                if (!string.IsNullOrEmpty(callbackFunc))
                {
                    _picker.LoadPostData(null, null); // this line is required to force CreateChildControls() and to have HiddenEntityKey created

                    Control pickerEntityKeyHidden = FindControlRecursive(_picker, "HiddenEntityKey");
                    if (pickerEntityKeyHidden != null)
                    {
                        string clientFuncName = "GetBdcFieldValuesAsync_" + ID;
                        string clientFunc =
                            "function " + clientFuncName + "(context)" +
                            "{" +
                                "var pickerEntityKeyHidden = document.getElementById('" + pickerEntityKeyHidden.ClientID + "');" +
                                "if (pickerEntityKeyHidden != null)" +
                                "{" +
                                    "var arg = pickerEntityKeyHidden.value;" +
                                    callbackFunc + ";" +
                                "}" +
                            "}";
                        Page.ClientScript.RegisterClientScriptBlock(GetType(), clientFuncName, clientFunc, true);
                        _picker.AfterCallbackClientScript = clientFuncName + "('" + ID + "');";
                    }
                }
            }
        }

        /// <summary>
        /// Makes request to external data source and returns json-result
        /// </summary>         
        protected string GetJSResult()
        {
            string res = string.Empty;

            try
            {
                if (!string.IsNullOrEmpty(_callbackRequestedEntityId))
                {
                    Dictionary<string, byte> bdcFieldFilter = GetBDCFieldFilter(CallbackBDCFieldFilter);
                    
                    BDCRequestById request = new BDCRequestById(LobSystemInstanceName, EntityName, _callbackRequestedEntityId);

                    StringBuilder sb = new StringBuilder();
                    sb.Append("{");
                    foreach (Field field in request.FoundEntityInstance.ViewDefinition.Fields)
                    {
                        if (bdcFieldFilter.Count == 0 || bdcFieldFilter.ContainsKey(field.Name))
                        {
                            if (sb.Length > 1)
                                sb.Append(", ").AppendLine();
                            sb.Append("'").Append(field.Name).Append("' : ");
                            sb.Append("'").Append(Convert.ToString(request.FoundEntityInstance.GetFormatted(field))).Append("'");
                        }
                    }
                    sb.Append("}");                    

                    res = sb.ToString();
                }
            }
            catch (Exception ex)
            {
                // write error to log
            }

            return res;
        }

        /// <summary>
        /// Parses the user defined list of bdc fields, the values of which should be retrieved
        /// </summary>        
        protected static Dictionary<string, byte> GetBDCFieldFilter(string commaSeparatedBdcFields)
        {
            Dictionary<string, byte> res = new Dictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

            if (!string.IsNullOrEmpty(commaSeparatedBdcFields))
            {
                string[] bdcFields = commaSeparatedBdcFields.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                foreach (string field in bdcFields)
                    res.Add(field, 0);
            }

            return res;
        }
        #endregion
    }
}

You probably noticed that the functions AddJSCallbackFunctions and AddAdditionalJSFunctions generate and add some JavaScript functions to the page. The exact names of these JavaScript functions depend on the id attribute defined in control declaration. For example, if control id is “myItemPicker“, the functions’ name will be GetBdcFieldValuesAsync_myItemPicker, ClearItemPicker_myItemPicker and GetDisplayText_myItemPicker.

Let’s take a look at the functions. The main function is GetBdcFieldValuesAsync_myItemPicker, which extracts the encoded id of selected item from ItemPicker and then makes the Ajax-like client callback to the server. The rest two functions are auxiliary, they are not used by MyItemPicker directly, but they are very useful for developing an interaction between user and MyItemPicker. As their names imply, the ClearItemPicker_myItemPicker clears the ItemPicker, and the GetDisplayText_myItemPicker returns the text displayed to user in ItemPicker. The listing below demonstrates the functions within page:

<script type="text/javascript">

    function GetBdcFieldValuesAsync_myItemPicker(context) {
        var pickerEntityKeyHidden = document.getElementById('myItemPicker_ItemPicker_HiddenEntityKey');
        if (pickerEntityKeyHidden != null) {
            var arg = pickerEntityKeyHidden.value;
            WebForm_DoCallback('myItemPicker', arg, ClientCallback, context, ClientCallbackError, true);
        }
    }

    function ClearItemPicker_myItemPicker() {
        var upLevelDiv = document.getElementById('myItemPicker_ItemPicker_upLevelDiv');
        if (upLevelDiv != null) {
            upLevelDiv.innerHTML = '';
            updateControlValue('myItemPicker_ItemPicker');
        }
    }

    function GetDisplayText_myItemPicker() {
        var hiddenEntityDisplayTextControl = document.getElementById('myItemPicker_ItemPicker_HiddenEntityDisplayText');
        return hiddenEntityDisplayTextControl != null ? hiddenEntityDisplayTextControl.value : '';
    }
</script>

The SetExtendedDataOnPicker and GetJSResult methods of MyItemPicker employ the classes BDCMetaRequest and BDCRequestById that are described in my post SharePoint: How to get value from BDC.

The FindControlRecursive method is mentioned in another my post.

C#: Html Stripper

February 13th, 2012 No comments

    Some time ago I developed the project comparing the visible content of two html-pages and displaying all found differences. The main goal of project was to detect whether page was updated through the time, so usually comparison was between different versions of one html-page. To fetch the visible content and throw away the hidden one I created the Regex-based Html stripper embodied in the HtmlStripper class. The HtmlStripper follows the next three steps:

  1. to remove all service and auxiliary Html tags. In most cases that means to remove the <head>-tag with all its content, along with the <style>, <script> and other tags residing within <body>;
  2. to remove all the rest Html tags;
  3. to replace escape sequences found in Html with its text analogs. In practice that means that, for example,
    • the &nbsp; sequence should be replaced with the space symbol ‘ ‘;
    • &lt; – with ‘<‘;
    • &gt – with ‘>’ and so on

    There is a huge amount of escape sequences (or HTML codes), so the HtmlStripper operates with the most popular in my opinion;

So, the code of HtmlStripper is shown below:

HtmlStripper Source

using System;
using System.Text.RegularExpressions;

namespace Helpers
{
    public static class HtmlStripper
    {
        #region fields
        /// <summary>
        /// Allows to find the HTML tags hidden from view (style, script code and so on)
        /// </summary>
        private static Regex _findHtmlTagsWithInvisibleTextRegex = new Regex
            (@"<head[^>]*?>.*?</head> | <style[^>]*?>.*?</style> | <script[^>]*?.*?</script> | 
               <object[^>]*?.*?</object> | <embed[^>]*?.*?</embed> | <applet[^>]*?.*?</applet> |
               <noframes[^>]*?.*?</noframes> | <noscript[^>]*?.*?</noscript> | <noembed[^>]*?.*?</noembed>",
                RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | 

RegexOptions.Singleline);

        /// <summary>
        /// Allows to find all HTML tags
        /// </summary>
        private static Regex _findHtmlTagsRegex = new Regex
            (@"<(?:(!--) |(\?) |(?i:( TITLE  | SCRIPT | APPLET | OBJECT | STYLE )) | ([!/A-Za-z]))(?(4)(?:(?![\s=][""`'])

[^>] |[\s=]`[^`]*`
             |[\s=]'[^']*'|[\s=]""[^""]*"")*|.*?)(?(1)(?<=--))(?(2)(?<=\?))(?(3)</(?i:\3)(?:\s[^>]*)?)>",
             RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | 

RegexOptions.Singleline);

        private static string _escapeGroupName = "escSeq";
        /// <summary>
        /// Allows to find escape sequences that are used in HTML
        /// </summary>
        private static Regex _findHtmlEscapeSequencesRegex = new Regex
            (string.Format(@"[&] (([#](?<{0}>\d+)) | ([#](?<{0}>x[\dabcdef]+)) | (?<{0}>\w+));?", _escapeGroupName),
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | 

RegexOptions.Singleline);
        #endregion

        public static string Strip(string htmlSrc)
        {
            string xTmpStr = _findHtmlTagsWithInvisibleTextRegex.Replace(htmlSrc, "");
            xTmpStr = _findHtmlTagsRegex.Replace(xTmpStr, "");
            return RemoveHtmlEscapeSequences(xTmpStr);
        }        

        private static string RemoveHtmlEscapeSequences(string htmlStr)
        {
            return _findHtmlEscapeSequencesRegex.Replace(htmlStr, delegate(Match match)
            {
                string replacement = GetHtmlEscapeSequenceSubstitution(match.Groups[_escapeGroupName].Value);
                return replacement == null ? match.Value : replacement;
            });
        }

        private static string GetHtmlEscapeSequenceSubstitution(string htmlEscape)
        {
            htmlEscape = htmlEscape.TrimStart('0'); // sometimes number contains leading zeros

            bool? isNumber = null;

            // if it's a hex number, convert it into string containing decimal number
            if (htmlEscape.StartsWith("x"))
            {
                int num = Int32.Parse(htmlEscape.TrimStart('x'), System.Globalization.NumberStyles.HexNumber);
                htmlEscape = num.ToString();
                isNumber = true;
            }

            // check if it's a number
            if(isNumber == null)
            {
                int tmpOut;
                isNumber = int.TryParse(htmlEscape, out tmpOut);
            }

            // find substitution for either the peeled number or the original string
            string res = FindSubstitution(htmlEscape);

            // if it's a number, any further attempts with different string variations are senseless
            if (!isNumber.Value && res == null)
            {                
                // there are a few Html codes consisting of all capital letters,
                // try to find substitution for them
                htmlEscape = htmlEscape.ToUpperInvariant();
                res = FindSubstitution(htmlEscape);               

                if (res == null)
                {
                    // try to find substitution for string with the first capital letter
                    htmlEscape = htmlEscape.Substring(0, 1).ToUpperInvariant() + htmlEscape.Substring(1).ToLowerInvariant();
                    res = FindSubstitution(htmlEscape);
                }

                if (res == null)
                {
                    // try to find substitution for the lower string
                    htmlEscape = htmlEscape.ToLowerInvariant();
                    res = FindSubstitution(htmlEscape);
                }
            }

            return res;
        }

        private static string FindSubstitution(string htmlEscape)
        {
            switch (htmlEscape)
            {
                // All browsers support

                /* space */                             /* zero */
                case "32": return " ";                  case "48": return "0";
                /* exclamation point */                 /* one */
                case "33": return "!";                  case "49": return "1";
                /* double quotes */                     /* two */
                case "34": case "quot": return "\"";    case "50": return "2";
                /* number sign */                       /* three */
                case "35": return "#";                  case "51": return "3";
                /* dollar sign */                       /* four */
                case "36": return "$";                  case "52": return "4";
                /* percent sign */                      /* five */
                case "37": return "%";                  case "53": return "5";
                /* ampersand */                         /* six */
                case "38": case "amp": return "&";      case "54": return "6";
                /* single quote */                      /* seven */
                case "39": return "'";                  case "55": return "7";
                /* opening parenthesis */               /* eight */
                case "40": return "(";                  case "56": return "8";
                /* closing parenthesis */               /* nine */
                case "41": return ")";                  case "57": return "9";
                /* asterisk */                          /* colon */
                case "42": return "*";                  case "58": return ":";
                /* plus sign */                         /* semicolon */
                case "43": return "+";                  case "59": return ";";
                /* comma */                             /* less than sign */
                case "44": return ",";                  case "60": case "lt": return "<";
                /* minus sign - hyphen */               /* equal sign */
                case "45": return "-";                  case "61": return "=";
                /* period */                            /* greater than sign */
                case "46": return ".";                  case "62": case "gt": return ">";
                /* slash */                             /* question mark */
                case "47": return "/";                  case "63" : return "?";
                
                /* at symbol */             case "83": return "S";
                case "64": return "@";      case "84": return "T";
                case "65": return "A";      case "85": return "U";
                case "66": return "B";      case "86": return "V";
                case "67": return "C";      case "87": return "W";
                case "68": return "D";      case "88": return "X";
                case "69": return "E";      case "89": return "Y";
                case "70": return "F";      case "90": return "Z";
                case "71": return "G";      /* opening bracket */
                case "72": return "H";      case "91": return "[";
                case "73": return "I";      /* backslash */
                case "74": return "J";      case "92": return "\\";
                case "75": return "K";      /* closing bracket */
                case "76": return "L";      case "93": return "]";
                case "77": return "M";      /* caret - circumflex */
                case "78": return "N";      case "94": return "^";
                case "79": return "O";      /* underscore */
                case "80": return "P";      case "95": return "_";
                case "81": return "Q";      /* grave accent */
                case "82": return "R";      case "96": return "`";                                            
                         
                case "97": return "a";      case "114": return "r";
                case "98": return "b";      case "115": return "s";
                case "99": return "c";      case "116": return "t";
                case "100": return "d";     case "117": return "u";
                case "101": return "e";     case "118": return "v";
                case "102": return "f";     case "119": return "w";
                case "103": return "g";     case "120": return "x";
                case "104": return "h";     case "121": return "y";
                case "105": return "i";     case "122": return "z";
                case "106": return "j";     /* opening brace */
                case "107": return "k";     case "123": return "{";
                case "108": return "l";     /* vertical bar */
                case "109": return "m";     case "124": return "|";
                case "110": return "n";     /* closing brace */
                case "111": return "o";     case "125": return "}";
                case "112": return "p";     /* equivalency sign - tilde */
                case "113": return "q";     case "126": return "~";

                /* non-breaking space */                    /* degree sign */
                case "160": case "nbsp" : return " ";       case "176": case "deg": return "°";
                /* inverted exclamation mark */             /* plus-or-minus sign */
                case "161": case "iexcl": return "¡";       case "177": case "plusmn": return "±";
                /* cent sign */                             /* superscript two - squared */
                case "162": case "cent" : return "¢";       case "178": case "sup2": return "²";
                /* pound sign */                            /* superscript three - cubed */
                case "163": case "pound": return "£";       case "179": case "sup3": return "³";
                /* currency sign */                         /* acute accent - spacing acute */
                case "164": case "curren": return "¤";      case "180": case "acute": return "´";
                /* yen sign */                              /* micro sign */
                case "165": case "yen": return "¥";         case "181": case "micro": return "µ";
                /* broken vertical bar */                   /* pilcrow sign - paragraph sign */
                case "166": case "brvbar": return "¦";      case "182": case "para": return "¶";
                /* section sign */                          /* middle dot - Georgian comma */
                case "167": case "sect": return "§";        case "183": case "middot": return "·";
                /* spacing diaeresis - umlaut */            /* spacing cedilla */
                case "168": case "uml": return "¨";         case "184": case "cedil": return "¸";
                /* copyright sign */                        /* superscript one */
                case "169": case "copy": return "©";        case "185": case "sup1": return "¹";
                /* feminine ordinal indicator */            /* masculine ordinal indicator */
                case "170": case "ordf": return "ª";        case "186": case "ordm": return "º";
                /* left double angle quotes */              /* right double angle quotes */
                case "171": case "laquo": return "«";       case "187": case "raquo": return "»";
                /* not sign */                              /* fraction one quarter */
                case "172": case "not": return "¬";         case "188": case "frac14": return "¼";
                /* soft hyphen */                           /* fraction one half */
                case "173": case "shy": return "­";          case "189": case "frac12": return "½";
                /* registered trade mark sign */            /* fraction three quarters */
                case "174": case "reg": return "®";         case "190": case "frac34": return "¾";
                /* spacing macron - overline */             /* inverted question mark */
                case "175": case "macr": return "¯";        case "191": case "iquest": return "¿";

                /* latin capital letter A with grave */         /* latin capital letter ETH */
                case "192": case "Agrave": return "À";          case "208": case "ETH": return "Ð";
                /* latin capital letter A with acute */         /* latin capital letter N with tilde */
                case "193": case "Aacute": return "Á";          case "209": case "Ntilde": return "Ñ";
                /* latin capital letter A with circumflex */    /* latin capital letter O with grave */
                case "194": case "Acirc": return "Â";           case "210": case "Ograve": return "Ò";
                /* latin capital letter A with tilde */         /* latin capital letter O with acute */
                case "195": case "Atilde": return "Ã";          case "211": case "Oacute": return "Ó";
                /* latin capital letter A with diaeresis */     /* latin capital letter O with circumflex */
                case "196": case "Auml": return "Ä";            case "212": case "Ocirc": return "Ô";
                /* latin capital letter A with ring above */    /* latin capital letter O with tilde */
                case "197": case "Aring": return "Å";           case "213": case "Otilde": return "Õ";
                /* latin capital letter AE */                   /* latin capital letter O with diaeresis */
                case "198": case "AElig": return "Æ";           case "214": case "Ouml": return "Ö";
                /* latin capital letter C with cedilla */       /* multiplication sign */
                case "199": case "Ccedil": return "Ç";          case "215": case "times": return "×";
                /* latin capital letter E with grave */         /* latin capital letter O with slash */
                case "200": case "Egrave": return "È";          case "216": case "Oslash": return "Ø";
                /* latin capital letter E with acute */         /* latin capital letter U with grave */
                case "201": case "Eacute": return "É";          case "217": case "Ugrave": return "Ù";
                /* latin capital letter E with circumflex */    /* latin capital letter U with acute */
                case "202": case "Ecirc": return "Ê";           case "218": case "Uacute": return "Ú";
                /* latin capital letter E with diaeresis */     /* latin capital letter U with circumflex */
                case "203": case "Euml": return "Ë";            case "219": case "Ucirc": return "Û";
                /* latin capital letter I with grave */         /* latin capital letter U with diaeresis */
                case "204": case "Igrave": return "Ì";          case "220": case "Uuml": return "Ü";
                /* latin capital letter I with acute */         /* latin capital letter Y with acute */
                case "205": case "Iacute": return "Í";          case "221": case "Yacute": return "Ý";
                /* latin capital letter I with circumflex */    /* latin capital letter THORN */
                case "206": case "Icirc": return "Î";           case "222": case "THORN": return "Þ";
                /* latin capital letter I with diaeresis */     /* latin small letter sharp s - ess-zed */
                case "207": case "Iuml": return "Ï";            case "223": case "szlig": return "ß";
                
                /* latin small letter a with grave */           /* latin small letter eth */
                case "224": case "agrave": return "à";          case "240": case "eth": return "ð";
                /* latin small letter a with acute */           /* latin small letter n with tilde */
                case "225": case "aacute": return "á";          case "241": case "ntilde": return "ñ";
                /* latin small letter a with circumflex */      /* latin small letter o with grave */
                case "226": case "acirc": return "â";           case "242": case "ograve": return "ò";
                /* latin small letter a with tilde */           /* latin small letter o with acute */
                case "227": case "atilde": return "ã";          case "243": case "oacute": return "ó";
                /* latin small letter a with diaeresis */       /* latin small letter o with circumflex */
                case "228": case "auml": return "ä";            case "244": case "ocirc": return "ô";
                /* latin small letter a with ring above */      /* latin small letter o with tilde */
                case "229": case "aring": return "å";           case "245": case "otilde": return "õ";
                /* latin small letter ae */                     /* latin small letter o with diaeresis */
                case "230": case "aelig": return "æ";           case "246": case "ouml": return "ö";
                /* latin small letter c with cedilla */         /* division sign */
                case "231": case "ccedil": return "ç";          case "247": case "divide": return "÷";
                /* latin small letter e with grave */           /* latin small letter o with slash */
                case "232": case "egrave": return "è";          case "248": case "oslash": return "ø";
                /* latin small letter e with acute */           /* latin small letter u with grave */
                case "233": case "eacute": return "é";          case "249": case "ugrave": return "ù";
                /* latin small letter e with circumflex */      /* latin small letter u with acute */
                case "234": case "ecirc": return "ê";           case "250": case "uacute": return "ú";
                /* latin small letter e with diaeresis */       /* latin small letter u with circumflex */
                case "235": case "euml": return "ë";            case "251": case "ucirc": return "û";
                /* latin small letter i with grave */           /* latin small letter u with diaeresis */
                case "236": case "igrave": return "ì";          case "252": case "uuml": return "ü";
                /* latin small letter i with acute */           /* latin small letter y with acute */
                case "237": case "iacute": return "í";          case "253": case "yacute": return "ý";
                /* latin small letter i with circumflex */      /* latin small letter thorn */
                case "238": case "icirc": return "î";           case "254": case "thorn": return "þ";
                /* latin small letter i with diaeresis */       /* latin small letter y with diaeresis */
                case "239": case "iuml": return "ï";            case "255": case "yuml": return "ÿ";

                // Browser support: Internet Explorer > 4, Netscape > 4   
                
                /* latin capital letter OE */                   /* single low-9 quotation mark */    
                case "338": return "Œ";                         case "8218": return "‚";             
                /* latin small letter oe */                     /* left double quotation mark */     
                case "339": return "œ";                         case "8220": return "“";             
                /* latin capital letter S with caron */         /* right double quotation mark */
                case "352": return "Š";                         case "8221": return "”";
                /* latin small letter s with caron */           /* double low-9 quotation mark */
                case "353": return "š";                         case "8222": return "„";
                /* latin capital letter Y with diaeresis */     /* dagger */
                case "376": return "Ÿ";                         case "8224": return "†";
                /* latin small f with hook - function */        /* double dagger */
                case "402": return "ƒ";                         case "8225": return "‡";
                                                                /* bullet */
                /* en dash */                                   case "8226": return "•";
                case "8211": return "–";                        /* horizontal ellipsis */
                /* em dash */                                   case "8230": return "…";
                case "8212": return "—";                        /* per thousand sign */
                /* left single quotation mark */                case "8240": return "‰";
                case "8216": return "‘";                        /* euro sign */
                /* right single quotation mark */               case "8364": case "euro": return "€";
                case "8217": return "’";                        /* trade mark sign */
                                                                case "8482": return "™";
            }

            return null;
        }        
    }
}

Applying the HtmlStripper to the simple piece of Html like this

 
<div>text in the first div&nbsp;
     <div><text in the nested div><div/>
     &nbsp;&#8364;text in the first div again
</div>

the following result is got

text in the first div 
    <text in the nested div>
     €text in the first div again

The next code sample demonstrates how to use HtmlStripper in field conditions:

using System;
using System.Text;
using System.Web;
using System.Net;
using System.IO;
using Helpers;

namespace HtmlStripperConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            String responseString = null;

            WebRequest request = HttpWebRequest.Create("http://dotnetfollower.com/wordpress/2011/12/sharepoint-understanding-businessdata-column-bdc-field/");            
        
            using(WebResponse response = request.GetResponse())
                using (Stream stream = response.GetResponseStream())
                {
                    StreamReader reader = new StreamReader(stream, Encoding.UTF8);
                    responseString = reader.ReadToEnd();
                }
            
            string strippedText = HtmlStripper.Strip(responseString);
            File.WriteAllText(@"c:\text.txt", strippedText);            
        }
    }
}
Categories: C#, Regex Tags: ,

WinForms: Show progress on long-running operations

January 30th, 2012 No comments

    When I have a long-running operation to execute I always use the best practice demonstrated here. Keeping the UI responsive to user interaction, I start a lengthy task in a separate thread and then update a ProgressBar control step by step from the executing operation. The progress data is passed to UI thread through the BeginInvoke or Invoke methods of the ProgressBar control or any other UI control, since all UI controls are created in UI thread and must be updated from it. Much time has passed since the time when the article was written. So now we have Anonymous Methods, Actions, Funcs, the upgraded ThreadPool class and so on. Using these innovations I’ve rewritten the approach described by Chris Sells. Look at the code sample below. This code sample simulates a treatment of group of files:

private void startBtn_Click(object sender, EventArgs e)
{
    // the array of file paths
    string[] filePaths = ... 

    UpdateProgressBar(0);            

    // start lengthy task in separate thread
    ThreadPool.QueueUserWorkItem(new WaitCallback(new Action<object>(delegate(object state)
    {                
        DoSomethingWithFiles(filePaths);
    })));
}

private void DoSomethingWithFiles(string[] filePaths)
{
    for (int i = 0; i < filePaths.Length; i++)
    {
        string fileText = File.ReadAllText(filePaths[i]);

        // do something with the read file content
        ...
                
        // set refreshed value to ProgressBar
        UpdateProgressBar((i + 1) * 100 / filePaths.Length);                
    }

    // set refreshed value to ProgressBar
    UpdateProgressBar(100);
}

private void UpdateProgressBar(int currentValue)
{
    // if it's UI thread, simply update ProgressBar as usual. 
    // Otherwise, make the current method be called in UI thread.
    if (progressBar1.InvokeRequired)
        progressBar1.BeginInvoke(new Action(delegate() { UpdateProgressBar(currentValue); }));
    else
        progressBar1.Value = currentValue;
}

It’s assumed that startBtn and progressBar1 are added to the form. When startBtn is clicked, the startBtn_Click method is called and starts the DoSomethingWithFiles long-running operation in separate thread. The DoSomethingWithFiles calls the UpdateProgressBar from time to time to show the current progress. The UpdateProgressBar checks whether it’s called from UI thread. If so, it just updates the ProgressBar, otherwise, initiates its own invoking from the UI thread.

As you can see, the obtained code does the same but is more compact.

Related posts:
Categories: C#, WinForms Tags: ,