Notes on MSHTML editing and empty elements

We ran into a problem recently with the MSHTML editor where empty paragraphs would collapse when the user saved or printed the document.  If the user loaded the document again, the empty elements would seem to disappear entirely.  Logically, this makes sense: an empty element has 0x0 dimensions, so it takes up no space.  But if the user adds a blank line, they would not expect it to disappear after saving.  MSHTML gives these new empty elements dimensions as if they contained a non-breaking space, until the document is saved and reloaded.

Let’s look at that visually.  We’d type ONETWOTHREE into the editor, and all would be fine, as shown on the left.  After reloading, it would display as shown on the right:

Source HTML editor and collapsing HTML result

So what is going on, and what’s the solution?

The basic solution is to add an &nbsp; entity (non-breaking space) to empty elements to give them a non-zero width and height.  MSHTML does this.  But there’s a heap of complexity around this.  From my analysis of the problem, it seems that the core issue is that element.innerHTML returns “” and domnode.childNodes.length returns 0, even though viewing the full source shows that there is actually an &nbsp; in the element.  This happens only when the MSHTML editor is active.

The complexity arises when one looks at the different edit modes and methods of loading documents, because each mode has slightly different symptoms.  Whilst juggling the tangle of symptoms that these issues present, we also need to consider the following requirements:

  1. A document may contain both empty <p></p> elements and <p>&nbsp;</p> elements, and the editor must not conflate the two when loading.

  2. When the user inserts a blank line, it must not collapse.  Nevertheless, the user does not want to learn about non-breaking spaces, so the editor must transparently manage this. Ideally, the non-breaking space would be hidden from the user but managed in the back end.  I must say that MSHTML is very close to this ideal.

A diversion: this is not a new problem, and many solutions have been proposed. One solution suggested in various forums online is to use <br> to break lines instead of the paragraph model with <p> or <div>.  This is not a great answer: it means the whole document has a single paragraph style.  I do note however that this is what Blogger and some other blog editors do, but then they do dynamically insert a block element when the user changes the paragraph style.  Still not pretty.

Now where I talk about <p> elements, feel free to imagine that I am talking about <div> elements.  The behaviour appears to be much the same, and there’s just a flag to switch between the two elements. Also, I’ll be using Delphi for the code samples because it hides a lot of the necessary COM guff and makes the examples much easier to read.

When a new blank line is inserted into the editor, behind the scenes MSHTML will add an &nbsp; entity to prevent the element from collapsing.  When you type the first letter, the &nbsp; is deleted.  MSHTML also makes the &nbsp; itself invisible to the user.  This is great.  It’s exactly what we want.

So let’s look at the activation of the editor and what is happening there. It turns out that there are three ways of making a document editable, or four if you include the undocumented IDM_EDITMODE command that some editor component wrappers use.  So what are these four methods?

  1. Set document.designMode to “On”.

      D := WebBrowser.Document as IHTMLDocument2;
      D.designMode := 'On';
     

  2. Set document.body.contentEditable to “true” (or anyElement.contentEditable).

      D := WebBrowser.Document as IHTMLDocument2;
      (D.body as IHTMLElement3).contentEditable := 'true';
     

  3. The DISPID_AMBIENT_USERMODE ambient property.  See the link for an example.
     
  4. The aforementioned IDM_EDITMODE command ID.  I’m not condoning this method, just documenting it because some editor wrappers use it.

      D := WebBrowser.Document as IHTMLDocument2;
      (D as IOleWindow).GetWindow(hwnd_);

      SendMessage(hwnd_, WM_COMMAND, IDM_EDITMODE, 0);

To make things even more complicated, there are different ways of loading content into the HTML editor, and different methods have different outcomes.  The three methods we explored were using Navigate, document.write, and IPersistFile.

  1. Using editor.Navigate to load either a local or remote document.

      WebBrowser.Navigate(DocFileName);
     

  2. Using document.write to write a complete document.

      D := WebBrowser.Document as IHTMLDocument2;
      VarArray := VarArrayCreate([0, 0], varVariant);
      VarArray[0] := DocText;
      D.write(PSafeArray(TVarData(VarArray).VArray));
      D.close;
     

  3. Accessing the editor’s IPersistFile interface to load a document.

      // The document object itself implements IPersistFile.
      PersistFile := WebBrowser.Document as IPersistFile;
      PersistFile.Load(PWideChar(DocFileName), 0);

It turns out that if you use either Navigate or contentEditable, then MSHTML will not hide &nbsp; from the end user for elements already in the document.  New empty elements typed by the user will still have the default behaviour described previously.  This is inconsistent and confusing to both me (the poor developer) and the end user.

The following table shows how &nbsp; entities in otherwise empty elements are treated for each combination of edit mode and loading method:

                           Navigate     document.write   IPersistFile
  designMode               visible      invisible        invisible
  contentEditable          visible      visible          visible
  IDM_EDITMODE             visible      invisible        invisible
  DISPID_AMBIENT_USERMODE  not tested
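
The matrix above can also be written down as a lookup, which is handy if a wrapper needs to check whether a given combination hides the &nbsp; entities that were already in the document.  This is only a sketch: the names TEditMode, TLoadMethod, NbspVisible and LoadedNbspIsHidden are made up for illustration, and the untested DISPID_AMBIENT_USERMODE row is left out.

```pascal
program NbspMatrix;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical names for the tested edit modes and loading methods.
  TEditMode = (emDesignMode, emContentEditable, emEditModeCommand);
  TLoadMethod = (lmNavigate, lmDocumentWrite, lmPersistFile);

const
  // True where the table reports that a loaded &nbsp; is visible to the user.
  NbspVisible: array[TEditMode, TLoadMethod] of Boolean = (
    (True, False, False),  // designMode
    (True, True,  True),   // contentEditable
    (True, False, False)   // IDM_EDITMODE
  );

// Returns True when the combination hides &nbsp; in elements that were
// already in the loaded document, which is the behaviour we want.
function LoadedNbspIsHidden(Mode: TEditMode; Load: TLoadMethod): Boolean;
begin
  Result := not NbspVisible[Mode, Load];
end;

begin
  Writeln(LoadedNbspIsHidden(emDesignMode, lmDocumentWrite));
end.
```

Only four of the nine tested combinations come back as hidden: designMode or IDM_EDITMODE, combined with document.write or IPersistFile.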

Now, it turns out that the visible results on that matrix are not going to do what we are looking for.  That’s because, as mentioned, any new empty elements typed by the user will still get the invisible &nbsp; behaviour, so loaded blank lines and newly typed blank lines would behave differently in the same document.

Here’s a final example to clarify the situation.  Take the following document, loaded into the MSHTML editor using document.write, with designMode = “On”:

 

 

With this document, we get the following results:

  Code                                               Result
  doc.getElementById("a").innerHTML
  doc.getElementById("a").outerHTML
  doc.getElementById("a").parentElement.innerHTML    &nbsp;
  doc.getElementById("a").parentElement.outerHTML    &nbsp;
  doc.documentElement.outerHTML                      &nbsp;

So. How do we determine if an element is empty and collapsed, or is a blank line?  MSHTML isn’t consistent with its innerHTML, outerHTML or DOM text node properties of the element.

But wait, all hope is not lost!  It turns out that IHTMLElement3 has a little buried property called inflateBlock.  This property tells you whether or not an empty element will be ‘inflated’ to appear as though it has content.  This little-known property (I found no discussions or blogs about it!) should solve our problem neatly:

isElementTrulyEmpty := (element.innerHTML = '') and not (element as IHTMLElement3).inflateBlock;

isElementJustABlankLine := (element.innerHTML = '&nbsp;') or ((element.innerHTML = '') and (element as IHTMLElement3).inflateBlock);
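
Those two checks could be wrapped up along the following lines.  This is a sketch rather than the fix that went into the wrapper: TEmptyState, ClassifyElement and CountBlankLines are hypothetical names, the unit assumes the MSHTML import unit that ships with Delphi, and it assumes that innerHTML reports a loaded non-breaking space as the literal text &nbsp;.

```pascal
unit BlankLineCheck;

interface

uses
  MSHTML;  // Delphi's import unit for the MSHTML interfaces (assumed available)

type
  // Hypothetical classification of a block element.
  TEmptyState = (esHasContent, esCollapsed, esBlankLine);

function ClassifyElement(const Element: IHTMLElement): TEmptyState;
function CountBlankLines(const Doc: IHTMLDocument3): Integer;

implementation

// Applies the innerHTML and inflateBlock tests to a single element.
function ClassifyElement(const Element: IHTMLElement): TEmptyState;
var
  Html: WideString;
begin
  Html := Element.innerHTML;
  if Html = '&nbsp;' then
    Result := esBlankLine       // the &nbsp; is reported explicitly
  else if Html <> '' then
    Result := esHasContent
  else if (Element as IHTMLElement3).inflateBlock then
    Result := esBlankLine       // reported as empty, but inflated by the editor
  else
    Result := esCollapsed;      // empty and will take up no space
end;

// Counts the <p> elements in the document that are blank lines.
function CountBlankLines(const Doc: IHTMLDocument3): Integer;
var
  Paragraphs: IHTMLElementCollection;
  I: Integer;
begin
  Result := 0;
  Paragraphs := Doc.getElementsByTagName('p');
  for I := 0 to Paragraphs.length - 1 do
    if ClassifyElement(Paragraphs.item(I, 0) as IHTMLElement) = esBlankLine then
      Inc(Result);
end;

end.
```

A caller would pass in WebBrowser.Document as IHTMLDocument3 once the document has finished loading.  The explicit &nbsp; case is checked first so that elements loaded in one of the "visible" combinations are still treated as blank lines.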

Now I just have to push this fix into the HTML editor component wrapper we are using.  At least I’ve already written the documentation around the fix!

Final note: even the Blogger editor that I’m using to write this post has trouble with consistency with new lines.  Here’s an example — look at the spacing around the code samples.

A screenshot of this blog post, in the editor (Firefox)

A screenshot of this blog post, previewing (Firefox)
