Category Archives: Windows

Debugging a Windows Service

This is a set of notes on how to debug a Windows service starting up, mostly for my reference. Building on https://www.sysadmins.lv/retired-msft-blogs/alejacma/how-to-debug-windows-services-with-windbg.aspx with command line steps where possible.

In this example, we’ll be debugging mycool.exe, which has the service name mycoolservice.

Enabling debugging

  1. Find the path to cdb.exe, windbg.exe, gflags.exe. (e.g. C:\Program Files (x86)\Windows Kits\10\Debuggers\x86).

  2. Start an elevated command prompt. Set the service to manual start (and stop it if it is currently running, … duh):

    sc config mycoolservice start=demand
    sc stop mycoolservice
    
  3. Find the short path for cdb.exe (pasting the path from point 1 as appropriate):

    for %A in ("C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\cdb.exe") do @echo %~sA
    
  4. Enable the debug hook for the service, using gflags, replacing the path as necessary:

    C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\gflags /p /enable mycool.exe /debug "C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\cdb.exe -server tcp:port=9999"
    
  5. Change the service startup timeout to 1 hour to avoid Windows killing the service on startup:

    reg add HKLM\System\CurrentControlSet\Control /v ServicesPipeTimeout /t REG_DWORD /d 3600000
    
  6. Reboot, start an elevated command prompt again.

  7. Start the service, which will appear to hang:

    sc start mycoolservice
    
  8. Open Windbg, Ctrl+R tcp:server=localhost,port=9999

  9. Go forth and debug.

Disable debugging

  1. Start an elevated command prompt, and enter the following commands:

    C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\gflags /p /disable mycool.exe
    reg delete HKLM\System\CurrentControlSet\Control /v ServicesPipeTimeout
    
  2. Reset the service startup parameters to your preferred startup type.

  3. Reboot to reset the service control timeout.

The Case of the Overly Busy Process

The other day, I was running a routine Process Monitor (Procmon) trace to debug an issue in Keyman, when I noticed something strange: over 50% of the events displayed with the default filter (which excludes a lot of system-level noise and procmon-related feedback) were coming from a single process: services.exe.

You can see in the image below I’ve added services.exe to the filter (Process Name is services.exe), and then the status bar shows 52% of events belonging to it.

Puzzled, I set aside some time to dig a little further (which means I went to bed late one evening). Watching Process Explorer, I could see that services.exe and wmiprvse.exe were between them consuming about 10% of my CPU. This did not seem normal. Nor did it seem to be a good thing for my battery life.

Deciding to examine the trace a little, I filtered out common registry keys and events, such as RegCloseKey, which made it easier to spot a pattern. It became obvious that every 5 seconds, services.exe, with the help of wmiprvse.exe, would enumerate the list of services from the registry, sending about 120,000 events to the Procmon trace in the process. Nearly 80% of the events captured each minute by Procmon were generated by either services.exe or wmiprvse.exe!

Nearly 80% of the events captured each minute by Procmon were generated by either services.exe or wmiprvse.exe!

Given that wmiprvse.exe, the Windows Management Instrumentation (WMI) provider host, was involved, it seemed likely that there was a process issuing WMI queries against the Services provider, such as you can do with PowerShell:

Get-WmiObject Win32_Service | Format-Table Name, DisplayName, State, StartMode, StartName

It was just a matter of figuring out which one.

I started off by trying to dig into WMI logging. I don’t know if you’ve ever dug into that, but it’s huge, complex and somewhat impenetrable. It is likely that with the right knowledge I could have issued a command that gave me a list of queries being issued and who was issuing them. But I have not yet acquired that knowledge, sadly, and late at night my brain did not feel up to the attempt.

It seemed easier to instead to use a process of elimination of processes (yeah, I did that on purpose). I started the CPU monitor in Process Explorer for the services.exe process, which showed lovely 5 second spikes.

Then I started to stop various services, watching to see if the spiking stopped. It didn’t. Once I was down to a handful of critical services (do I really need to run the Firewall service?) I started looking at background user-level processes, such as the icons sitting in the System Notification Area.

And here I hit gold. After shutting down a few, including my own programs, with no noticeable change, I shutdown MySQL Notifier 1.1.7.

All of a sudden, CPU activity dropped to zero on the services.exe process, and the next Procmon trace showed a mere 85 events in a minute for the services.exe and wmiprvse.exe pair.

Success!

I checked the MySQL Notifier forums and saw no discussion of this issue, but I found a closed bug report in the bug database. I’ll have to add my comment to the bug report.

Once again, Procmon comes to the rescue 🙂 I’m looking forward to the increased battery life already!

I know it’s not the most elegant way to debug a problem, but sometimes it is quicker and easier than the alternatives. It’s especially easy to use process of elimination like this late at night, without having to think hard about it. 😉

Substituted layouts in Text Services Framework

There is a tantalising parameter in the ITfInputProcessorProfileMgr::RegisterProfile function called hklSubstitute.  The description (used to) read “The substitute keyboard layout of this profile.   When this profile is active, the keyboard will use this keyboard layout instead of the default keyboard layout.  This can be NULL if your text service doesn’t change the keyboard layout.”

This functionality also surfaces in ITfInputProcessorProfileSubstituteLayout::GetSubstituteKeyboardLayout and ITfInputProcessorProfiles::SubstituteKeyboardLayout.

From the minimal description in MSDN, this sounds like a feature that we want to use in Keyman Desktop (allowing us to, for instance, use kbdus.dll as a base layout for some Keyman keyboard layouts).  However, we have been unable to get it to work.

  1. It seems the parameter type in the documentation is not quite right.  This parameter is not a HKL that you can obtain from LoadKeyboardLayout or some such.  Instead it needs to be a KLID, a keyboard layout ID, such as 0x00010409 for US Dvorak.
  2. The base layout is limited to one linked to the language the TIP is being installed for.  This means you cannot, for instance, substitute German QWERTZ for a French TIP, even if you wanted to. (And we want to!)
  3. Despite these restrictions, the RegisterProfile function does not check the validity of the parameter and happily tells you that the profile registered OK.
  4. While GetSubstituteKeyboardLayout returns the exact value that you registered, ITfInputProcessorProfileMgr::GetProfile returns 0 for the hklSubstitute member unless you meet the preconditions described above.  Furthermore, hklSubstitute, when available, is not a HKL but a KLID.
  5. Finally, even after all this, it seems that the substitute keyboard layout does not apply.

I received some clarification from Microsoft (edited slightly, thank you Vladimir Smirnov):

In fact, the feature is not that generic as you apparently expect. Its whole purpose was to enable smooth deprecation of IMM32 IMEs and their replacement (substitution) with TSF IMEs (aka TIPs). It doesn’t support substituting an arbitrary plain keyboard layout with another layout or IME (let alone of a different language).

The hklSubstitute is supposed to be a real HKL, but for IMM32 IMEs it actually matches the IME’s KLID. There’s old verification code there which looks for a matching KLID in HKLM for a given hklSubstitute and ignores the latter if there’s no match. That’s why you get 0 from GetProfile for your invalid or unsupported hklSubstitute.

Generic substitution was just never meant to be supported in TSF.

It’s a shame, but we work with what we’ve got. So for Keyman Desktop 9, we wrote a mnemonic layout recompiler that recompiles a Keyman mnemonic layout keyboard, merging it with a given Windows system base layout, at keyboard installation time in Keyman Desktop.

Let’s Encrypt on Windows, redux

Jan 2020, please note: This approach is now deprecated. Let’s Encrypt will stop accepting ACMEv1 requests in June 2020. Have a look at https://letsencrypt.org/docs/client-options/ for alternatives to this process.

A couple of months ago I wrote a script to automatically renew Let’s Encrypt certificates in PowerShell on Windows. The renewal process works really well, however, there is one wrinkle that I did not cover. In this blog post I smooth out that wrinkle!

The wrinkle

When it came time to renew my certificate, a few days ago (well within the 90 day limit for the certificate, mind you), I discovered that the identifier for the certificate had expired. Why was this a problem? Well, I had originally used a manual DNS challenge to validate the identifier with Let’s Encrypt. This worked fine, but of course, manually creating a new identifier and challenge every 90 days completely undoes the benefit of automatic certificate renewal.

Smoothing out the wrinkle

In order to resolve this, I needed to automatically generate and validate an identifier at the same time as I generated a new certificate. The only automatic challenge provider at this time with ACMESharp is the http-01 provider with the IIS handler.

So I updated the script to generate the identifier automatically and validate with the http-01 challenge provider. However, the http-01 challenge provider requires a HTTP request, not a HTTPS request, to the hostname in question, because, and I quote the IETF draft here:

… Because many webservers allocate a default HTTPS virtual host to a particular low-privilege tenant user in a subtle and non-intuitive manner, the challenge must be completed over HTTP, not HTTPS.

I love subtle and non-intuitive computing!

But this was a problem for me, because the secure site I was working on did not have a http endpoint, only a https endpoint, which is the whole reason I used a DNS challenge in the first place.

I finally threw in the towel and I decided to setup a http endpoint, and configure automatic redirection to https (yes, I could go further here). You may find you need to setup an exception for automatic redirection for the .well-known/ root folder (where http challenges are kept as static files). I’ll leave that tweak for you to figure out (it’s just another line or two in your site root web.config).

A deeper wrinkle

The wrinkles get a little deeper, when you look at what happens if you already have a valid challenge response for an identifier. This could happen if you had manually validated another alias for the same identifier using e.g. dns, as I had in the past, or if you are renewing before your existing identifier expires. Because the challenge responses are matched to the hostname, and not the aliases, the previously valid responses continue to be acceptable. In this situation, the existing challenge response was used by the Let’s Encrypt server, and it never checked my shiny new web server challenge endpoint. This meant we needed to see if any existing challenge responses were considered valid, rather than relying on checking the challenge we’d just setup. The script changes handle this scenario.

Just read me the script

The full PowerShell script is shown below; the changes begin at the start of the Try block, and finish after the New-ACMECertificate call. More detail on how the script works is provided in the previous blog post.

import-module ACMESharp

#
# Script parameters
#

$domain = "my.example.com"
$iissitename = "my.example.com"
$certname = "my.example.com-$(get-date -format yyyy-MM-dd--HH-mm)"

#
# Environmental variables
#

$PSEmailServer = "localhost"
$LocalEmailAddress = "[email protected]"
$OwnerEmailAddress = "[email protected]"
$pfxfile = "c:\Admin\Certs\$certname.pfx"
$CertificatePassword = "PASSWORD!"

#
# Script setup - should be no need to change things below this point
#

$ErrorActionPreference = "Stop"
$EmailLog = @()

#
# Utility functions
#

function Write-Log {
  Write-Host $args[0]
  $script:EmailLog  += $args[0]
}

Try {
  Write-Log "Generating a new identifier for $domain"
  New-ACMEIdentifier -Dns $domain -Alias $certname
  
  Write-Log "Completing a challenge via http"
  Complete-ACMEChallenge $certname -ChallengeType http-01 -Handler iis -HandlerParameters @{ WebSiteRef = $iissitename }
  
  Write-Log "Submitting the challenge"
  Submit-ACMEChallenge $certname -ChallengeType http-01
  
  # Check the status of the identifier every 6 seconds until we have an answer; fail after a minute
  $i = 0
  do {
    $identinfo = (Update-ACMEIdentifier $certname -ChallengeType http-01).Challenges | Where-Object {$_.Status -eq "valid"}
    if($identinfo -eq $null) {
      Start-Sleep 6
      $i++
    }
  } until($identinfo -ne $null -or $i -gt 10)

  if($identinfo -eq $null) {
    Write-Log "We did not receive a completed identifier after 60 seconds"
    $Body = $EmailLog | out-string
    Send-MailMessage -SmtpServer $PSEmailServer -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Attempting to renew Let's Encrypt certificate for $domain" -Body $Body
    Exit
  }
  
  # We now have a new identifier... so, let's create a certificate
  Write-Log "Attempting to renew Let's Encrypt certificate for $domain"

  # Generate a certificate
  Write-Log "Generating certificate for $domain"
  New-ACMECertificate $certname -Generate -Alias $certname

  # Submit the certificate
  Submit-ACMECertificate $certname

  # Check the status of the certificate every 6 seconds until we have an answer; fail after a minute
  $i = 0
  do {
    $certinfo = Update-AcmeCertificate $certname
    if($certinfo.SerialNumber -eq "") {
      Start-Sleep 6
      $i++
    }
  } until($certinfo.SerialNumber -ne "" -or $i -gt 10)

  if($i -gt 10) {
    Write-Log "We did not receive a completed certificate after 60 seconds"
    $Body = $EmailLog | out-string
    Send-MailMessage -SmtpServer $PSEmailServer -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Attempting to renew Let's Encrypt certificate for $domain" -Body $Body
    Exit
  }

  # Export Certificate to PFX file
  Get-ACMECertificate $certname -ExportPkcs12 $pfxfile -CertificatePassword $CertificatePassword

  # Import the certificate to the local machine certificate store 
  Write-Log "Import pfx certificate $pfxfile"
  $certRootStore = "LocalMachine"
  $certStore = "My"
  $pfx = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2
  $pfx.Import($pfxfile,$CertificatePassword,"Exportable,PersistKeySet,MachineKeySet") 
  $store = New-Object System.Security.Cryptography.X509Certificates.X509Store($certStore,$certRootStore) 
  $store.Open('ReadWrite')
  $store.Add($pfx) 
  $store.Close() 
  $certThumbprint = $pfx.Thumbprint

  # Bind the certificate to the requested IIS site (all https bindings)
  Write-Log "Bind certificate with Thumbprint $certThumbprint"
  $obj = get-webconfiguration "//sites/site[@name='$iissitename']"
  for($i = 0; $i -lt $obj.bindings.Collection.Length; $i++) {
    $binding = $obj.bindings.Collection[$i]
    if($binding.protocol -eq "https") {
      $method = $binding.Methods["AddSslCertificate"]
      $methodInstance = $method.CreateInstance()
      $methodInstance.Input.SetAttributeValue("certificateHash", $certThumbprint)
      $methodInstance.Input.SetAttributeValue("certificateStoreName", $certStore)
      $methodInstance.Execute()
    }
  }

  # Remove expired LetsEncrypt certificates for this domain
  Write-Log "Remove old certificates"
  $certRootStore = "LocalMachine"
  $certStore = "My"
  $date = Get-Date
  $store = New-Object System.Security.Cryptography.X509Certificates.X509Store($certStore,$certRootStore) 
  $store.Open('ReadWrite')
  foreach($cert in $store.Certificates) {
    if($cert.Subject -eq "CN=$domain" -And $cert.Issuer.Contains("Let's Encrypt") -And $cert.Thumbprint -ne $certThumbprint) {
      Write-Log "Removing certificate $($cert.Thumbprint)"
      $store.Remove($cert)
    }
  }
  $store.Close() 

  # Finished
  Write-Log "Finished"
  $Body = $EmailLog | out-string
  Send-MailMessage -SmtpServer $PSEmailServer -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Let's Encrypt certificate renewed for $domain" -Body $Body
} Catch {
  Write-Host $_.Exception
  $ErrorMessage = $_.Exception | format-list -force | out-string
  $EmailLog += "Let's Encrypt certificate renewal for $domain failed with exception`n$ErrorMessage`r`n`r`n"
  $Body = $EmailLog | Out-String
  Send-MailMessage -SmtpServer $PSEmailServer -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Let's Encrypt certificate renewal for $domain failed with exception" -Body $Body
  Exit
}

Side note: I discovered it’s important to let ACMESharp do its thing in the script and not try and do anything with it in another process because it tends to fall over with an exception if some other process is accessing its vault when it wants to.

Another minor update (18 Feb 2017): my $alias in my live script happened to be the same as my $domain; if they differed, then the script would fail. The $alias variable is no longer needed and has been removed from the script above. Thank you to BdN3504 for reporting this in the comments!

Yet another update (15 Apr 2018): The $PSEmailServer variable in my script was defined but never used. Send-MailMessage calls updated to use it with -SmtpServer. Thank you to Roger for reporting.

 

Automating certificate renewal with Let’s Encrypt and ACMESharp on Windows

Jan 2020, please note: This approach is now deprecated. Let’s Encrypt will stop accepting ACMEv1 requests in June 2020. Have a look at https://letsencrypt.org/docs/client-options/ for alternatives to this process.

 

UPDATE: 10 February 2017! I’ve updated the script in a new blog post to handle identifer expiry. 

 

Let’s Encrypt is a free, automated, and open Certificate Authority. And it is awesome. It is being used by over 15 million domains already to date.

le-logo-standard

Let’s Encrypt is a certificate authority. So that means that they issue certificates, specifically for secure https (TLS) websites. Lots of other organisations do this as well. But two things stand out about Let’s Encrypt. First, it’s free! Given that I was paying over $100/year for a certificate for one of my sites until recently, that’s a big win already. The second is, it’s automated!

The automated bit cannot be understated. It means that the first time I use Let’s Encrypt, I have to do a bunch of setup. But from then on, I no longer have to remember the arcane and complicated process of generating a certificate request, uploading it to a CA, waiting for the CA to process the request, and finally importing the certificate along with all the incidentals such as intermediate certificates.

However, the one thing about Let’s Encrypt that has stopped me using it so far is that I run some of my sites on IIS on Windows, but Let’s Encrypt is very *nix-focused. While there are clients for Windows, none of them are very complete and so it’s been a bit of hit and miss using them.

The most up-to-date client/library that I have found appears to be ACMESharp. This library is a PowerShell module, and while there is a GUI front end available, I haven’t used the GUI. I’ve worked entirely with the PowerShell module. ACMESharp is pretty flexible and covers everything I need, except one thing: renewals. It has no built-in automated renewal support. So I rolled my own.

I did most of my work in the Let’s Encrypt staging environment, after foolishly starting in the live environment and rapidly hitting the duplicate certificate rate limit. I recommend you do your testing in the staging environment also!

The easiest way to work in the staging environment is to setup a separate vault profile for ACMESharp. I ended up using my :user profile for staging and my :sys profile for the live host. To specify which vault profile you want to use, it’s best to use an environment variable, as otherwise you’ll inevitably forget to append the -VaultProfile parameter to one of your setup commands and leave yourself in a bit of a mess:

$env:ACMESHARP_VAULT_PROFILE=":user"

What follows is a script setup for my servers, but which should work for most Windows PowerShell scenarios. I have set it up to email me on success or failure; I’ll be watching it over the next little while to ensure that it gets things right. I have set it up as a scheduled task to run every 60 days, per Let’s Encrypt’s recommendation.

The script assumes you have already followed the ACMESharp Quick Start to configure your environment. The variables at the top of the script could be configured as script parameters, but for simplicity I’ve just put them at the top of the script. The script will request a new certificate, import the newly issued certificate into the localmachine certificate store, then assign it to all https bindings on the specified web site instance. Finally, it will delete all expired certificates associated with the domain in question.

UPDATE: 10 February 2017! Please see the revised script!

import-module ACMESharp

#
# Script parameters
#

$domain = "my.example.com"
$alias = "my.example.com-01"
$iissitename = "my.example.com"
$certname = "my.example.com-$(get-date -format yyyy-MM-dd--HH-mm)"

#
# Environmental variables
#

$PSEmailServer = "localhost"
$LocalEmailAddress = "[email protected]"
$OwnerEmailAddress = "[email protected]"
$pfxfile = "c:\Admin\Certs\$certname.pfx"
$CertificatePassword = "PASSWORD!"

#
# Script setup - should be no need to change things below this point
#

$ErrorActionPreference = "Stop"
$EmailLog = @()

#
# Utility functions
#

function Write-Log {
  Write-Host $args[0]
  $script:EmailLog  += $args[0]
}

Try {
  Write-Log "Attempting to renew Let's Encrypt certificate for $domain"

  # Generate a certificate
  Write-Log "Generating certificate for $alias"
  New-ACMECertificate ${alias} -Generate -Alias $certname

  # Submit the certificate
  Submit-ACMECertificate $certname

  # Check the status of the certificate every 6 seconds until we have an answer; fail after a minute
  $i = 0
  do {
    $certinfo = Update-AcmeCertificate $certname
    if($certinfo.SerialNumber -ne "") {
      Start-Sleep 6
      $i++
    }
  } until($certinfo.SerialNumber -ne "" -or $i -gt 10)

  if($i -gt 10) {
    Write-Log "We did not receive a completed certificate after 60 seconds"
    $Body = $EmailLog | out-string
    Send-MailMessage -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Attempting to renew Let's Encrypt certificate for $domain" -Body $Body
    Exit
  }

  # Export Certificate to PFX file
  Get-ACMECertificate $certname -ExportPkcs12 $pfxfile -CertificatePassword $CertificatePassword

  # Import the certificate to the local machine certificate store 
  Write-Log "Import pfx certificate $pfxfile"
  $certRootStore = "LocalMachine"
  $certStore = "My"
  $pfx = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2
  $pfx.Import($pfxfile,$CertificatePassword,"Exportable,PersistKeySet,MachineKeySet") 
  $store = New-Object System.Security.Cryptography.X509Certificates.X509Store($certStore,$certRootStore) 
  $store.Open('ReadWrite')
  $store.Add($pfx) 
  $store.Close() 
  $certThumbprint = $pfx.Thumbprint

  # Bind the certificate to the requested IIS site (all https bindings)
  Write-Log "Bind certificate with Thumbprint $certThumbprint"
  $obj = get-webconfiguration "//sites/site[@name='$iissitename']"
  for($i = 0; $i -lt $obj.bindings.Collection.Length; $i++) {
    $binding = $obj.bindings.Collection[$i]
    if($binding.protocol -eq "https") {
      $method = $binding.Methods["AddSslCertificate"]
      $methodInstance = $method.CreateInstance()
      $methodInstance.Input.SetAttributeValue("certificateHash", $certThumbprint)
      $methodInstance.Input.SetAttributeValue("certificateStoreName", $certStore)
      $methodInstance.Execute()
    }
  }

  # Remove expired LetsEncrypt certificates for this domain
  Write-Log "Remove old certificates"
  $certRootStore = "LocalMachine"
  $certStore = "My"
  $date = Get-Date
  $store = New-Object System.Security.Cryptography.X509Certificates.X509Store($certStore,$certRootStore) 
  $store.Open('ReadWrite')
  foreach($cert in $store.Certificates) {
    if($cert.Subject -eq "CN=$domain" -And $cert.Issuer.Contains("Let's Encrypt") -And $cert.Thumbprint -ne $certThumbprint) {
      Write-Log "Removing certificate $($cert.Thumbprint)"
      $store.Remove($cert)
    }
  }
  $store.Close() 

  # Finished
  Write-Log "Finished"
  $Body = $EmailLog | out-string
  Send-MailMessage -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Let's Encrypt certificate renewed for $domain" -Body $Body
} Catch {
  Write-Host $_.Exception
  $ErrorMessage = $_.Exception | format-list -force | out-string
  $EmailLog += "Let's Encrypt certificate renewal for $domain failed with exception`n$ErrorMessage`r`n`r`n"
  $Body = $EmailLog | Out-String
  Send-MailMessage -From $LocalEmailAddress -To $OwnerEmailAddress -Subject "Let's Encrypt certificate renewal for $domain failed with exception" -Body $Body
  Exit
}

I guess I’ll find out in 60 days if the script still works! With many thanks to the users of StackOverflow, and other bloggers, for working code samples which saved me a lot of time reading reference documentation for so many of the different bits of glue here, from exception management, through to sending emails with PowerShell, through to assigning certificates to IIS website bindings, and more…

Update 2 December 2016: The original script had a bug that didn’t affect use with IIS. However, when I tried to use the Let’s Encrypt certificate with another program, in this case MailEnable, I found that the program did not have access to the certificate, even though it seemed it should have had.

When I started the relevant MailEnable service, it would display an error such as:

12/02/16 20:13:00 **** Error 0x8009030d returned by AcquireCredentialsHandle
12/02/16 20:13:00 **** Error creating credentials object for SSL session
12/02/16 20:13:00 Unable to locate or bind to certificate with name "my.example.com"

I checked the permissions and various other factors but only when I did a deep comparison of the details of a working certificate against the one that was failing, using certutil, as suggested in that blog linked above, did I spot the problem. The problem lay in the following CRYPT_KEY_PROV_INFO structure:

CERT_KEY_PROV_INFO_PROP_ID(2):
    Key Container = {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
  Unique container name: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    Provider = Microsoft Enhanced Cryptographic Provider v1.0
    ProviderType = 1
    Flags = 0
    KeySpec = 1 -- AT_KEYEXCHANGE

When I compared this against a working certificate, I saw that the Flags member had a value of 20 (hex), not 0. 0x20 turns out to be CRYPT_MACHINE_KEYSET. Because I was missing the MachineKeySet flag in the certificate import call, this meant that the key was stored under the Administrator user’s keyset instead of the machine keyset. IIS coped with this, but not MailEnable, which runs under a more restricted user’s credentials.

I have also updated the script to delete all Let’s Encrypt certificates for the domain that have been obsoleted by the new certificate, rather than just certificates that have expired (-And $cert.NotAfter -lt $date), mostly to avoid the risk of accidentally selecting an old certificate manually when doing configuration via UI.

On my servers, I’ve also added a section to the end of the script that restarts various services that depend on the certificate and will not use a new certificate until after restarting.

Don’t forget to navigate to about:blank when embedding IWebBrowser2

Today I spent several hours trying to figure out why an embedded web browser component (in this case TEmbeddedWB) in a Delphi test app never received the appropriate IHttpSecurity and IWindowForBindingUI QueryService requests.

I was doing this in order to provide more nuanced handling of self-signed certificates in an intranet context. We all do this, right? Here the term “nuanced” means “Of course I trust self signed certificates on my intranet, don’t you?” Feel free to rant and rave on this. 😉

But no matter what I did, what incantations I tried, or what StackOverflow posts I perused, I was unable to find an answer. Until finally I stumbled on a side comment in a thread from 2010. Igor Tandetnik notes that:

Right after creating the control, navigate it to about:blank. Right after that, navigate it to the page you wanted to go to. It’s a known problem that IServiceProvider doesn’t work for the very first navigation.

And this was something that I kinda knew in the back of my head, but of course had forgotten. Thank you Igor.

This post would not be complete without some splendiferous code. Just for reference, it’s so simple if you don’t blank out and forget about:blank.

unit InsecureBrowser;

interface

uses
  Winapi.Windows,
  Winapi.Messages,
  Winapi.Urlmon,
  Winapi.WinInet,
  System.SysUtils,
  System.Variants,
  System.Classes,
  Vcl.Graphics,
  Vcl.Controls,
  Vcl.Forms,
  Vcl.Dialogs,
  Vcl.OleCtrls,
  Vcl.StdCtrls,
  SHDocVw_EWB,
  EwbCore,
  EmbeddedWB;

type
  TInsecureBrowserForm = class(TForm, IHttpSecurity, IWindowForBindingUI)
    web: TEmbeddedWB;
    cmdGoInsecure: TButton;
    procedure webQueryService(Sender: TObject; const [Ref] rsid,
      iid: TGUID; var Obj: IInterface);
    procedure FormCreate(Sender: TObject);
    procedure cmdGoInsecureClick(Sender: TObject);
  private
    { IWindowForBindingUI }
    function GetWindow(const guidReason: TGUID; out hwnd): HRESULT; stdcall;

    { IHttpSecurity }
    function OnSecurityProblem(dwProblem: Cardinal): HRESULT; stdcall;
  end;

var
  InsecureBrowserForm: TInsecureBrowserForm;

implementation

{$R *.dfm}

function TInsecureBrowserForm.GetWindow(const guidReason: TGUID;
  out hwnd): HRESULT;
begin
  Result := S_FALSE;
end;

function TInsecureBrowserForm.OnSecurityProblem(dwProblem: Cardinal): HRESULT;
begin
  if (dwProblem = ERROR_INTERNET_INVALID_CA) or
     (dwProblem = ERROR_INTERNET_SEC_CERT_CN_INVALID)
    then Result := S_OK
    else Result := E_ABORT;
end;

procedure TInsecureBrowserForm.webQueryService(Sender: TObject;
  const [Ref] rsid, iid: TGUID; var Obj: IInterface);
begin
  if IsEqualGUID(IID_IWindowForBindingUI, iid) then
    Obj := Self as IWindowForBindingUI
  else if IsEqualGUID(IID_IHttpSecurity, iid) then
    Obj := Self as IHttpSecurity;
end;

procedure TInsecureBrowserForm.cmdGoInsecureClick(Sender: TObject);
begin
  web.Navigate('https://evil.intranet.site/');
end;

procedure TInsecureBrowserForm.FormCreate(Sender: TObject);
begin
  web.Navigate('about:blank');
end;

end.

 

Windbg and Delphi – a quick reference

This is a list of my blog posts on using WinDBG with Delphi apps, mostly for my reference. Internally, I use a version of tds2dbg with some private modifications, but you should have reasonable results with the public version.

WinDBG and Delphi exceptions

WinDBG and Delphi exceptions in x64

Locating Delphi exceptions in a live session or dump using WinDbg

Debugging a stalled Delphi process with Windbg and memory searches

Finding class instances in a Delphi process using WinDbg

More windbg tricks with Delphi – how to ignore specific exceptions (Jan 2016)

I also have some other posts that talk about WinDBG and/or Delphi which can be helpful for illustrative purposes:

Another partial malware diagnosis

Detecting the Citadel Trojan with an Application Failure

When characters go astray: diagnosing missing characters when printing with IE9

WaitForSingleObject. Why you should never use it.

IE11, Windbg, JavaScript = joy

 

 

 

 

Windows 10 shows a blank screen after login

Quick post for future reference and in case it helps other users.

Got to work this morning, and couldn’t login to my Windows 10 machine. Windows Updates had been installed overnight (oh dear).

The symptoms were slightly different to most of the other reports I’ve seen online (Google Windows 10 blank screen after login). I could login on one account all fine, but my primary account would just go to a black screen (with cursor). Waited quite some time (10 minutes or more) with no change. Rebooted. No change.

Pressing Ctrl+Alt+Del would bring up Windows Security, and from there I could switch user or go back to the lock screen, but attempting to log out would fail: the “wait for logout” spin appeared but never logged out. Other options from that screen had no effect.

I vaguely tried multi monitor options (Windows+P key combination), typing password into the blank screen, and other magic mummery without effect (and without expectation of success).

Eventually I logged in on the working account, turned off Windows firewall, enabled Remote Registry, and got to work with SysInternals tools. I logged back out of that account and used the command pslist \\machine -t from a working computer to see what processes were running.


pslist v1.3 - Sysinternals PsList
Copyright (C) 2000-2012 Mark Russinovich
Sysinternals - www.sysinternals.com

Process information for <machine>:

Name                             Pid Pri Thd  Hnd      VM      WS    Priv
Idle                               0   0   4    0      64       4       0
  System                           4   8 141 1217  393012  271644    1560
    smss                         316  11   3   49    4864     592     376
csrss                            428  13  11  415  163348    3240    1620
wininit                          500  13   4   91   43984    2792    1232
  services                       572   9   9  294   35020    7900    4648
    svchost                      304   8  35  956  184400   23016   13240
    svchost                      356   8  17  321   82168    6432    3248
    svchost                      496   8  26 1043  204540   24268   20616
    svchost                      748   8  30  773   87432   16040   10172
      ShellExperienceHost       7032   8  44  876  369236   35880   58392
      WmiPrvSE                  8876   8   6  144   33264    8172    1880
      SearchUI                  9820   8  27  900  362948   37788   48896
    svchost                      812   8  18  528   59364   11544    9764
    svchost                      936   8  52 1346 1417096   26592   22524
    svchost                      968   8  78 4722 1321944   71912  797808
      taskhostw                 3952   8   8  291  275572   10268    6492
    svchost                     1064   8  37 1130  719392   25880   12960
      WUDFHost                  1816   8   8  476   38964    3852    2212
      dasHost                   2084   8  13  302   37880    7968    3764
    mvbtrcsvcx64                1108   8   3  107   58624    1608    1124
    svchost                     1344   8  14  527  217200   29352   56716
    svchost                     1348   8  29  605  167772   23016   18484
    spoolsv                     1792   8  24  576   91808   13972   11384
    svchost                     2156   8  48  485 1200004   19240    9308
    MsMpEng                     2268   8  45 1122  523848  127360  144964
    svchost                     2684   8  15 3980  303696   53068   95100
    NisSrv                      3084   8  10  287   71416   10276   23812
    SearchIndexer               5800   8  50 1045  434516   77688   95752
      SearchProtocolHost        9568   4  10  353   74552   12456    2240
      SearchFilterHost          9856   4   4  114   35096    6560    1208
    officeclicktorun            8500   8  18 3570  213028   27932   48116
    svchost                    11928   8   5  128   23372    6500    1276
  lsass                          580   9  14 1272   58540   17008    9876
csrss                           3356  13  15  634  160220   16276    2520
winlogon                        3396  13   6  229   65460   12380    2472
  dwm                           3520  13  12  332  282440   34416   62088

Nothing obviously wrong that I could see. I then ran pskill \\machine 3396 to kill winlogon, in the hope that at least that would cause the account to log out.

And of course it did.

The surprise was that after that I was able to login…

Unfortunately, I have not yet traced what caused the lock but at least this was an answer for me, which hopefully may help someone else one day.

Possibly related: I did find this in the Event Log:

Application pop-up: ShellExperienceHost.exe – Application Error : The instruction at 0x00007FFE569D50CB referenced memory at 0x0000000000000000. The memory could not be read.

And, an unhelpful error from win32k:

The description for Event ID 267 from source Win32k cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

The specified resource type cannot be found in the image file

The case of the unexplained: When no Windows apps (aka Windows Store apps) will start

Yes, I’m shamelessly stealing @MarkRussinovich‘s blog series title for this post!

One of my machines running Windows 10 here would not run any Windows apps (formerly known as Universal apps, Metro, Modern UI and I’m not sure if I’ve missed any names). Classic desktop apps would work fine.

Finding Microsoft Edge in the Start Menu

I’d click the link to the app in the Start Menu (those missing names may be another case to chase!), and the app would flash onto the screen and then almost immediately disappear.

Microsoft Edge starts and immediately exits

Of course this was more than a little bit frustrating, with no hints as to how to resolve the problem. Checking event logs and reliability provided no pointers towards solutions.

Reliability Monitor is aware of the problem but doesn't know how to fix it

After a couple of pointless web searches (“Edge won’t start”, what was I thinking?), and a bizarre side trip into deep conspiracy theories on Microsoft forums, I realised it was time to break out Procmon to try and trace the problem.

Procmon to the rescue

Procmon lets you watch and log events happening on your file system, registry and network in real time. Running Procmon for even just a minute will often generate hundreds of thousands of events, so it’s fantastic that it includes a powerful set of filtering tools to help you locate specific events.

I started Procmon, and then started, or tried to start Microsoft Edge. After it fell over again, I went back into Procmon, stopped the trace (Ctrl+E), and started to filter the 452,626 events that had been captured in those few seconds.

Initial results in procmon

Procmon’s initial setup includes some filters that exclude events that are of less interest to mere mortals, such as reading and writing to the pagefile, or events caused by Procmon itself. Those default filters cut the results down by 55% to begin with!

While you can use the Filter dialog (Ctrl+L) to manually enter filters, and I often do this for complex filtering, it’s often faster to simply right-click on a cell that you don’t want to see again, and select Exclude <value> from the popup menu. Conversely, if you want to focus on that particular value, select Include <value>.

First, I excluded some processes I wasn’t interested in, such as Explorer.exe, and then excluded a number of different values from the Result column. I was really looking for the ACCESS DENIED result, because that’s probably the most common result that causes apps to crash. I ended up with the following filters on the Result column:

Filtering results to find the errors I am looking for in Procmon

Now, there were few enough events (only 1,625 of them) that I could scan through quickly and hopefully spot something going wrong. And, again Procmon found the answer!

Finding the point where Microsoft Edge encounters an error in Procmon

There you can see MicrosoftEdge.exe receiving an ACCESS DENIED result when trying to read the folder C:\Users\mcdurdin\AppData\Local\Packages\Microsoft.MicrosoftEdge_8wekyb3d8bbwe\AC\Microsoft\Windows\1605653898. Shortly thereafter, we see the WerFault.exe process which is the Windows Error Reporting process that started after Edge decided to crash.

Note: it’s merely serendipitous that WerFault.exe is visible in the filtered results; remember that there are probably thousands of additional events between the highlighted ACCESS DENIED event and the start of the WerFault.exe process, and the only reason it is visible at all is that WerFault itself had received ACCESS DENIED and other results from its own events!

I could alternatively have looked at the process tree (Ctrl+T) to find when the WerFault.exe process started (or the MicrosoftEdge.exe process had stopped) and traced back from there. But usually I find filtering to be a faster way of finding the specific issue.

What’s wrong with this folder?

Now I wanted to figure out what was wrong with this folder. Here’s what I saw on this machine:

C:\Users\mcdurdin\AppData\Local\Packages>icacls Microsoft.MicrosoftEdge_8wekyb3d8bbwe\AC
Microsoft.MicrosoftEdge_8wekyb3d8bbwe\AC NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
                                         BUILTIN\Administrators:(I)(OI)(CI)(F)
                                         TAVULTESOFT\mcdurdin:(I)(OI)(CI)(F)

Successfully processed 1 files; Failed processing 0 files

And this is what I saw on a machine where Edge was working:

C:\Users\mcdurdin\AppData\Local\Packages>icacls Microsoft.MicrosoftEdge_8wekyb3d8bbwe\AC
Microsoft.MicrosoftEdge_8wekyb3d8bbwe\AC S-1-15-2-3624051433-2125758914-1423191267-1740899205-1073925389-3782572162-737981194:(OI)(CI)(F)
                                         NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
                                         BUILTIN\Administrators:(I)(OI)(CI)(F)
                                         TAVULTESOFT\mcdurdin:(I)(OI)(CI)(F)
                                         Mandatory Label\Low Mandatory Level:(OI)(CI)(NW)

Successfully processed 1 files; Failed processing 0 files

I saw two differences: a missing S-1-15-2-… entry and a missing Low Mandatory Level entry. Now, that S-1-15-2-… entry is an App Package SID. I checked a few other installed app packages, and they were all missing the relevant security settings on this machine. So it wasn’t specific to Edge, but was a general issue on my computer.

At this point, I did find a relevant discussion on Microsoft’s forums that had some answers, but did not solve the general case that I was experiencing.

I  have not been able to find the root cause of this. Lost in the deep dark mists of time it is.

Fixing the problem

To fix the Low integrity level was a pretty straightforward command, run from a command prompt in the %LOCALAPPDATA%\Packages folder:

for /d %d in (*) do icacls %d\AC /setintegritylevel (OI)(CI)L

However, determining the correct SID to add to each folder was a little more work. It turns out that in the registry, there is a mapping between the app’s moniker (so, in this case, the folder names) and the relevant SID at HKEY_CLASSES_ROOT\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppContainer\Mappings. Learn more.
The AppContainer Mappings registry, showing the relationship between SID and Moniker
I sucked a list of those SIDs into a text file with the following command:

reg query "HKCR\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppContainer\Mappings" > m.txt

That gave me a file that looked like this, shown as an image just because:

The contents of m.txt

From there, I wanted to extract just the last part of each key:

for /f "tokens=9 delims=\" %i in (m.txt) do echo %i >> n.txt

Now n.txt looked like:

The contents of n.txt

Now to take each of those and map it to its moniker, and from there update the security on the folder accordingly. That command turned out to be a bit more hoopy.

for /f %a in (n.txt) do for /f "tokens=2*" %b in ('reg query "HKCR\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppContainer\Mappings\%a" /V "Moniker" 2^>NUL ^| FIND "REG_SZ"') DO icacls "%c\AC" /grant *%a:(OI)(CI)(F)

Putting that all together, in a batch file (I’ve combined the integrity level setting and SID grant in this script):

@echo off
del n.txt
reg query "HKCR\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppContainer\Mappings" > m.txt
for /f "tokens=9 delims=\" %%i in (m.txt) do echo %%i >> n.txt
for /f %%a in (n.txt) do for /f "tokens=2*" %%b in ('reg query "HKCR\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppContainer\Mappings\%%a" /V "Moniker" 2^>NUL ^| FIND "REG_SZ"') DO icacls "%%c\AC" /grant *%%a:(OI)(CI)(F) /setintegritylevel (OI)(CI)L

And yay, now Edge starts!

Yay, Microsoft Edge starts now!

One of these days I’ll have to get more into PowerShell, which would probably make some of these scripts a lot easier!

With thanks to examples from http://www.robvanderwoude.com/ntregquery.php which saved a lot of fuss (but warning if you copy and paste: the examples use the wrong caret character, ˆ instead of ^).

GUI Info Utility for Windows

A very quick one today. GUIInfo is a tiny little utility that has a narrow purpose: it shows, in realtime, the current active, focus, foreground and capture window information for Windows.

Why this utility? Debugging focus issues is always frustrating. Attempting to observe current window focus information in a debugger results in the focus changing (to the debugger, of course). You can work around this – use remote debugging, or add logging in the debugger – but it’s a hassle.

Debugging focus issues would make Werner Heisenberg feel right at home.

Screenshot

GUIInfo Screenshot

Features

  • Updates 10 times a second so changes are reflected promptly.
  • Changes are highlighted in green, and slowly fade back to black.
  • Hovering over bold window labels will highlight the relevant window on the screen.
  • Topmost, so you can see the information without needing to try and keep the window visible.
  • Open Source – written in Delphi, MIT license.

Download and Source