It’s a good question, especially after a recent Let’s Encrypt (LE) outage that lasted more than a day, with the service essentially unavailable for 16 hours. This is quite a serious outage: on its own it caps LE’s uptime for this year at about 99.8%, which will probably put LE behind most other certificate providers.
Because LE assumes that renewals are automated, you should never assume that its services will always be available, and you should design your production automation accordingly.
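To put the 16 hours in perspective, here is a rough check of the arithmetic. It assumes a 365-day year and counts only this one outage; the function name is mine, purely for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def uptime_percent(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return availability over the period as a percentage."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# A single 16-hour outage already limits the best possible yearly uptime.
print(f"{uptime_percent(16):.1f}%")  # prints 99.8%
```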
Figure: Let's Encrypt planned updates and their actual duration.
Having said that, I don’t believe there is a real possibility of an outage lasting more than a week, so in terms of issuance and renewals LE is likely to remain reliable if you plan ahead and renew according to the recommended timelines.
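As a rough illustration of why a week-long outage should not hurt a well-planned setup: LE certificates are valid for 90 days, and the usual advice is to renew about 30 days before expiry. The sketch below assumes that 30-day margin and uses hypothetical helper names; it is not how any particular ACME client makes the decision.

```python
from datetime import datetime, timedelta, timezone

RENEW_MARGIN = timedelta(days=30)  # assumption: start renewing 30 days before expiry


def should_renew(not_after: datetime, now: datetime, margin: timedelta = RENEW_MARGIN) -> bool:
    """True once the certificate is inside the renewal margin."""
    return not_after - now <= margin


def survives_outage(not_after: datetime, now: datetime, outage: timedelta) -> bool:
    """True if the certificate is still valid when an outage starting now ends."""
    return now + outage < not_after


issued = datetime(2018, 5, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)
day_60 = issued + timedelta(days=60)

print(should_renew(not_after, day_60))                        # True: renewal is due
print(survives_outage(not_after, day_60, timedelta(days=7)))  # True: 23 days of slack remain
```

If the renewal job starts on day 60 and keeps retrying, a one-week outage still leaves more than three weeks before the certificate expires. Renewing in the last few days removes that slack.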
Other than that, you should consider the operational limits of LE. I have written up many of them in one blog post here.
You may also be interested in what assurance LE offers against damage caused by incidents. I haven’t found an authoritative reference for any such assurance yet.
In spring 2018 I analyzed the reliability of Let’s Encrypt using their own data.
Figure: Duration of "full" disruptions per month with incident detail at the bottom.
The charts above come from that analysis; the write-up is titled “Let’s Encrypt uptime is 99.9% — or 98.8% without defects in 2017”. It shows that LE was fully up and running for only 98.8% of the time in 2017. Some disruptions were “partial”, so only a subset of users would have experienced them.
Another important aspect is changes to the protocol and its implementation. Many users experienced problems in Q3, when LE made IPv6 the preferred protocol: it was used whenever LE detected an IPv6 address, regardless of whether that address actually worked.
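If you want to check whether a domain is exposed to this kind of problem, one option is to see whether it publishes an IPv6 address and whether that address accepts connections. The following is a minimal sketch under my own assumptions: the function names are hypothetical, port 80 is used because HTTP validation goes over it, and a plain TCP connect is only a rough proxy for what LE's validation servers actually do.

```python
import socket


def resolve_ipv6(host: str, port: int) -> list:
    """Return the IPv6 socket addresses published for host (empty if none)."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return [info[4] for info in infos]


def can_connect(sockaddr, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection to one IPv6 socket address."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sockaddr)
        return True
    except OSError:
        return False


def ipv6_status(host: str, port: int = 80, resolve=resolve_ipv6, connect=can_connect) -> str:
    """Classify a host as 'no-ipv6', 'ipv6-ok', or 'ipv6-broken'."""
    addresses = resolve(host, port)
    if not addresses:
        return "no-ipv6"
    if all(connect(addr) for addr in addresses):
        return "ipv6-ok"
    return "ipv6-broken"
```

Call `ipv6_status("example.com")` from a machine that itself has working IPv6; otherwise every connection attempt fails and the result is misleading. An `ipv6-broken` result is the situation described above: an address is published but does not work.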
It certainly makes sense to deploy external end-to-end monitoring to detect potential issues early on, something like our KeyChest service.
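For a sense of what the simplest external check looks like, here is a minimal sketch using only Python's standard library. It is not how KeyChest works internally; the function names and the 20-day warning threshold are my own assumptions for illustration.

```python
import socket
import ssl
import time


def days_left(not_after: str, now: float) -> float:
    """Days between now (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect over TLS and return the notAfter field of the served certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str, warn_days: float = 20.0) -> str:
    """Return a one-line status; warn_days is an arbitrary example threshold."""
    remaining = days_left(fetch_not_after(host), time.time())
    state = "WARN" if remaining < warn_days else "OK"
    return f"{state} {host}: {remaining:.0f} days left"
```

Running `check("example.com")` on a schedule from outside your own network gives an early warning when a renewal has silently failed. An already expired certificate makes the TLS handshake raise `ssl.SSLCertVerificationError`, so a real monitor would catch that and report it as a failure too.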