Nothing out of the box that I know of.
There are quite a few options out there, and always have been, but in something like a more complex DAW you'll want to render dozens of waveforms at once at 60 fps (zooming, scrolling, etc.), while also not losing precision when zoomed out.
So you can't just take every nth sample from the raw audio data, or it'll look unintelligible when zoomed out. That in turn means you'll need to pre-compute an accurate waveform "shape" (typically min/max peak pairs per pixel bucket), because you can't reconstruct it from the raw audio data on every single render, per clip, when you have 20 of them on screen. Especially with longer audio files.
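The usual precomputation looks something like this min/max reduction: a rough sketch, with illustrative names (not from any particular library), that collapses each fixed-size bucket of samples into a min/max pair so the zoomed-out outline still shows transients:

```javascript
// Pre-compute min/max peak pairs per bucket so a zoomed-out view stays
// accurate. `samples` is a Float32Array (e.g. from an AudioBuffer channel).
// Hypothetical helper; names are illustrative.
function computePeaks(samples, bucketSize) {
  const numBuckets = Math.ceil(samples.length / bucketSize);
  const peaks = new Float32Array(numBuckets * 2); // [min0, max0, min1, max1, ...]
  for (let b = 0; b < numBuckets; b++) {
    let min = Infinity, max = -Infinity;
    const start = b * bucketSize;
    const end = Math.min(start + bucketSize, samples.length);
    for (let i = start; i < end; i++) {
      const s = samples[i];
      if (s < min) min = s;
      if (s > max) max = s;
    }
    peaks[b * 2] = min;
    peaks[b * 2 + 1] = max;
  }
  return peaks;
}
```

In practice you'd compute this at a few zoom levels (a mipmap of sorts) and pick the closest one per render, then draw one vertical line per pixel from min to max.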
And that in turn is something you'll quickly realize needs to happen on a worker thread (or rather, a few of them in parallel), and... yeah, at that point it probably becomes a lot easier to engineer your own approach than to take something off the shelf and try to make it performant enough.
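For the parallel part, the sketch below shows only the work partitioning (a hypothetical helper, not any library's API): in the browser each chunk would then be posted to a Web Worker, ideally with the samples as a transferable ArrayBuffer. The subtle bit is aligning chunk boundaries to bucket boundaries so no bucket straddles two workers:

```javascript
// Split peak computation across N workers, aligned to bucket boundaries
// so each worker produces whole, deterministic min/max pairs.
// Hypothetical helper; names are illustrative.
function partition(totalSamples, numWorkers, bucketSize) {
  const totalBuckets = Math.ceil(totalSamples / bucketSize);
  const bucketsPerWorker = Math.ceil(totalBuckets / numWorkers);
  const chunks = [];
  for (let w = 0; w < numWorkers; w++) {
    const startBucket = w * bucketsPerWorker;
    if (startBucket >= totalBuckets) break; // fewer chunks than workers is fine
    const endBucket = Math.min(startBucket + bucketsPerWorker, totalBuckets);
    chunks.push({
      sampleStart: startBucket * bucketSize,
      sampleEnd: Math.min(endBucket * bucketSize, totalSamples),
    });
  }
  return chunks;
}
```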
And then you may eventually want to reduce memory use by streaming from disk, to avoid loading a 1 hour long audio file into memory (with Web Audio API's float32 representation, that's on the order of 1.2 GB for stereo at 44.1 kHz).
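Streaming pairs naturally with the peak precomputation, since min/max folds incrementally: you can feed in decoded chunks as they arrive and only ever keep the (tiny) peak data. A rough sketch, with illustrative names:

```javascript
// Incremental peak computation: push decoded chunks as they stream in,
// never holding the whole file. Hypothetical class; names are illustrative.
class PeakBuilder {
  constructor(bucketSize) {
    this.bucketSize = bucketSize;
    this.peaks = []; // flat [min, max, min, max, ...]
    this.count = 0;  // samples accumulated in the current bucket
    this.min = Infinity;
    this.max = -Infinity;
  }
  push(chunk) { // chunk: Float32Array of any length; bucket state carries over
    for (let i = 0; i < chunk.length; i++) {
      const s = chunk[i];
      if (s < this.min) this.min = s;
      if (s > this.max) this.max = s;
      if (++this.count === this.bucketSize) this.flush();
    }
  }
  flush() { // call once at end-of-stream to emit a partial final bucket
    if (this.count === 0) return;
    this.peaks.push(this.min, this.max);
    this.count = 0;
    this.min = Infinity;
    this.max = -Infinity;
  }
}
```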
So... yeah, I guess it depends on your requirements, but there's a good chance you'll have to roll your own unless you're doing relatively simple things.