<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>3545100301</title>
        <link>https://0xd34df00d.me/</link>
        <description><![CDATA[Some random Haskell and C++ (mostly)]]></description>
        <atom:link href="https://0xd34df00d.me//rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Thu, 31 Oct 2024 00:00:00 UT</lastBuildDate>
        <item>
    <title>What's up with cross-module optimizations?</title>
    <link>https://0xd34df00d.me//posts/2024/10/xmod-inlining.html</link>
    <description><![CDATA[
<p><a href="/posts/2024/09/naive-nfas.html">Last time</a>,
we implemented a simple regexp engine and looked at how it could be optimized.
Truth is, I cheated a little: all the code,
from the regexp definition to calling the regexp engine,
was in the same module.
However, it’s unlikely you’d write your production code like this.
You’d probably separate different functionality into different modules:
low-level memory representation-related things go into one,
NFA matching goes into another, and so on.
So, when productionizing our code, we’d do a similar refactoring.
But even such a simple change turns out to have quite an effect on performance.</p>
<p>Today, we’ll look at the magnitude of this effect,
identify its sources,
and try to learn to reason about it.</p>
]]></description>
    <pubDate>2024-10-31</pubDate>
    <guid>https://0xd34df00d.me//posts/2024/10/xmod-inlining.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Let's run some NFAs</title>
    <link>https://0xd34df00d.me//posts/2024/09/naive-nfas.html</link>
    <description><![CDATA[
<p>Lately, I’ve been playing around with memoized NFAs for optimized regular expression matching,
with features like lookahead and atomic groups, based on <a href="https://arxiv.org/abs/2401.12639">this paper</a>.
The original authors have their code in Scala,
and I thought it’d be fun to code something in Haskell
to see how it stacks up against their new implementation
and the prior art.</p>
<p>But before diving into memoization and the more complex features,
let’s start with the basics.
In this post, we’ll focus on a simple, naive backtracking NFA implementation.
We’ll start with the simplest, regexp 101 code and then make it significantly faster,
step by step.
We’ll also inevitably face some dead ends — that’s part of learning and experimentation, too!</p>
<p>To ground our work in reality,
I’ll also implement some of the algorithms in C++,
praised for its performance advantages over pretty much everything else.
Is the praise deserved here?
Let’s find out.</p>
]]></description>
    <pubDate>2024-09-12</pubDate>
    <guid>https://0xd34df00d.me//posts/2024/09/naive-nfas.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Nubbing lists in C++</title>
    <link>https://0xd34df00d.me//posts/2023/07/nubbing-lists.html</link>
    <description><![CDATA[
<p>It’s been a while since I last used C++ for anything serious,
but once a C++ guy, you’re always a C++ guy, right?
So, I decided to see how modern C++ fares in a seemingly simple task:
eliminating duplicate list elements.</p>
<p>That sounds trivial, so why bother with a whole blog post?
Well, the catch is we’re gonna do this at compile-time.
Moreover, lists will be represented as tuples,
and the elements might have different types.</p>
<p>Hopefully, during this little exercise,
we’ll also learn (or at least reinforce) a pattern or two
of modern metaprogramming.</p>
]]></description>
    <pubDate>2023-07-23</pubDate>
    <guid>https://0xd34df00d.me//posts/2023/07/nubbing-lists.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Haskell is quite OK for images: encoding QOI</title>
    <link>https://0xd34df00d.me//posts/2022/01/haskell-is-quite-ok-encoding.html</link>
    <description><![CDATA[
<p><a href="/posts/2021/12/haskell-is-quite-ok-decoding.html">Last time</a>,
we looked at writing a decoder for the <a href="https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression">QOI format</a>.
Today, we’ll look at the inverse: encoding QOI images and all that it entails.</p>
<p>Like last time, this post describes both the final result and the road there.
So, there will be lots of code and lots of diffs, beware!</p>
]]></description>
    <pubDate>2022-01-29</pubDate>
    <guid>https://0xd34df00d.me//posts/2022/01/haskell-is-quite-ok-encoding.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Haskell is quite OK for images: decoding QOI</title>
    <link>https://0xd34df00d.me//posts/2021/12/haskell-is-quite-ok-decoding.html</link>
    <description><![CDATA[
<p>I’ve recently come across the new <a href="https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression">“Quite OK Image” format</a>
— a fast lossless image compression algorithm.
It’s a very straightforward algorithm that’s a pleasure to work with, so, naturally,
I got curious what the performance of a Haskell implementation would be if:</p>
<ol type="1">
<li>I just write reasonably efficient code without getting too deep into low-level details to get the job done in a couple of hours.</li>
<li>I try to push the envelope and see what could be done if one’s actually willing to go into those details (within some limits, of course, so no GHC hacking!)</li>
</ol>
<p>Turns out that yes, it’s indeed possible to write something with C-level performance in a matter of a couple of hours.
Moreover, Haskell’s type system shines here:
class-constrained parametric polymorphism enables using the same decoder implementation for
pixels with very different representations,
making it possible to squeeze out as much performance as is reasonably possible without duplicating the code.</p>
<p>In this post, I’ll describe the Haskell implementation of the decoder and the steps I took to get from (1) to (2).</p>
]]></description>
    <pubDate>2021-12-18</pubDate>
    <guid>https://0xd34df00d.me//posts/2021/12/haskell-is-quite-ok-decoding.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>(neo)vim and Haskell, 2021 edition</title>
    <link>https://0xd34df00d.me//posts/2021/10/vim-and-haskell-in-2021.html</link>
    <description><![CDATA[
<p>In this post, I’ll describe my setup for doing Haskell (which I almost exclusively do with <code>stack</code>-based projects).</p>
<p>Spoiler: it’s much, much more straightforward than a few years ago, almost to the point of “vim and Haskell” posts being no longer necessary.</p>
]]></description>
    <pubDate>2021-10-04</pubDate>
    <guid>https://0xd34df00d.me//posts/2021/10/vim-and-haskell-in-2021.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Grokking recursion</title>
    <link>https://0xd34df00d.me//posts/2020/09/agda-wf-rec.html</link>
    <description><![CDATA[
<p>If we want to use dependently typed languages as proof checkers, we’d better be sure they are consistent as a logic,
so that we don’t accidentally prove ⊥ and, as a consequence, any proposition.</p>
<p>One huge source of inconsistency is non-terminating computations;
hence languages like Idris or Agda go to great lengths to ensure that functions indeed do terminate.
But, for <a href="https://en.wikipedia.org/wiki/Rice%27s_theorem">deep reasons</a>,
a purely automated check having neither false positives nor false negatives just does not exist,
so compromises must be made.
Naturally, when talking about proofs, it’s better to be safe than sorry,
so these languages strive to never label a function that doesn’t really terminate for all inputs as terminating.
Consequently, this means that there are terminating functions that the termination checker does not accept.
Luckily, these functions can be rewritten to make the checker happy if all the recursive calls are "smaller" in some sense.</p>
<p>This post emerged from me trying to persuade Agda that a bunch of mutually recursive functions are all terminating.
I went through Agda’s standard library to figure out how to do this,
taking notes about what different abstractions I encountered mean and expand to.
Then I figured that, if I pour some more words into my notes,
it might turn out to be useful for somebody else, so, well, here it is.</p>
]]></description>
    <pubDate>2020-09-25</pubDate>
    <guid>https://0xd34df00d.me//posts/2020/09/agda-wf-rec.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Call stacks aren't really call stacks</title>
    <link>https://0xd34df00d.me//posts/2020/08/callstacks.html</link>
    <description><![CDATA[
<p>Haskell is a very special language,
and one of the peculiarities setting it apart is its evaluation model.
In fact, the thing I, for one, find most complicated about Haskell is not monads nor all the countless type system extensions,
but rather reasoning about space and time complexity of whatever I write.
Thus, I’d better have a good mental model of how Haskell code gets to run,
and one of the most fruitful mental models for me is
treating a Haskell program as a set of equations
that some <a href="https://en.wikibooks.org/wiki/Haskell/Graph_reduction">graph reduction engine</a> churns until…
well, the termination criteria are not the point of this post.
The point of this post is that it’s ultimately a graph without any good intrinsic notion of a call stack.</p>
<p>On the other hand,
there is a <a href="https://hackage.haskell.org/package/base-4.14.0.0/docs/GHC-Stack.html"><code>GHC.Stack</code></a> module
(by the way, described as <q>Access to GHC’s call-stack <em>simulation</em></q>, italics ours)
as well as some mechanism for capturing something called <code class="sourceCode haskell"><span class="dt">CallStack</span></code>s.
How do those call stacks connect with the graph reduction model?
Let’s maybe carry out a few <em>computational</em> experiments all while keeping track of the obstacles we hit, shall we?</p>
]]></description>
    <pubDate>2020-08-29</pubDate>
    <guid>https://0xd34df00d.me//posts/2020/08/callstacks.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>The joys and perils of beating C with Haskell: productionizing wc</title>
    <link>https://0xd34df00d.me//posts/2020/03/the-joys-and-perils.html</link>
    <description><![CDATA[
<p><a href="/posts/2020/02/beating-c-with-20-lines-of-haskell.html">Last time</a>,
we looked at implementing a toy <code>wc</code>-like program
and compared its performance against the full-blown Unix <code>wc</code>.
The results were quite interesting:
our implementation managed to beat <code>wc</code> by a factor of 5.
Of course, that’s quite an unfair comparison:
our implementation is hardcoded to count just the bytes, lines and words.
<code>wc</code>, on the other hand, has command-line options to select specific statistics,
it supports some additional ones like maximum line length,
it treats Unicode spaces properly (in a Unicode-aware locale, of course),
and so on.
In other words, it’s better to consider what we’ve done last time
as a proof-of-concept showing that it’s possible to achieve (and exceed)
C-like performance on this task, albeit with all those concessions.</p>
<p>Today we’ll look at ways of productionizing the toy program from the previous post.
Our primary goal here is allowing the user to select various statistics,
computing just what the user has selected to compute.
We’ll try to do this in a modular and composable way,
striving to isolate each statistic into its own unit of some sort.</p>
<p>Indeed, if we look at the <a href="https://github.com/coreutils/coreutils/blob/master/src/wc.c">C version</a> —
well, personally, I wouldn’t call that a prime example of readable and maintainable code,
as different statistics are computed in a single big 370-line function.
This is something we’ll try to avoid here.</p>
<p>Moreover, we’ll try to express that certain statistics like byte count or line count
can be computed more efficiently if we don’t have to look at each byte,
while other statistics like word count or max line length just <em>need</em> to look at each byte one by one
(unless one does some clever and non-trivial broadword programming or SIMD-enabled things,
which is beyond the scope of this post).
For instance, byte count can be computed in <code>O(1)</code> if we know we’re reading from a file —
we can just take the file size and call it a day!</p>
<p>In addition to that, we will, among other things:</p>
<ul>
<li>implement more statistics with ease, enjoying local reasoning;</li>
<li>throw in some tests, enjoying local reasoning once more;</li>
<li>try out some kinda-dependently-typed techniques,
successfully obtaining working code but failing spectacularly on the performance side of things;</li>
<li>play around with Template Haskell;</li>
<li>marvel at the (un)predictability and (un)reproducibility of the resulting code performance.</li>
</ul>
]]></description>
    <pubDate>2020-03-10</pubDate>
    <guid>https://0xd34df00d.me//posts/2020/03/the-joys-and-perils.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>
<item>
    <title>Further beating C with 20 lines of Haskell: wc</title>
    <link>https://0xd34df00d.me//posts/2020/02/beating-c-with-20-lines-of-haskell.html</link>
    <description><![CDATA[
<p>tl;dr: today we’ll look at implementing a toy <code class="shell">wc</code> command
that is about 4-5 times faster than the corresponding GNU Coreutils implementation.</p>
<p>So I’ve recently come across <a href="https://chrispenner.ca/posts/wc">a post</a> by Chris Penner
describing a Haskell implementation of the Unix <code class="shell">wc</code> command.
Chris did a great job optimizing the Haskell version as well as
showing how some high-level primitives (monoids and streaming, for one) turn out to be useful here,
although the result was still a bit slower than C.
There’s also a parallel version that relies on the monoidal structure of the problem a lot, and that one actually beats C.</p>
<p>But that post left me wondering: is it possible to do better without resorting to parallel processing?</p>
<p>Turns out the answer is yes.
With some quite minor tweaks, the Haskell version manages to beat the hell out
of the C version that presumably has decades of man-hours put into it.</p>
]]></description>
    <pubDate>2020-02-02</pubDate>
    <guid>https://0xd34df00d.me//posts/2020/02/beating-c-with-20-lines-of-haskell.html</guid>
    <dc:creator>0xd34df00d</dc:creator>
</item>

    </channel>
</rss>
