Following-up on the leetcode stuff I was doing recently, I decided to pick up another Leetcode problem. This time, text justification algorithm.
Quick rundown of the problem:
- you're given an array of words (eg
["This", "is", "an", "example", "of", "text", "justification."]
) - and a max-length per line (eg
16
) - and you have to space the words in such a way that the resulting text/paragraph is "justified".
The rule for spacing is simple:
- one space between words
- if that does not fit the line (length of line = 16), introduce more spaces between words
- more spaces get distributed left to right so spaces on the left would get filled / incremented first and then the right.
- the last line can be left-aligned so just one space between the words is fine.
I spent a few minutes trying to find out how to extract the collection of words that would fit a line.
That is, if this is the text:
["This", "is", "an", "example", "of", "text", "justification."]
and the max length per line is 16,
the first line can only have ["This", "is", "an"]
.
This is because if I include the word "example", then the length of this line exceeds 16. Remember that I have to include spaces between the words as well in the calculation of line length.
I decided to park this and assume there's a function that gives me a "valid" line already and work on calculating the number of spaces to distribute to justify a "valid" line.
(A valid line is basically an array of words that can fit within the max-length per line including at least 1 space between each word).
In this example, some valid lines would be:
valid lines:
["This", "is", "an"]
["example", "of", "text"]
I decided to use a newtype
to differentiate between any random array of words and a valid line:
newtype ValidLine = ValidLine (Array String)
What I have to get to, as a first step, is this function:
validLineToParaLine :: Int -> ValidLine -> String
validLineToParaLine maxLen validLine = ? -- to be implemented
After a while of thinking about this, here's one logic I came up with:
- the actual length of the line is length of each word in the valid line, and spaces between them.
- the total number of spaces (single-space) in a valid line array is simply one less than the total number of words in the valid line array.
- if I subtract the actual length of the valid line from the deficit, I get the number of extra spaces I have to distribute.
This gives me the number of spaces to distribute.
Here's an example of the logic at work:
valid line -> ["This", "is", "an"]
spaces -> total words in valid line (3) minus 1 = 3-1 = 2
total characters -> sum of length of each word in valid line (4+2+2=7) + spaces(2) = 8+2 = 10
deficit -> max length (16) minus total chars (10) = 16-10 = 6
Okay, so now I have "extra number of spaces to distribute" in the line.
But it got tricky here as I had to find out how to distribute x
number of spaces across y
number of space-slots.
That is:
deficit = 6
6 spaces have to be distributed in this line: "This is an".
Visually, it looks easy. There are just 2 space-slots. Divide 6 by 2 = 3. So each slot gets 3 extra spaces. (Will replace space with hyphen to be more clear in the representation)
This----is----an
But what is the general logic here?
After much thought, I came up with what looks like a slightly-complicated solution.
Here's the logic:
- First off, instead of using the "deficit" (that is, how many more spaces need to be distributed besides the usual number of spaces), I combined the deficit with existing spaces so that now, I just have to worry about how many spaces to distribute in total between the words.
- Using the "total spaces to distribute" number, I construct another array that represents the number of spaces to actually put between each word in the valid line.
Here's an example of point #2:
valid line -> ["This", "is", "an"]
total spaces to -> usual spaces (2) + deficit (6) = 8
distribute
space distribution -> ["----","----"]
array
If the total spaces to distribute was 9, then the array would look like this:
["-----","----"]
That is, the first item will have 5 spaces and the second will have 4.
If I have this array, I could simply merge these two arrays in some way to get the final result:
valid line -> ["This", "is", "an"]
spaces array -> ["----","----"]
justified -> ["This","----","is","----","an"]
The trick is to find out how to go from a "number of spaces to distribute" to a "special spaces array".
I'll post about that in the second part.