Jan 9, 2012

Node: Reading Every Line in a File

In Groovy this would be concise: new File(filename).eachLine { ... }. Here's what it looks like in Node using CoffeeScript:

fs = require('fs')
EM = require('events').EventEmitter
ev = new EM()
stream = fs.createReadStream(filename)
buffer = new Buffer(0)
stream.on 'data', (data) ->
  # Append the new chunk to whatever was left over from the previous one
  nextBuffer = new Buffer(buffer.length+data.length)
  buffer.copy(nextBuffer)
  data.copy(nextBuffer, buffer.length)
  start = 0
  offset = buffer.length
  buffer = nextBuffer

  # Scan every byte of the chunk (indices 0 to data.length-1) for a newline (byte 10)
  for i in [0...data.length]
    if data[i] == 10
      end = i+offset+1
      # The emitted line still includes its trailing newline
      line = buffer.slice start, end
      start = end
      ev.emit "line", line.toString()

  # Keep the trailing partial line for the next chunk
  buffer = buffer.slice start

stream.on 'end', () ->
  ev.emit 'line', buffer.toString() unless buffer.length == 0

ev.on 'line', (line) ->
  ...

Do you have a smarter way of doing this?

Edit: I'm reading a very large file with this code. That's why this isn't synchronous.
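
For comparison, here is a minimal sketch of the same chunk-buffering idea in plain JavaScript, with the line splitting pulled out so it can be exercised without a file. The names createLineSplitter and eachLine are hypothetical, and it assumes a current Node version where Buffer.alloc, Buffer.concat and buffer.indexOf are available. Unlike the CoffeeScript above, this version strips the trailing newline from each line.

```javascript
const fs = require('fs');

// Hypothetical helper: accepts chunks and reports each complete line to onLine.
function createLineSplitter(onLine) {
  let leftover = Buffer.alloc(0); // bytes after the last newline seen so far

  return {
    // Feed one chunk (a Buffer), as delivered by a stream's 'data' event.
    write(chunk) {
      const buffer = Buffer.concat([leftover, chunk]);
      let start = 0;
      let newline = buffer.indexOf(10, start); // 10 is the byte value of '\n'
      while (newline !== -1) {
        onLine(buffer.subarray(start, newline).toString());
        start = newline + 1;
        newline = buffer.indexOf(10, start);
      }
      // Keep the trailing partial line for the next chunk.
      leftover = buffer.subarray(start);
    },
    // Call once at end of input to flush a final line that has no newline.
    end() {
      if (leftover.length > 0) onLine(leftover.toString());
      leftover = Buffer.alloc(0);
    }
  };
}

// Hypothetical wrapper: stream a file and call onLine for every line.
function eachLine(filename, onLine) {
  const splitter = createLineSplitter(onLine);
  fs.createReadStream(filename)
    .on('data', (chunk) => splitter.write(chunk))
    .on('end', () => splitter.end());
}
```

Working on bytes and only calling toString() on whole lines means a multi-byte UTF-8 character that straddles two chunks is decoded intact, since byte 10 never occurs inside a multi-byte sequence.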

4 comments:

Peter Ledbrook said...

Vert.x has a Pump class for this, which may be of interest: https://github.com/purplefox/vert.x/blob/master/src/tests/java/org/vertx/tests/core/streams/PumpTest.java

Trevor Burnham said...

There's a good Quora question on exactly this: http://www.quora.com/What-is-the-best-way-to-read-a-file-line-by-line-in-node-js

Groovy has many more high-level constructs baked in than Node does. But Node has an astonishing number of high-quality libraries for such a young ecosystem, and several of them can do what you want quite easily.

Steve Howell said...

The CoffeeScript code seems reasonable to me, given your constraints. Obviously, it would be pretty easy in CoffeeScript to encapsulate the pattern, so you'd never have to write it again. Also, allow me to emphasize that your approach is only necessary for large files; for smaller files, you can do "readFileSync(fn).toString().split '\n'".
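
The small-file route mentioned here can be sketched in plain JavaScript as follows. The helper name readLinesSync is hypothetical, and the sketch assumes the file is UTF-8 text that fits comfortably in memory.

```javascript
const { readFileSync } = require('fs');

// Hypothetical helper for small files: read everything, then split into lines.
function readLinesSync(filename) {
  const text = readFileSync(filename, 'utf8');
  const lines = text.split('\n');
  // A trailing newline leaves one empty string at the end; drop it.
  if (lines[lines.length - 1] === '') lines.pop();
  return lines;
}
```

The whole file is held in memory at once (as a string and again as an array of lines), which is why this only suits smaller files.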

Curious Attempt Bunny said...

@Peter
Cheers

@Trevor
I'd seen that same Quora page. There certainly are a number of them. They're not all of such quality though, or perhaps I've just been unlucky.

@Steve
My file was 600 MB. I *could* have read it all into memory. The readFileSync route is definitely more concise.