Capturing and uploading a whole new PNG for each screencap is not what I would call 'efficient', and to meet the Rewind.ai use case in the first place it needs some OCR mechanism to pull up the relevant screencaps.
The thing that enabled rewind.ai and MS Recall is storing the series of screenshots more like a HEIF, allowing for a massive compression ratio, plus on-device storage and OCR provided by the OS (Live Text since Monterey in 2021 [0]; Microsoft introduced its equivalent last year for Snapdragon-based AI PCs [1]).
I guess this is a good starting point if the goal is to fill S3 buckets with screencaps of multiple users, but then we're just back to corporate spyware, not tools for helping individuals use their machine more effectively.
That said, if I was using my own minio backend, it would be neat to archive my screen captures but I would change it so it captures after, say, every keystroke, and every time my mouse stops moving, and after every click. That way I have high density capture of taking actions, and low density otherwise. In any case collecting the data is not the issue, making an interface where that data becomes useful to help me remember something is.
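To make the trigger idea concrete, here's a rough sketch of the capture policy only (no actual screen grabbing or input hooks). Everything in it is my own assumption: the `Event` type, the 0.5 s "mouse stopped" threshold, and the `min_gap` rate limit so that fast typing doesn't produce a capture per character.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float   # timestamp in seconds
    kind: str  # "key", "click", or "move" (hypothetical event names)


def capture_times(events, mouse_idle=0.5, min_gap=0.2):
    """Return the timestamps at which a screenshot would be taken."""
    captures = []
    last_capture = float("-inf")

    def fire(t):
        # Rate limit (an assumption): skip captures closer than min_gap apart.
        nonlocal last_capture
        if t - last_capture >= min_gap:
            captures.append(t)
            last_capture = t

    pending_move = None  # time of the latest mouse move not yet followed by a stop
    for ev in sorted(events, key=lambda e: e.t):
        # The mouse "stopped" if mouse_idle seconds passed with no further move.
        if pending_move is not None and ev.t - pending_move >= mouse_idle:
            fire(pending_move + mouse_idle)
            pending_move = None
        if ev.kind == "move":
            pending_move = ev.t
        elif ev.kind in ("key", "click"):
            fire(ev.t)
    # Flush a trailing mouse stop at the end of the event stream.
    if pending_move is not None:
        fire(pending_move + mouse_idle)
    return captures
```

A real version would run this incrementally off an OS input hook instead of a finished list, but the density behaviour is the same: bursts of captures while acting, nothing while idle.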
If I was to build one of these, which I'm not, I would try for a RTSP to bucket uploader. That way you could do the actual capture and compression with OBS like any other streamer - or use it with IP security cameras etc. You'd probably end up with a pile of video-ts pieces which could be replayed later using HLS.
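For the replay half, a minimal sketch of what "replayed later using HLS" could look like: once the .ts pieces are in the bucket, you only need a playlist that lists them in order. The function name and the segment names/durations are made up for illustration; the tags are standard HLS playlist tags.

```python
import math


def build_vod_playlist(segments):
    """Build an HLS VOD playlist.

    segments: list of (uri, duration_seconds) tuples in playback order.
    """
    if not segments:
        raise ValueError("need at least one segment")
    # Target duration must be an integer no smaller than any segment duration.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_vod_playlist([("seg0.ts", 6.0), ("seg1.ts", 4.5)]))
```

Upload the resulting .m3u8 next to the segments and any HLS-capable player should be able to scrub through the recording.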
ffmpeg works well, especially on apple silicon using video toolbox. That's how I approached it.
Also, video encoding automatically costs no extra storage for identical screenshots (no activity) and is very cheap when you're just moving your mouse around or typing a few characters.
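If you stay with still images instead of video, a content hash gets you at least the "identical screenshots cost nothing" half. This is a rough sketch and not what a video codec does internally; the `FrameStore` class is hypothetical and keeps everything in memory.

```python
import hashlib


class FrameStore:
    """Store each distinct frame once; keep a timeline of what was on screen."""

    def __init__(self):
        self.blobs = {}     # sha256 hex digest -> frame bytes, stored once
        self.timeline = []  # (timestamp, digest) for every capture

    def add(self, timestamp, frame_bytes):
        """Record a capture. Returns True if new bytes had to be stored."""
        digest = hashlib.sha256(frame_bytes).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = frame_bytes
        self.timeline.append((timestamp, digest))
        return is_new

    def frame_at(self, index):
        """Return the frame bytes for the index-th capture."""
        _, digest = self.timeline[index]
        return self.blobs[digest]
```

It does nothing for the "mouse moved a few pixels" case, which is exactly where inter-frame video compression wins.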
rem looks very intriguing, I'll give it a try, cross platform would be even better ofc. You're doing it without funding?
I wish the LLMs were tuned into the open source landscape more, so when someone has an idea for a POC and asks the bot to write it for them, it would go clippy mode and say "it looks like you're trying to build an open source rewrite of Rewind, would you like to clone the rem repo and contribute to that instead?" lol
> Hypothesis: the world's most valuable data is screen captures of outlier competent people going about their work. But very little of this data is recorded, let alone made publicly available.
It's not quite screen captures, but the way in which any given email is responded to by competent users in your own organization is highly relevant in this context, especially if you place original+reply email pairs into a RAG framework and add function calls for structured domain knowledge.
Unified APIs like https://www.nylas.com/ which an admin can unilaterally connect across an entire org can make this quite viable - assuming you've done the work to build a culture where radical transparency is seen as an opportunity rather than a threat.
There's a lot of nuance required to avoid hallucinations, but organizations that are merely training chatbots on explicit Q&A documents are just scratching the surface of the depth of their semi-structured data.
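A toy sketch of the retrieval step for original+reply pairs, assuming plain bag-of-words cosine similarity in place of a real embedding model. The function names and the sample emails in the usage note are invented for illustration; function calling and the LLM itself are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(pairs, incoming, k=2):
    """Return up to k (original, reply) pairs whose original best matches incoming."""
    query = Counter(tokenize(incoming))
    scored = [
        (cosine(query, Counter(tokenize(original))), original, reply)
        for original, reply in pairs
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(o, r) for score, o, r in scored[:k] if score > 0]


def build_prompt(pairs, incoming, k=2):
    """Assemble a few-shot prompt from the retrieved past replies."""
    parts = []
    for original, reply in retrieve(pairs, incoming, k):
        parts.append(f"Email: {original}\nReply: {reply}\n")
    parts.append(f"Email: {incoming}\nReply:")
    return "\n".join(parts)
```

For example, with pairs like ("Can I get a refund for my annual plan?", "Yes, refunds are prorated.") and ("How do I reset my password?", "Use the reset link."), an incoming "refund on annual plan please" pulls back only the first pair as context for the draft.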
I pay for Rewind, and honestly, it’s one of the best investments I've made in software. After each Zoom meeting, I receive a summary of everything discussed, including action items to add to my to-do list. Every Monday, I also ask it to remind me of what I accomplished the previous week to help me prepare for my 1:1 meeting. Everything is recorded locally, allowing me to search for anything I did earlier in the day quickly.
Your readme states "MIT License - See LICENSE file for details" but there is no such license file. I've been seeing this a lot lately, did you use an LLM to generate this part of the readme? If so, was MIT a conscious choice of yours?
> Attempt to create an Open Source Privacy Focused Rewind.ai Alternative for data capture
I'd assume this was something local or at least for your local network. But this exclusively sends the data over to S3. And based on the lack of encryption keys or even passwords, I'm assuming this is even unencrypted?
Love your enthusiasm! Our plan is to subsume Rewind functionality into Limitless. Sorry it has taken longer than I wanted. The pendant has taken a lot of our time and focus.
A FOSS alternative to Rewind that works on both MacOS and Linux would be a dream come true tbh. Thanks for working on this, I'll be trying it out sometime next week
I took a crack at this, but had trouble building a community. It's all open source.
Native macOS in Swift (the popular one with OCR / text selection from history), and cross-platform (Rust) without text selection from history and very much a POC.
Nice! This is needed: it seems rewind.ai still stores data locally, but Limitless, the product they seem to put more energy into, goes to the cloud.
I really like the rewind.ai retrieval mechanism. I believe their recording mechanism is highly broken. It often fails to sync to the os calendar and will ask you to record meetings you deleted months ago.
I don’t understand the webcam recording need. I’m not sure what signal you get from that since if you are in a web meeting you already have that on screen. Or if you are coding you might get a few WTF frown faces if working on a hard bug. But you made it optional, so that’s good.
Thank you all so much for chiming in about rewind. I’ve been ruminating about what to do about my subscription. To see that I’m not alone in paying for this app that the founder ditched… I finally feel heard. Thank you!
While we’re here, has anyone been able to export audio from Rewind.ai’s local storage?
It's useful to me in that you've identified an interesting niche! I like the idea! As for the implementation, eh, I'd probably rather code my own so it can be in bash or c++ ;-)
It's also useful to me in that it's a solid example of what can be done with LLMs these days, wow!
Also, tangentially, a long long time ago I had a similar system set up, except for packets, not screencaps or audio. A 24h ringbuffer on my router to log _everything_ was a cool-to-have that made debugging network issues easier.
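The 24h ringbuffer idea carries over to screencaps or audio pretty directly. A minimal in-memory sketch, assuming timestamps arrive in increasing order; the class name and the `since` helper are my own, not from any existing tool.

```python
from collections import deque


class TimeRingBuffer:
    """Keep only the records from the last window_seconds."""

    def __init__(self, window_seconds=24 * 3600):
        self.window = window_seconds
        self.items = deque()  # (timestamp, record), oldest first

    def append(self, timestamp, record):
        self.items.append((timestamp, record))
        # Evict everything that fell out of the window.
        cutoff = timestamp - self.window
        while self.items and self.items[0][0] < cutoff:
            self.items.popleft()

    def since(self, start):
        """Return records with timestamp >= start, oldest first."""
        return [record for t, record in self.items if t >= start]
```

On a router you'd back this with files on disk instead of a deque, but the eviction rule is the whole trick.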
Looks like many open alternatives of Rewind.ai already exist in various levels of completion.[1]
The issue with this one is that it misses the most important feature, the searchability. But you could probably focus on the low overhead aspect of your version.
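For a sense of how small a first cut at searchability can be: a sketch of an inverted index over OCR text, assuming some OCR step has already turned each capture into a string. The class and the capture ids in the usage note are hypothetical.

```python
import re
from collections import defaultdict


class CaptureIndex:
    """Map each word seen in OCR text to the captures that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of capture ids

    def add(self, capture_id, ocr_text):
        for word in set(re.findall(r"[a-z0-9]+", ocr_text.lower())):
            self.postings[word].add(capture_id)

    def search(self, query):
        """Return sorted ids of captures containing every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Index "shot-001.png" with "Invoice #4821 due Friday" and "shot-002.png" with "Friday standup notes", and a search for "invoice friday" comes back with just the first one. Ranking, fuzzy matching and timestamps are where the real work starts.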
[0] https://support.apple.com/guide/preview/interact-with-text-i...
[1] https://learn.microsoft.com/en-us/windows/ai/apis/text-recog...
I've also explored Swift AVFoundation to drop frames and colours at the moment of recording, but won't be implementing it at this time.
This was just a POC, and I couldn't hack it in a day.
And yeah, lots of similar projects and/or possible startups have popped up as well.
Most feel that Recall is also this.
I'm happy to think about e2e encryption.
Also I'd buy a pendant if I can send it to my own S3!!!!
https://github.com/jasonjmcghee/rem
https://github.com/jasonjmcghee/xrem
The repo looks like it has quite a few stars and a smattering of issues which I thought meant real usage.
I think it should be done in Swift tbh to get battery impact under 10%
Must one set up a S3 compatible stack on a home server somewhere?
[1]
* Screenpipe https://github.com/mediar-ai/screenpipe
* Memento https://github.com/apirrone/Memento
* Rem https://github.com/jasonjmcghee/rem