A few weeks ago, I wrote a post about a Twitter data collection project that I've slowly been working on:
^ that one.
Now that it has been a few weeks, where have I been? Well, not too far. Unfortunately, life has been hectic over here (I work for a living, man). We have been collecting infosec-related tweets, not only to provide data for our teams but also to see how things are going in the infosec realm.
Our first goal was to hit 100,000 collected tweets. After a few weeks, 100,000 became 250,000. Soon, we decided that 1 million collected tweets would be our 'major' project milestone. Sure, in the full scope of things, 1 million isn't a MASSIVE number. It is, however, a lot of infosec.
Release the tweets!
We decided that once we hit one million, we would dump our collected tweets out on the internet. Why? Why the heck not? On a serious note, we're hoping that someone can do something useful with the data contained in them. At a glance, the data includes:
- Date/Time post was made
- LAT/LON (if available)
- Specified country (if available)
- Country code of tweet (if available)
That's a good chunk of data for so many different uses.
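If you want to poke at the fields listed above, here is a rough Python sketch of how one record could be loaded. Fair warning: this assumes a CSV layout, and the column names (`created_at`, `lat`, `lon`, `country`, `country_code`) are made up for illustration, so adjust them to whatever the dump actually uses. The sample row is invented too.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TweetRecord:
    """One collected tweet. Field names are hypothetical, not the dump's real schema."""
    created_at: str                 # date/time the post was made
    lat: Optional[float]            # latitude, if available
    lon: Optional[float]            # longitude, if available
    country: Optional[str]          # specified country, if available
    country_code: Optional[str]     # country code, if available


def _opt_float(value: Optional[str]) -> Optional[float]:
    """Turn an empty or missing cell into None, otherwise parse a float."""
    value = (value or "").strip()
    return float(value) if value else None


def _opt_str(value: Optional[str]) -> Optional[str]:
    """Turn an empty or missing cell into None, otherwise return the stripped text."""
    value = (value or "").strip()
    return value or None


def parse_records(csv_text: str) -> List[TweetRecord]:
    """Parse CSV text (with a header row) into TweetRecord objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        TweetRecord(
            created_at=row["created_at"],
            lat=_opt_float(row.get("lat")),
            lon=_opt_float(row.get("lon")),
            country=_opt_str(row.get("country")),
            country_code=_opt_str(row.get("country_code")),
        )
        for row in reader
    ]


if __name__ == "__main__":
    # Invented sample rows: one with location data, one without.
    sample = (
        "created_at,lat,lon,country,country_code\n"
        "2016-01-01T00:00:00Z,40.7,-74.0,United States,US\n"
        "2016-01-02T00:00:00Z,,,,\n"
    )
    for record in parse_records(sample):
        print(record)
```

The "if available" fields come back as `None` when the cell is empty, so you can filter on location without tripping over blank strings.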
We publicly dropped the dump on Twitter (ironic):
Looking to the future
We hit that first milestone. What the heck now? For the tweet side, probably not a whole lot. There are plans to start harvesting some data from PasteBin and monitoring it for specific keywords. Scraping Twitter and PasteBin isn't anything new, and a lot of people/places do it. We are, however, going to send SMS alerts to our little data team (three people). This will allow us to be alerted on our phones should specific parameters be met.
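For the curious, the keyword-to-alert idea boils down to something like the Python sketch below. This is not our actual collector, just a minimal illustration: the function names are hypothetical, the keywords are examples, and `send_alert` is a stand-in callback where a real SMS gateway call would go.

```python
from typing import Callable, Iterable, List


def matched_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def check_and_alert(
    text: str,
    keywords: Iterable[str],
    send_alert: Callable[[str], None],
) -> bool:
    """Call send_alert once if any keyword matches; return True when an alert fired."""
    hits = matched_keywords(text, keywords)
    if hits:
        send_alert("Keyword hit: " + ", ".join(hits))
        return True
    return False


if __name__ == "__main__":
    # Example keywords and paste text are invented for illustration.
    watchlist = ["password dump", "0day"]
    paste = "Fresh Password Dump from some forum"
    # print() stands in for the SMS sender here.
    check_and_alert(paste, watchlist, send_alert=print)
```

Keeping the sender as a plain callback means the matching logic can be tested without texting anyone, and the SMS provider can be swapped later.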
Down the road:
- Optimizing Twitter collection
- Finalizing PasteBin collector
- Adding our Onion scraper (we love onions)
I'm not sure how sustainable this project is going to be. However, I hope that we can continue collecting, expanding, getting better, and giving away more free data ;).
Here's to the future, my friends.