SWIPR - DATA CLEANING

Back in part 3, we said: “Of course, pictures of beaches, food, and dogs are all common Instagram subjects, even on profiles of stereotypically self-centered, selfie-obsessed girls in their early twenties. We will come back to this point in detail later on, but for now, let’s assume that each profile is pure and ideal for its category.” Now it is time to come back to that. A bunch of Instagram photos, no matter whose profile they come from, are of course not pure and ideal for their category, and pictures of beaches, food, and boyfriends abound.

SWIPR - DATA COLLECTION (PART 2 OF 2)

Downloading from Instagram

There are a number of solutions for scraping Instagram, and it’s mostly a pick-your-poison kind of affair. We picked the first well-supported Google result for “Instagram scraper” and came up with Instalooter. It seemed to come with good documentation and sufficient automation facilities.

Rate Limits

We are told that prior to April 2018, downloading from Instagram was a much more lackadaisical affair. There was allegedly a generous rate limit for a given Instagram login token, upwards of 5,000 page requests per hour.

SWIPR - DATA COLLECTION (PART 1 OF 2)

Data Collection

In previous posts we laid out what we expect our end result to look like, but underlying all of that is of course the ML model that makes our auto-Tinder more than just a simple always-swipe-right bot. As with all deep learning projects, before we can consider training our model or even what our model architecture will be, we need to consider the problem of data. What, exactly, do we need to collect?

SWIPR - SCOPE

From the outset we wanted to have, at the most basic level, the ability to use a CPU-only machine to judge whether or not we should swipe right on a given photograph. Additionally, it does us no good to just have a model that does this; it needs to be useful in some form beyond merely existing, meaning it has to exist as some kind of accessible application. Finally, it would be preferable to have this system be accessible from anywhere.

SWIPR - OVERVIEW

The Goal

Continuing the quest to learn modern Machine Learning, I thought it would be fun to create a Tinder auto-swiper that could potentially reject matches I don’t want and accept matches I do want. One of my friends had the brilliant idea of naming this tinderbot “Swiper”, like the fox from Dora the Explorer, famous for being repelled by the intrepid protagonists with the incantation of “Swiper, no swiping!”

MACHINE LEVINE MK. II

Machine Levine Mk. II is the second of the Matt Bots, and the first that can produce any kind of coherent output. His capabilities are briefly described on his homepage over at http://machinelevine.winetech.com/bots/ml2, but we’ll deal with some deeper ideas behind his creation and operation here.

Data Collection

Mk. II uses the same initial corpus that Mk. I uses. For more details on data collection, refer to Mk.

MACHINE LEVINE MK. I

This is a description of how the first of the Matt Bots came into existence. Few things in this document should be interpreted as the best or even a correct way to do anything; this bot is an academic exercise. That being said, here is how he was made.

Data Collection

As the general post describes, the data used to train the Matt model was derived from Matt Levine’s archive of Money Stuff articles, which held 571 articles at the time.

MACHINE LEVINE

The Goal

In order to better understand machine learning, I decided to see if I could get a neural network to write articles like Matt Levine. The original goal was to sort Spotify songs by male or female vocals, but I had been learning all of this stuff by following along with the fast.ai courses, and that specific use case was a bit afield of the coursework and forum discussions happening in the MOOC.

GIT CHANGE TRACKING CAUSES VS CODE FREEZES

Tracking lots of changes in git will freeze VS Code

tl;dr: If your workspace is git-tracked and you have many (1,000+) outstanding files being change-tracked, your VS Code windows may experience severe slowdowns or outright freezing.

Like the tl;dr says, if you’re tracking a ton of changes, you’ll get window freezes. Try to keep tracked changes below 1,000, and try to not track so many different items. If it can’t be helped, then try to commit the changes often enough for it to not be a problem.
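If you want a quick way to see how close a workspace is to that rough 1,000-change mark, something like the following sketch may help. It is not from the original post: the function names and the `CHANGE_LIMIT` constant are made up for illustration, and the only assumption is that `git` is on your PATH. It counts the entries reported by `git status --porcelain`, with `--untracked-files=all` so that untracked directories are expanded into individual files.

```python
import subprocess

# Rough figure taken from the post; treat it as a rule of thumb, not a hard limit.
CHANGE_LIMIT = 1000


def count_outstanding_changes(porcelain_output: str) -> int:
    """Count changed paths in `git status --porcelain` output.

    Porcelain output lists one changed path per non-empty line.
    """
    return sum(1 for line in porcelain_output.splitlines() if line.strip())


def outstanding_changes(repo_path: str = ".") -> int:
    """Run git in repo_path and return the number of outstanding changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_outstanding_changes(result.stdout)


def is_over_limit(change_count: int, limit: int = CHANGE_LIMIT) -> bool:
    """Return True when the change count reaches the rule-of-thumb limit."""
    return change_count >= limit
```

Calling `is_over_limit(outstanding_changes("/path/to/repo"))` from a terminal session gives a yes/no answer; if it comes back `True`, that is a hint to commit, stash, or add the noisy paths to `.gitignore`.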

A SYSTEM FOR APPROACHING SECURITY QUESTIONS

Security questions are user hostile and advantage the attacker
Why security questions suck
They’re based on publicly available information.
They’re based on capricious and changing feelings
They’re based on murky and easily misremembered factoids
They’re based on certain assumptions of the world that are not universally true
What to do?
Social Engineering
A system for security questions
System Requirements
System #1 - Design Considerations
The MUST-NOT Items
The MAY Items
Some Example System 1
System #2 - Design Considerations
Conclusion

Security questions are user hostile and advantage the attacker

Security questions are a terrible concept and should be abolished.