Britney starts by mentioning Elon Musk. He helped start a company called OpenAI. The whole point of the company was to further the space of machine learning and AI. They wanted to move things along faster and open source everything. But it didn’t really go as planned.
A couple of months ago, they said that their text generator was too good – too dangerous – to release to the public. Is this a PR stunt? Is it legitimate?
Britney took a closer look, because they did release a stripped-down version of the generator. In order to generate text you have to start with a word, sentence or paragraph…the more you feed it, the better. It’s sort of like Smart Compose in Gmail.
The generator isn’t perfect but it can auto-generate some really great content.
If you take one thing away from this it should be that machine learning is becoming more and more accessible. It’s going to free us up to work on a more strategic level.
What about the displacement of jobs? We should actually be talking about jobs being displaced to people who use these tools. So how can we start to navigate these new waters?
At the end of this talk, everyone should be able to use machine learning.
We’ll start with a few examples, then break down the mechanics a little further.
AI can read a lot of text and compress information, allowing humans to work on higher level processes. JP Morgan were able to automate the absorption of hours of financial material.
It can also predict what’s in front of a camera. Sometimes it gets things right…sometimes it’s not so good. However, it’s always interesting.
Britney says that she has absolutely no clue what she is doing. This world is complex and strange. She thinks of herself as the best thief ever. She sees these models and works out how they can be applied.
One example is the Shakespeare model. It looks at loads of Shakespeare text and creates new stories. Britney decided to combine text from Rand Fishkin and Beyonce, combining SEO articles with the entire Lemonade album. Once she’d trained it enough, they made some pretty good raps – bit.ly/rand-b – the model knows how a song is laid out and it knows how to rhyme.
The thing is, we’re experts in what we do on a daily basis and these tools can help us automate these things.
That’s great, but how can we apply this more to SEO?
Frase is a tool for content research. It automates the research of content shared around a particular topic and starts to put together questions and answers.
Did you know that you can automate videos? Lumen5 lets you put in a bunch of text for free and it will generate media based on the text that you enter. It’s using natural language processing to understand your topic and find imagery and media that suits the material.
Obviously, none of this is perfect, but it will get you most of the way there.
How about automating transcriptions? The average podcast listener consumes 7 episodes a week. That’s wild! But it’s hard to translate to search. Why would you not want to transcribe audio content to make it work better for you? There are loads of great tools out there that do it quickly and cost-effectively. Amazon Transcribe will transcribe an hour of audio in a couple of minutes for pennies.
You can also automate image optimisation. This is great, especially if you have a website with thousands of images.
Another resource is TensorFlow for Poets.
When you’re playing around with models, Britney challenges us to break this shit. You can customise models to fit your needs and scale your activities.
The ability to automate meta descriptions is incredible. This is awesome for large sites where you just can’t write unique descriptions for every page. To get something unique on every page, you can use models to do it for you. Algorithmia is a great resource for finding models. Moz has a guide. Some of this sounds complex, but it’s getting simpler.
Shout out to JR Oakes and Grayson Parks, who are at the cutting edge of this stuff.
But this is just the tip of the iceberg. It’s an exciting space and we’re just at the very beginning.
How can these different models assist you and your coworkers in this industry?
So what the heck is happening under the hood of machine learning? How can we talk intellectually about this stuff?
Machine learning is a subset of AI. Whenever you hear the word AI, 99.9% of the time it’s talking about machine learning. It combines statistics and programming to give computers the ability to learn without being explicitly programmed. It trains them to identify and compute patterns at a level far beyond what the human mind can accomplish.
A typical model starts with hundreds or thousands of labelled training data points. Whatever you’re trying to get the model to learn, you need clean training data. You also need to hold back some of that data to test how well the model performs after training.
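To make the idea of holding data back concrete, here is a minimal sketch in plain Python. The `train_test_split` helper, the 80/20 ratio and the example labels are assumptions for illustration, not something shown in the talk.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labelled rows and hold back a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (page title, label) pairs.
labelled = [
    (f"page title {i}", "commercial" if i % 2 else "informational")
    for i in range(100)
]

train, test = train_test_split(labelled)
print(len(train), len(test))  # 80 rows to learn from, 20 kept back for testing
```

The model only ever sees `train`. The `test` rows stay hidden until the end, so the score on them tells you how the model copes with data it has never seen.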
How does it learn? Linear regression is a simple example. The model makes a prediction, then sees how far off it is when it’s incorrect. That measure of error comes from the loss function, so training is basically trying to minimise its loss.
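As a rough sketch of what minimising the loss looks like, the snippet below fits a straight line by gradient descent on mean squared error. The data, learning rate and step count are made up for illustration, and real libraries do this far more efficiently.

```python
def mean_squared_error(w, b, xs, ys):
    """The loss: average squared distance between predictions and answers."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Nudge slope w and intercept b downhill on the loss, step by step."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data that follows y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]

loss_before = mean_squared_error(0.0, 0.0, xs, ys)
w, b = fit_line(xs, ys)
loss_after = mean_squared_error(w, b, xs, ys)
print(f"w={w:.3f} b={b:.3f} loss {loss_before:.2f} -> {loss_after:.6f}")
```

Each pass measures how wrong the current line is and adjusts it slightly in the direction that reduces the error, which is the "trying to minimise its loss" part.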
What starts to happen is that you move along the error curve from the far left side, where the model isn’t fitting anything. You don’t want to go all the way to fitting every single data point either, because that doesn’t allow the machine to adapt to new data points. It should be a smooth curve that can predict accurately without overfitting.
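Overfitting is easier to see with numbers. In this sketch, one "model" simply memorises every training point and another fits a straight line. The noisy data and both helpers are invented for illustration.

```python
def mse(predict, points):
    """Mean squared error of a prediction function over (x, y) points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)

def fit_line(points):
    """Ordinary least squares for a single straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def memorise(points):
    """Overfit on purpose: answer with the y of the nearest training x.
    Ties go to the earlier point in the list."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

# Hypothetical data: y = 2x + 1 plus a fixed pattern of noise.
noise = [1, -1, -1, 1]
data = [(x, 2 * x + 1 + noise[x % 4]) for x in range(12)]
train = [p for p in data if p[0] % 2 == 0]
test = [p for p in data if p[0] % 2 == 1]

line = fit_line(train)
lookup = memorise(train)

for name, model in [("memoriser", lookup), ("straight line", line)]:
    print(name, "train error:", mse(model, train), "test error:", mse(model, test))
```

The memoriser is perfect on the data it has seen and noticeably worse than the simple line on the data it hasn't, which is why a held-back test set matters.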
If ML was a car, data would be the fuel. ML training is all about finding clean data. That’s exactly what Google does! We label training data for them without knowing it. Who did the 10 year challenge? That’s now being built into a beautiful age predicting model. This stuff is amazing but it can also be used for really scary things. Be mindful of what’s happening. We’re feeding our data to these companies.
At Google I/O, they talked about how they use the text from the keyboard to be predictive of what everyone else is writing. That’s crazy! This stuff is happening so fast.
Again, if we can use some of this stuff, it’s going to help us be more strategic.
So what are the tools we can use now?
These resources are basically plug and play.
Start with Codelabs. It’s the place to go on Google where they walk you through step-by-step implementations of machine learning. Britney recommends filtering by TensorFlow or machine learning.
Colaboratory notebooks are also really super powerful. You can literally collaborate with anyone else. Why does Google do this? Probably to check out what people are working on!
MonkeyLearn is another great tool that has pre-baked models and a powerful Google Sheets add-on.
Algorithmia, as was mentioned before, packages up models that you can plug and play with to do whatever you might want.
Explore the Natural Language API demo. This is something that most people aren’t even aware of. This is Google’s API…you can literally just paste in your text and see how Google interprets your content. You can also do this for competitors. How is Google categorising the content that outranks you? You can also do this on an entity level, or for syntax and much more.
Britney also wants to squeeze rev.com in as the best audio transcription tool to date. If you need realtime transcription, Google’s Cloud Speech-to-Text is also an option.
ImageNet is the largest online source of labelled images. It’s incredibly robust.
Using g.co/teachablemachine you can build a model without touching one line of code. You just use your computer camera!
Look at Paul Shapiro. He’s doing fantastic stuff in this space. He came up with a way to automate metadata, differently to Moz. Check it out and look at some of his other content. It’s like he’s living in 2040. He has also created something that can automate 301 redirects. It compares different iterations of your site and can automatically add syntactically-based redirects to your .htaccess file.
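To give a feel for the redirect idea, here is a loose sketch that is not Paul Shapiro's actual method. It assumes you have a list of URL paths from an old crawl and a new crawl (both hypothetical here), matches them on plain string similarity with Python's `difflib`, and prints Apache-style `Redirect 301` lines for you to review before they go anywhere near a live `.htaccess` file.

```python
import difflib

def suggest_redirects(old_paths, new_paths, cutoff=0.6):
    """Pair each vanished old path with its most similar new path, if any."""
    rules = []
    for old in old_paths:
        if old in new_paths:
            continue  # page still exists, so no redirect is needed
        # Best single match whose similarity ratio is at least the cutoff.
        match = difflib.get_close_matches(old, new_paths, n=1, cutoff=cutoff)
        if match:
            rules.append(f"Redirect 301 {old} {match[0]}")
    return rules

# Hypothetical crawls of the same site before and after a migration.
old_crawl = ["/blog/seo-tips-2018", "/about-us", "/contact", "/zzz"]
new_crawl = ["/blog/seo-tips", "/about", "/contact"]

rules = suggest_redirects(old_crawl, new_crawl)
for rule in rules:
    print(rule)
```

Paths with no close match, like `/zzz` here, are left out rather than guessed at. The 0.6 cutoff is an arbitrary starting point and would need tuning against a real site.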
Kaggle is the largest platform for data science competitions. It’s interesting to keep an eye on it and see what’s happening in this space. Who is stumping up cash for specific tasks and what are they? For example, the TSA has put up a bid for passenger screening. Remember, these models are only as good as their training data. If this model is the least bit biased, the output will be too. Diversity is paramount to the success of this field. We need to be having these conversations now.
The technology to make this stuff is getting better and cheaper. We’re now talking about disposable AI. You can push entire models through the cloud powered by this disposable AI. There are no security concerns because nothing is getting saved. It’s all sitting on these machines and vanishing after the fact.
Britney finishes by saying that the Moz data science team is working on new ML tools for SEO. So watch this space.
The key takeaways:
- Machine learning combines statistics and programming.
- Machine learning is only as good as the data.
- YOU can create an ML model today.
- ML will level us up as an industry.
- Diversity is paramount in ML.