Every media leader today is facing a slew of questions about artificial intelligence.
But few media leaders would pass a pop quiz on the basics of AI.
I have a unique point of view on the intersection of news and artificial intelligence because I have served both as a senior masthead journalism leader and as a chief technology officer. This is the third part in my series about AI and News. I’m writing these columns to help the news industry move the discussion of AI away from the abstract and into a more tangible place where case studies can be examined and the technology explored.
This post is deliberately more technical than my earlier posts. But that doesn’t mean it’s only for technologists. Media leaders in all disciplines — news, opinion, marketing, advertising, human resources and finance — need to understand the technology involved in AI. I will explain things in ways I think media leaders from around the business can understand.
Read this post, and I’ll help you pass the pop quiz.
Technologies to Know
To have any real discussion of AI use cases, you need to know about the main technologies that support the world of AI. Much of this post will explain what those are, grouping them into five categories. Here’s a quick overview, with full explanations to follow.
The mathematical concepts underlying all of the AI products you’ve been hearing about are within an area of math called neural networks. Neural networks are models that can be “trained” to make accurate predictions on data. For instance, you can build a neural network to examine a picture and determine whether it is a dog or a cat. Or, you can build a model to read some text and predict the next word, which is the core idea behind products like ChatGPT.
Neural networks are inspired by the structure of the brain: they contain a bunch of “neurons” that are connected to each other. The input data (say, the pixels of an image showing a cat or a dog) are connected to some of the neurons. Each of these neurons is connected to many others, and the strengths of those connections are called the “weights”. When a neuron fires, the neurons connected to it are affected in proportion to those weights, and fire if they receive a big enough input. This process proceeds through the network until eventually an “output neuron” fires with the result of the model — for instance, in our cat vs. dog example, one neuron fires if the image is a cat and a different one fires if it’s a dog. “Training” a model means showing it millions (or trillions) of examples of an input and a correct output, and adjusting the “weights” until the network produces the right outputs.
You can see how this idea is inspired by our brains — our eyes receive light and connect that data from the light to the neurons in our brain through our optic nerve. The neurons in our brain then “fire” in a way that processes the information we’ve seen. But in the case of a neural network, it’s just math. You could make a small one of your own in a spreadsheet.
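To make that concrete, here is a toy network sketched in plain Python. The two input “features” and all the weights are invented for illustration (nothing here is trained); a real network would learn its weights from millions of labeled images.

```python
# A toy neural network in plain Python: two input "neurons" (made-up image
# features), two hidden neurons, and two output neurons (cat vs. dog).
# The weights are set by hand for illustration only; a real network
# would learn them during training.

def fires(inputs, weights, threshold=0.5):
    """A neuron 'fires' (returns 1) if its weighted input is big enough."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def classify(pointy_ears, floppy_tongue):
    # Hidden layer: each neuron looks at both input features.
    h1 = fires([pointy_ears, floppy_tongue], [1.0, -1.0])  # "cat-like" detector
    h2 = fires([pointy_ears, floppy_tongue], [-1.0, 1.0])  # "dog-like" detector
    # Output layer: one neuron per class; whichever fires wins.
    cat = fires([h1, h2], [1.0, 0.0])
    return "cat" if cat else "dog"

print(classify(pointy_ears=1.0, floppy_tongue=0.0))  # cat
print(classify(pointy_ears=0.0, floppy_tongue=1.0))  # dog
```

In a real system the inputs would be raw pixels and the weights would be learned, but the wiring (weighted sums feeding threshold “firings”) is the same idea, just repeated across millions of neurons.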
These models have been studied by academics since the 1950s. Until recently, though, they were not particularly good. The reason is that to work well, these models need a lot of neurons: millions or even billions. The more neurons they have, the more computing power is required to run them and the more data is required to train them. The explosion of computing power in the last fifteen years finally enabled them to become powerful enough to be interesting.
Neural Network Architectures
When you hear about “AI research,” a lot of the work has been in finding the right way to wire the neurons in a network together to solve a particular task. These organizing patterns are called “architectures”, and many different architectures have been found over the years that work for particular problems, each one with its own acronym. For instance, “CNNs” are good for recognizing images, and “LSTMs” for time-series prediction. The most recent architecture you might read about is the “Transformer”, which was developed by researchers at Google in 2017, and which is the foundation of the most recent batch of language models. It seems to be pretty good at everything and represents a major step forward in AI. Some researchers in the AI community believe that Transformers are enough, and AI will just continue to get better as we feed it more data. But many others believe we’ll need another step forward like the Transformer for AI to take the next step.
Though media companies probably won’t be investing in building new architectures, now that you know the different kinds of models that are suited to different tasks (and that the landscape is still rapidly changing), you are better equipped to talk about the AI products that sit above all this.
Software infrastructure
Once a neural network architecture is chosen, AI models are built using that architecture. Building those models involves some common basic tasks, and Google, Facebook and other companies have built software infrastructure that automates them. (An example of one such task is gradient backpropagation: the technique of showing a model an input and a desired output and determining how to adjust the model’s weights to make it more accurate on that input.)
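To see what that training loop actually does, here is a deliberately tiny sketch in plain Python: a single “neuron” with one weight learns to convert miles to kilometers from example pairs. The learning rate and data are invented for illustration; frameworks like PyTorch and TensorFlow automate exactly this loop (and the calculus behind it) for networks with billions of weights.

```python
# A sketch of what the AI frameworks automate: show the model an input
# and a desired output, measure the error, and nudge the weight to
# reduce it. Here a single "neuron" learns miles-to-kilometers
# conversion (the true weight is about 1.609).

weight = 0.0             # start from a bad guess
learning_rate = 0.001    # how big each nudge is

training_data = [(1, 1.609), (2, 3.218), (5, 8.045), (10, 16.09)]

for epoch in range(200):
    for x, target in training_data:
        prediction = weight * x             # forward pass
        error = prediction - target         # how wrong were we?
        gradient = 2 * error * x            # slope of squared error w.r.t. the weight
        weight -= learning_rate * gradient  # the backpropagation-style update

print(round(weight, 3))  # converges to about 1.609
```

With one weight you could do this by hand; the point of the frameworks is that they compute those gradients automatically for every weight in an arbitrarily complicated architecture.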
Most of these software infrastructure systems are driven from Python code, so knowledge of Python is needed to work with them.
The most prominent ones are:
1. PyTorch — built by Facebook, often just called Torch
2. TensorFlow — built by Google. This came out of Jeff Dean’s work within Google.
3. JAX — also built by Google. It is especially popular in the cutting-edge AI research community.
4. Keras — a user-friendly layer that sits on top of TensorFlow to make it easier to use.
Which should your teams be using? Keras is the best choice if you’re just trying to learn. Once your teams are past the learning stage, they will likely move to one of the other systems, most likely PyTorch or TensorFlow.
PyTorch and TensorFlow are very similar. From what I have seen in my consulting, PyTorch is preferred by people who are building AI models, and TensorFlow by people who are deploying them, but both are widely used for both purposes.
Hardware and compute infrastructure
To make a good AI model, you need a lot of data, and enormous amounts of computing power are required to process that data into a model. That computing happens on two kinds of chips: CPUs and GPUs. Broadly speaking, CPUs are the general-purpose chips that process the data in your personal computer, and GPUs are specialized chips that excel at the parallel math used to train AI models.
The history of GPUs is interesting. Graphics Processing Units were originally developed as graphics cards for video games. In the late 2000s, AI researchers realized GPUs could train models hundreds of times more efficiently than regular CPUs; without GPUs, the AI revolution could not have happened. Nvidia, the dominant maker of GPUs, made a big bet on AI years ago, and the rise of AI explains the rapid rise in Nvidia’s stock price.
While it’s good for media companies to understand this, it’s unlikely media companies will be investing the hundreds of millions of dollars into GPUs required to train a new AI model from scratch. Still, you can rent access to GPUs in the cloud from Amazon’s AWS or Microsoft’s Azure, and that may be useful at small scale for your data science teams. Your technology leaders will surely pitch you use cases that involve renting GPUs. The cloud computing providers (Amazon, Microsoft, Google) are currently offering many companies discounted computing capacity to incentivize them to build their own AI models. My prediction is that these discounts will go away in time, so be careful not to bake them into long-running budgets or your AI decisions.
Pre-trained models
It’s extremely expensive to train a new AI model from scratch — it can cost hundreds of millions of dollars, as explained above — so most companies in the media and elsewhere will work on top of a pre-trained model.
There are many companies offering pre-trained models. The easiest place to find them is Hugging Face, a hub where people share, download and license models for tasks including language, computer vision and object recognition. When you take a model, you get a frozen copy as of one point in time; it won’t automatically incorporate future changes the creator makes. That said, you can fine-tune it with your own data, and you can adopt the creator’s later updates when they are released.
In general for news, I am a fan of open-source technology. It allows media companies to cost-efficiently use some of the best technology and also allows them to have their engineers working with tech that they generally feel is valuable to learn. Knowledge of open source tech is a marketable skill, and that appeals to engineers.
One open-source family to pay close attention to is Facebook’s Llama models. The fact that Facebook has released these openly makes them potentially very attractive to media companies, though your legal office should review the Llama license agreement, which carries its own restrictions. Other open models include Falcon, from the Technology Innovation Institute in the United Arab Emirates.
In general, there’s an arms race in making AI models. The companies investing in them stand to gain long leads on slower movers. The media industry is unlikely to create these models from scratch, but news companies can start building on top of some of the pre-trained ones that already exist. Media companies don’t need to push the envelope on AI here — they need to use these models to build new products and services and get them deployed in the real world.
AI products
This is the layer of AI technology that you have been hearing the most about.
This is where the organization OpenAI fits in the picture. OpenAI’s product, ChatGPT, is an AI product, and it’s important to understand the distinction between an AI product and an AI model. There is an AI model that sits below the ChatGPT product. That model — which is not reviewable or transparent to the public — is akin to the models you can obtain on Hugging Face or the open Llama model from Facebook. So if your technology teams are debating whether to build on ChatGPT or on Facebook’s Llama model, it’s important to be clear that they are considering two different points of entry into AI technology. One, using ChatGPT, is a more superficial entry with less control over what’s happening but also less internal work to execute. The second path, using Llama, offers a media company more control and transparency, but it also requires that the company invest more of its own team’s time to build on it.
It’s worth noting that, despite OpenAI’s name, its products are not as “open” and transparent as the pre-trained models above. But that is where OpenAI’s proprietary value lies: it ran many trials of its models, spent heavily on computing power and determined the precise model weights that make its models accurate. Knowing those weights is valuable.
The landscape of AI products is moving rapidly. Just in the last few weeks, Amazon released a service called Bedrock, which will be attractive to any organization already using AWS. These new products are coming so fast that I have not personally examined Bedrock yet, but it’s on my list.
Anthropic is another competing company in this space. Its product that is akin to ChatGPT is called Claude. Anthropic markets it as a safer AI product that’s less likely to produce toxic or dangerous responses, citing a differentiated training technique called “constitutional AI.” Its model is also very high quality, so it can be considered a strong competitor to ChatGPT.
This is the emergence of a whole new industry that the entire economy will likely use to some extent. Already many new products are being built with ChatGPT, Claude, Llama or Falcon embedded within them.
Next Steps for Your Company
As AI technology continues to develop, your company should be figuring out its use cases for AI. Within media, it will be important to work across the company, with cross-disciplinary teams that can determine uses that support great journalism as well as business goals like understanding subscribers better and serving relevant advertisements. There is a bit of alchemy to all this, and you have to be able to deal with some of the failure that will come as you test and deploy AI.
The more time I spend with AI, the more convinced I am that it won’t be useful without smart people examining its uses and models, and that human guidance of AI will be an area of expertise for most successful companies within a few years.