Patently Mathematical: Picking Partners, Passwords, and Careers by the Numbers
Ebook, 431 pages, 6 hours

About this ebook

Fascinating facts and stories behind inventions based on mathematics—from search engines to streaming video to self-correcting golf balls.

How do dating sites match compatible partners? What do cell phones and sea coasts have in common? And why do computer scientists keep ant colonies? Jeff Suzuki answers these questions and more in Patently Mathematical, which explores the mathematics behind some of the key inventions that have changed our world.

In recent years, patents based on mathematics have been issued by the thousands—from search engines and image recognition technology to educational software and LEGO designs. Suzuki delves into the details of cutting-edge devices, programs, and products to show how even the simplest mathematical principles can be turned into patentable ideas worth billions of dollars. Discover:

• whether secure credit cards are really secure

• how improved data compression made streaming video services like Netflix a hit

• the mathematics behind self-correcting golf balls

• why Google is such an effective and popular search engine

• how eHarmony and Match.com bring couples together, and much more

Combining quirky historical anecdotes with relatable everyday examples, Suzuki makes math interesting for everyone who likes to ponder the world of numerical relationships.

Praise for Jeff Suzuki’s Constitutional Calculus

“An entertaining and insightful approach to the mathematics that underlies the American system of government.” —Mathematical Reviews

“A breath of fresh air. . . . A reaffirmation that mathematics should be used more often to make general public policy.” —MAA Reviews
Language: English
Release date: Dec 14, 2018
ISBN: 9781421427065

    Book preview

    Patently Mathematical - Jeff Suzuki

    INTRODUCTION

    My Billion-Dollar Blunder

    In my opinion, all previous advances in the various lines of invention will appear totally insignificant when compared with those which the present century will witness. I almost wish that I might live my life over again to see the wonders which are at the threshold.

    CHARLES HOLLAND DUELL (1850–1920), US COMMISSIONER OF PATENTS

    Interview August 9, 1902

    I could have invented Google.

    Lest the reader think that this is a statement of monstrous vanity, let me qualify it. Google’s search engine is based on mathematical principles that are among the first taught to math majors. Any mathematician could have invented Google.

    This observation underscores a crucial fact about the modern world. The motto of every inventor is Build a better mousetrap, and the world will beat a path to your door. But people have been improving mousetraps for thousands of years, so all of the obvious solutions have been found. The garage workshop, with a lone genius struggling to create a world-changing invention, is mostly a thing of the past. Applying the laws of science to develop a more effective painkiller, a faster microchip, or a more efficient solar cell takes the resources of an industrial giant and a laboratory where hundreds or even thousands of scientists and technicians work together.

    Without some hope of ultimate compensation, few would make the effort. The patent system exists to promote innovation by giving inventors an opportunity to recover their expenses and to profit from their work. This is reasonable for patents based on physical discoveries. The production of electricity from sunlight relies on a fine balance of a large number of factors, and simultaneous independent discovery of the same mechanism is extremely unlikely. If a company finds a competitor is producing a solar cell identical to one of its own products, it is more reasonable to suspect industrial espionage or reverse engineering than it is to assume their research division found exactly the same solution.

    But what of inventions based on mathematics? Applying the laws of mathematics requires nothing more sophisticated than a pencil and nothing more expensive than a paper notebook. History shows that simultaneous discovery of an idea is common in mathematics, and while a physical laboratory can provide all manner of paperwork to establish priority of invention, a mathematician’s laboratory is in their head and rarely well documented. Thus, it is a long-standing principle of patent law that mathematical discoveries cannot be patented.

    Yet, in recent years, patents based on mathematics have been issued by the thousands, based on a guideline that, if the mathematical process is connected to a device, the whole is patentable. You can’t patent a mathematical algorithm, but you can patent a machine that runs the algorithm.

    As a mathematician, I look on these patents with mixed emotions. After all, I could have invented Google. At the same time, mathematics is so broad a field that surely there are patentable—and profitable!—inventions just waiting to be made. However, allowing patents based on mathematics raises new problems.

    Patent policy is directed toward promoting innovation. Patents must be for new and non-obvious inventions and cover the invention, not its uses: this encourages inventors to create new things and not find new uses for old things. At the same time, they must be limited in scope; otherwise, a too-broad patent could potentially be used in attempts to extort licensing fees from other innovators, stifling their creativity.

    But the heart and soul of mathematics is its generalizability: the finding of new uses for old things. Therefore, granting a patent for an invention based on mathematics runs the risk of stifling innovation, not promoting it. It follows that patents based on mathematics must be carefully constructed.

    In the following pages, I will guide the reader through the mathematics behind some recent patents. We’ll see how some very simple mathematical ideas form the basis for a very broad range of patentable inventions. We’ll also see how these same ideas can be retasked to be used in different fields. In the end, we hope to offer some insight into the question, Under what conditions should a device based on a mathematical algorithm be patentable?

    1

    The Informational Hokey Pokey

    This World Encyclopedia . . . would be alive and growing and changing continually under revision, extension and replacement from the original thinkers in the world everywhere. Every university and research institution should be feeding it. Every fresh mind should be brought into contact with its standing editorial organization. And on the other hand its contents would be the standard source of material for the instructional side of school and college work, for the verification of facts and the testing of statements—everywhere in the world.

    H. G. WELLS

    Lecture delivered at the Royal Institution of Great Britain, November 20, 1936

    Near the end of World War II, Vannevar Bush (1890–1974), the wartime head of the Office of Scientific Research and Development, described a future device for individual use,* which he called the memex. The memex could store and present an unlimited amount of data, and it allowed the user to permanently link information, so that (to use Bush’s examples) someone interested in the history of why the Turkish bow proved so superior to the English longbow during the Crusades would be able to find and link relevant articles on the subject. Bush described how the memex would change civilization: the lawyer would have at his instant disposal all his own experiences, as well as all those of his colleagues; the patent attorney would have immediate access to all issued patents, and information relevant to all of them; the physician would have access to millions of case histories and so on.

    The modern internet has largely fulfilled Bush’s dream of a web of knowledge and experience: the lawyer, through services like LexisNexis, can research cases dating back to the 1800s; the patent attorney can consult several databases of patents and related technical reports; the physician has MEDLINE. And so the layman interested in why the Turkish bow proved superior to the English longbow can consult the Wikipedia article on these weapons, whereupon he would discover that the English did not actually adopt the longbow until after the Crusades: Bush, while a brilliant engineer, was no historian.

    Bags of Words

    Bush realized that having the information wasn’t enough: The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.* While Bush’s memex or the hyperlink structure of the web allows us to link related documents, how do we find a document on a subject?

    One solution is to create a directory or an index covering all known documents or at least as many as it is practical to index. In January 1994, Jerry Yang and David Filo, two graduate students at Stanford University, published a web page named Jerry and David’s Guide to the World Wide Web. This directory was sorted by content and contained links to other websites. The next year they incorporated as Yahoo!

    Producing an index or directory is a challenging task. Consider a similar but far simpler problem: creating an index for this book. To do that, we need to create a list of index entries, which are the terms index users are likely to use. Each index entry then includes a reference to a page. Someone interested in looking something up finds a suitable index term and then goes to the referenced pages to find information about that term.

    For example, this page seems relevant to someone looking for information about Yahoo!, so we’d make Yahoo! an index entry, and that index entry would include a reference to this page. Likewise, because Yahoo! was developed by Jerry Yang and David Filo, Jerry Yang and David Filo should also be index entries, as should Stanford students and Jerry and David’s Guide to the World Wide Web.

    The problem indexers face is they must anticipate every possible use of the index by persons they have never met. For example, someone throwing a birthday party for someone born in January might want to know what happened in that month. Then, the fact that Yahoo! began in that month is relevant, so there should be an index entry January and a reference to this page. By a similar argument, index entries should also exist for 1994 and even for index.

    From this example, it should be clear that even a short piece of text will produce many index entries; the index entries for any given piece of text will generally exceed the length of the text itself. Because the index must anticipate the needs of everyone who will ever use the index, an indexer must be thorough.

    The indexer needs to be consistent. While we introduced Jerry Yang and David Filo by giving their full names, how should we index them? The standard method would be to use index entries Yang, Jerry and Filo, David. But someone unfamiliar with this convention might instead produce index entries Jerry Yang and David Filo. And if Yang and Filo are mentioned later, it will almost certainly be by their last names only, so a human indexer would have to recognize that their names should not be indexed as a unit —Yang and Filo; instead, additional separate index entries for Yang, Jerry and for Filo, David are needed.

    An indexer must be fast. While Yang and Filo were able to do the initial indexing of the web by hand, soon they had to employ an army of human indexers. But the web grew so rapidly that the human indexers fell farther and farther behind. Yahoo! stopped using human indexing in 2002, when there were about 40 million websites worldwide. Today, there are more than a billion websites and reportedly more than half a million are created every day. More than a trillion documents can be accessed through the internet.

    Although humans are bad at thoroughness, consistency, and speed, computers excel at these things. Unfortunately, computers are bad at one crucial feature of indexing: in many cases, what something is about might not be evident from the text itself. Thus, in our discussion of Yahoo!, there was not one word or phrase (until this sentence) referencing that Yahoo! became one of the first search engines. A human indexer would recognize this, produce an index entry for search engine, and include a reference to the preceding pages. But how can a computer be designed to do the informational hokey pokey, and find what a document’s all about?

    In the 1960s, a group led by Gerard Salton (1927–1995), a Cornell University professor, took the first steps in this direction. Salton’s group developed SMART (System for the Mechanical Analysis and Retrieval of Text), which helped users locate articles on specific topics from a database drawn from such diverse sources as journals on aeronautics and information theory and Time magazine.

    Salton’s team used vectors. Mathematically, a vector consists of an ordered sequence of values called components. Any systematic set of information about an object can be written as a vector. For example, when you visit a physician, several pieces of information are recorded. To record this information in vector form, we choose an order to list the components. We might decide that the first component is a patient’s sex, the second component is the patient’s height in inches, the third component is the patient’s age in years, the fourth is the patient’s weight in pounds, and the fifth and sixth components give the patient’s blood pressure. Once we know this ordering, then we know vector [Male, 71, 38, 135, 115, 85] corresponds to a male patient who is 71 inches tall and 38 years old, who weighs 135 pounds and has a blood pressure of 115 over 85.*

    Turning a document into a vector relies on the bag of words model. In this approach, we choose a set of tokens, usually keywords (Chicago) or phrases (the windy city), and agree on some order. Roughly speaking, the tokens will be the keywords or phrases that we will be able to search for, so the tokens will be determined by how we intend to search our collection and limited only by our willingness and ability to deal with large vectors. For example, we might pick 10,000 commonly used words as our tokens and then convert every document into a 10,000-component document vector whose components correspond to the number of times a given token appears in the document.

    For example, we might choose the tokens I, welding, physics, school, and position. Then we could describe each document as a five-component vector, with each component giving the number of appearances of the corresponding keyword.

    Now take three documents, say, the work histories of three people. Mercedes might write the following:

    After I graduated from welding school, I took a job welding light- and medium-duty vehicles for a local company. My next position included welding shafts, cams, spindles, hubs, and bushings on autos, playground equipment, and trucks. I’m now an automotive welder at Ford.

    In contrast, Alice might have written this paragraph:

    Graduated from Stockholm University with an undergraduate degree in physics, then went on to earn a PhD in high energy physics, then took a postdoc in physics at the Max Planck Institute for Physics in Munich. I then joined CERN as a Marie Curie fellow working on problems in high energy physics, then later took a senior position with the center’s particle physics staff studying the physics of condensed matter.

    Finally, Bob might write the following:

    Dropped out of welding school; got an undergraduate degree in physics; went to graduate school, also for physics; taught high school physics for several years before obtaining a position as a physics postdoc; currently seeking research job in physics.

    In Mercedes’s paragraph, the word I appears 3 times; the word welding appears 3 times; the word physics appears 0 times; the word school appears 1 time; and the word position appears 1 time, so the document vector for her paragraph would be [3, 3, 0, 1, 1]. Meanwhile, in Alice’s paragraph, the word I appears 1 time; the word welding appears 0 times; the word physics appears 7 times; the word school appears 0 times; the word position appears 1 time, so the document vector for her paragraph would be [1, 0, 7, 0, 1]. Finally, in Bob’s paragraph, the word I appears 0 times; the word welding appears 1 time; the word physics appears 5 times; the word school appears 3 times; the word position appears 1 time, so the document vector for his paragraph would be [0, 1, 5, 3, 1].
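
    To make this step concrete, here is a minimal Python sketch (not from the book; the five-token list and the crude letters-only tokenizer are illustrative assumptions) that counts the keywords in a paragraph and reproduces the document vectors above.

        # Bag-of-words sketch: count how often each chosen token appears in a text.
        import re

        TOKENS = ["i", "welding", "physics", "school", "position"]

        def document_vector(text):
            words = re.findall(r"[a-z]+", text.lower())   # crude tokenizer: runs of letters
            return [words.count(token) for token in TOKENS]

        mercedes = ("After I graduated from welding school, I took a job welding "
                    "light- and medium-duty vehicles for a local company. My next "
                    "position included welding shafts, cams, spindles, hubs, and "
                    "bushings on autos, playground equipment, and trucks. I'm now "
                    "an automotive welder at Ford.")
        bob = ("Dropped out of welding school; got an undergraduate degree in "
               "physics; went to graduate school, also for physics; taught high "
               "school physics for several years before obtaining a position as a "
               "physics postdoc; currently seeking research job in physics.")

        print(document_vector(mercedes))   # [3, 3, 0, 1, 1]
        print(document_vector(bob))        # [0, 1, 5, 3, 1]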

    Cosine Similarity

    Suppose we construct the document vectors for everything in our library. Our next step would be to determine which documents are similar to one another.

    A common way to compare two vectors is to find the dot product (also known as the scalar product, or the inner product): the sum of the products of the corresponding components of the two vectors. If two documents have no words in common, then the dot product will be 0 (because no matter how many times a keyword appears in one document, this will be multiplied by 0, the number of times the keyword appears in the other). In general, the higher the dot product, the more similar the two documents.

    For example, if we take the document vectors for the work history paragraphs of Mercedes and Alice, which are [3, 3, 0, 1, 1] and [1, 0, 7, 0, 1], respectively, the dot product would be

    3 × 1 + 3 × 0 + 0 × 7 + 1 × 0 + 1 × 1 = 4

    Meanwhile, the dot product of Alice’s work history vector with Bob’s, which was [0, 1, 5, 3, 1], will be

    1 × 0 + 0 × 1 + 7 × 5 + 0 × 3 + 1 × 1 = 36

    This suggests that Alice’s work history has more in common with Bob’s than with Mercedes’s.
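
    In code, the dot product is a one-line sum; the following sketch (illustrative only, not the author’s implementation) reproduces the two values just computed.

        # Dot product: sum of the products of corresponding components.
        def dot(u, v):
            return sum(a * b for a, b in zip(u, v))

        mercedes = [3, 3, 0, 1, 1]
        alice = [1, 0, 7, 0, 1]
        bob = [0, 1, 5, 3, 1]

        print(dot(mercedes, alice))   # 3*1 + 3*0 + 0*7 + 1*0 + 1*1 = 4
        print(dot(alice, bob))        # 1*0 + 0*1 + 7*5 + 0*3 + 1*1 = 36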

    There’s one complication. Suppose we were to take Mercedes’s paragraph and put 10 copies of it into a single document. Because the document vector counts the number of times a given keyword appears, every keyword will now be counted 10 times as often. The new document will correspond to the vector [30, 30, 0, 10, 10], and the dot product between Mercedes’s revised paragraph and Alice’s original paragraph is now

    30 × 1 + 30 × 0 + 0 × 7 + 10 × 0 + 10 × 1 = 40

    Now it appears that Alice’s work history has more in common with Mercedes’s than with Bob’s. But, in fact, Mercedes’s work history hasn’t changed; it’s just been repeated.

    To avoid this problem, we can normalize the vectors. There are several ways to do this. First, because the document vector records the number of times a given keyword appears, we can normalize it by dividing by the total number of times the keywords appear in the document. In Mercedes’s repeated work history, with vector [30, 30, 0, 10, 10], the keywords appear 30 + 30 + 0 + 10 + 10 = 80 times. The value 80 is called the L1-norm, and we can normalize the vector by dividing each component by the L1-norm. Thus, [30, 30, 0, 10, 10] becomes L1-normalized as [0.375, 0.375, 0, 0.125, 0.125]; we call this a frequency vector.

    There are two advantages to using the frequency vector. First, it’s easy to compute: we need to find the total number of tokens in the document and then divide each component of the vector by this total. Second, the frequency vector has a familiar interpretation: its components are the fractions of the document’s tokens corresponding to each individual keyword. Thus, 37.5% of the tokens in Mercedes’s work history document are I, while 12.5% are school.
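
    As a quick sketch (illustrative, not taken from the book), L1 normalization is just a division by the component sum:

        # L1 normalization: divide each component by the sum of all components,
        # turning raw counts into keyword frequencies.
        def l1_normalize(v):
            total = sum(v)
            return [x / total for x in v]

        repeated_mercedes = [30, 30, 0, 10, 10]
        print(l1_normalize(repeated_mercedes))   # [0.375, 0.375, 0.0, 0.125, 0.125]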

    However, what is easy and familiar is not always what is useful. If we want to use the dot product to gauge the similarity of two vectors, then the dot product of a normalized vector with itself should be higher than the dot product of the normalized vector with any other normalized vector.

    Consider Bob’s vector [0, 1, 5, 3, 1], with L1-norm 0 + 1 + 5 + 3 + 1 = 10. The L1-normalized vector is [0, 0.1, 0.5, 0.3, 0.1], and the dot product of this vector with itself will be

    0 × 0 + 0.1 × 0.1 + 0.5 × 0.5 + 0.3 × 0.3 + 0.1 × 0.1 = 0.36

    Now consider Alice’s vector [1, 0, 7, 0, 1], with L1-norm 1 + 0 + 7 + 0 + 1 = 9 and corresponding L1-normalized vector [0.11, 0, 0.78, 0, 0.11]. The dot product of this vector with the first will be

    0 × 0.11 + 0.1 × 0 + 0.5 × 0.78 + 0.3 × 0 + 0.1 × 0.11 = 0.4

    The dot product is higher, which suggests that Bob’s work history is more similar to Alice’s than it is to Bob’s own.

    To avoid this paradox, we can use the L2-norm: the square root of the sum of the squares of the components. This originates from a geometric interpretation of a vector. If you view the vector as a pair of numbers giving a set of directions, so that a vector [3, 7] would be Go 3 feet east and 7 feet north, then the square root of the sum of the squares of the components gives you the straight line distance between your starting and ending points. In this case, √(3² + 7²) = √58 ≈ 7.616, so you’ve gone 7.616 feet from your starting point. The normalized vector is found by dividing each component by the L2-norm: thus, the vector [3, 7] becomes L2-normalized as [0.3939, 0.9191]. As before, if we treat the vector as a set of directions, these directions would take us 1 foot toward the ending point. Likewise, the vector [1, 9] becomes L2-normalized as [0.1104, 0.9939].
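
    In code, the L2-norm and L2-normalization might look like the following hypothetical sketch (math.sqrt supplies the square root):

        # L2 normalization: divide each component by the square root of the
        # sum of the squares of the components (the Euclidean length).
        import math

        def l2_norm(v):
            return math.sqrt(sum(x * x for x in v))

        def l2_normalize(v):
            norm = l2_norm(v)
            return [x / norm for x in v]

        print(l2_norm([3, 7]))        # sqrt(58), about 7.616
        print(l2_normalize([3, 7]))   # about [0.3939, 0.9191]
        print(l2_normalize([1, 9]))   # about [0.1104, 0.9939]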

    In general, the dot product of an L2-normalized vector with itself will be 1, which will be higher than the dot product of the L2-normalized vector with any other L2-normalized vector. If we want the dot product to be a meaningful way of comparing two vectors, we’ll need to use the L2-normalized vectors. In this case, the dot product will give us the cosine similarity of the two vectors.*

    The geometric interpretation becomes somewhat strained when we move beyond three-component vectors; however, we can still compute the L2-norm using the same formula. For Bob’s work history vector, we’d find the L2-norm to be √(0² + 1² + 5² + 3² + 1²) = √36 = 6, giving us L2-normalized vector [0, 0.1667, 0.8333, 0.5, 0.1667]. Meanwhile, Alice’s work history vector would have L2-norm √(1² + 0² + 7² + 0² + 1²) = √51 ≈ 7.141, giving us L2-normalized vector [0.1400, 0, 0.9802, 0, 0.1400]. The dot product of Bob’s normalized vector with itself will be 1 (this will always be true with the L2-normalized vectors); meanwhile, the dot product of Bob’s L2-normalized vector with Alice’s L2-normalized vector will be

    0 × 0.1400 + 0.1667 × 0 + 0.8333 × 0.9802 + 0.5 × 0 + 0.1667 × 0.1400 = 0.8402

    This fits our intuition that Bob’s work history should be more similar to Bob’s work history than to Alice’s work history.

    Returning to the work histories of Mercedes, Alice, and Bob, we’d find that every term of Mercedes’s repeated work history, with document vector [30, 30, 0, 10, 10], would be divided by its L2-norm, √(30² + 30² + 0² + 10² + 10²) = √2000 ≈ 44.72, giving normalized vector [0.6708, 0.6708, 0, 0.2236, 0.2236].

    Now, if we compute the dot products of the normalized vectors, we get (between Mercedes and Alice)

    0.6708 × 0.1400 + 0.6708 × 0 + 0 × 0.9802 + 0.2236 × 0 + 0.2236 × 0.1400 ≈ 0.1252

    which confirms our sense that Alice’s work history is more similar to Bob’s than it is to Mercedes’s.
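
    Putting the pieces together, a short cosine-similarity sketch (again illustrative rather than any patented implementation) reproduces the comparisons above:

        # Cosine similarity: the dot product of the two L2-normalized vectors.
        import math

        def cosine_similarity(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            norm_u = math.sqrt(sum(a * a for a in u))
            norm_v = math.sqrt(sum(b * b for b in v))
            return dot / (norm_u * norm_v)

        mercedes_repeated = [30, 30, 0, 10, 10]
        alice = [1, 0, 7, 0, 1]
        bob = [0, 1, 5, 3, 1]

        print(cosine_similarity(bob, bob))                  # 1.0
        print(cosine_similarity(bob, alice))                # about 0.8402
        print(cosine_similarity(mercedes_repeated, alice))  # about 0.1252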

    It’s tempting to proceed as follows. First, find the L2-normalized document vector for every item in our collection. Next, compute the dot product between each normalized vector and every other normalized vector; if the cosine similarity exceeds a certain value, identify the two documents as being on the same topic.

    Could we have obtained a patent for this method of comparing documents? Strictly speaking, a patent cannot be granted for a mathematical idea, such as the dot product. Nor can you receive a patent for an application of the dot product, like using it to compare two documents to determine whether they are similar. However, a patent could be granted if that mathematical idea is incorporated into some machine. Thus, you could obtain a patent for a machine that compares documents using the dot product.

    Unfortunately, there are two problems. First, the number of components of the normalized vectors will equal the number of distinct keywords, and it’s not uncommon to use 100,000 or more keywords. Meanwhile, the number of document vectors will equal the number of documents. For indexing purposes, a document could be anything from a book to a single article in a newspaper or magazine; with this in mind, it wouldn’t be unreasonable for a physical library to have more than 10,000,000 documents. This means that we’ll need to store and manipulate 100,000 × 10,000,000 = 1,000,000,000,000 numbers: roughly 1,000 gigabytes of data. Moreover, we’d need to find the dot product of every document vector with every other document vector: this requires finding 10,000,000 × 10,000,000 = 100,000,000,000,000 dot products, each requiring 100,000 multiplications: a staggering 10,000,000,000,000,000,000 calculations. A computer that could perform a trillion computations each second would still need almost four months of nonstop work to find all these dot products.

    To be sure, computers are getting faster all the time. But there’s a second problem: the dot product doesn’t work to compare two documents. In particular, simple cosine similarity of document vectors does not adequately classify documents.

    tf-idf

    Part of the problem is that not all keywords have the same discriminatory value. For example, suppose we make our keywords the, and, of, to, bear, and Paleolithic. One document vector might be [420, 268, 241, 141, 255, 0] and another might be [528, 354, 362, 174, 0, 213]. The dot product of the normalized vectors would be 0.88, suggesting the two documents are similar. However, if we look closer, we might discover that the first document mentions the word bear (the fifth keyword) hundreds of times and the word Paleolithic not at all, while the second document uses the word Paleolithic hundreds of times but omits the word bear entirely. Any sensible person looking at this information would conclude the two documents are very different.

    Why does the cosine similarity fail? You might suspect that, because it ignores word order, cosine similarity would fail to detect the nuances of a text. However, the more fundamental problem is that words such as the, and, of, and to are ubiquitous, and make up about 6%, 4%, 3%, and 1% of the words in a typical English document, so in some sense, most of the similarity between the two documents comes from the fact that both are in English. Remember, our goal is to classify documents based on their word content. A word that appears frequently in a document may give us insight into the content of the document, but if the word appears frequently in all documents, it’s less helpful: if you want to find a particular car, it’s more useful to know its color than the number of tires it has.

    In the early days of information retrieval, ubiquitous words like the, and, of, to, and so on were compiled into stop lists. A word on the stop list would be ignored when compiling the document vectors. However, creating a stop list is problematic: When do we decide a word is sufficiently common to include it? And when does a word become sufficiently uncommon to make it worth indexing? The difficulty is that common and uncommon depend on the documents we’re indexing: hyponatremia is an uncommon word in a collection of Victorian literature but not in a journal on vascular medicine.

    Moreover, excluding stop words from consideration limits the uses of the document vectors. Thus, while stop words are useless for indexing, several studies suggest that they provide a literary fingerprint that helps identify authorship. And because the internet links the entire world together, the prevalence of the word the on a web page is a clue that the web page is in English and not some other language.

    Thus, instead of ignoring common words, we can limit their impact by replacing the frequency with the term frequency-inverse document frequency. This is the product of two numbers: the term frequency (tf) and the inverse document frequency (idf).

    There are several ways of calculating the term frequency, but one of the more common is to use the augmented frequency. If a keyword appears k times in a document, and no keyword appears more than M times in the document, then the augmented frequency of that keyword will be 0.5 + 0.5 × (k/M). In the first document, the keyword the appears 420 times, more than any other keyword; thus, its augmented frequency will be 0.5 + 0.5 × (420/420) = 1. Meanwhile, the word bear appears 255 times, so its augmented frequency will be 0.5 + 0.5 × (255/420) ≈ 0.804. Note that a word that doesn’t appear at all would have an augmented frequency of 0.5 (hence, augmented).
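
    As a sketch (illustrative only), the augmented frequency of every keyword in a count vector can be computed in a few lines:

        # Augmented term frequency: 0.5 + 0.5 * (count / max_count), so the most
        # frequent keyword gets 1 and an absent keyword still gets 0.5.
        def augmented_tf(counts):
            m = max(counts)
            return [0.5 + 0.5 * (k / m) for k in counts]

        doc1 = [420, 268, 241, 141, 255, 0]   # the, and, of, to, bear, Paleolithic
        print(augmented_tf(doc1))   # the -> 1.0, bear -> about 0.804, Paleolithic -> 0.5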

    What about the inverse document frequency? Suppose we have a library of N documents, and a particular word appears in k of the documents. The most common way to find the value of the inverse document frequency is to use log(N/k), where log is a mathematical function. For our purposes, it will be convenient to define the log of a number as follows: Suppose we begin with 1, and double repeatedly:

    1 → 2 → 4 → 8 → 16 → 32 → . . .

    The log of a number is equal to the number of times we must double to get the number. Thus, if a word appears in 5 out of 40 documents, its idf will be log(40/5) = log(8), and because 8 is the third number in our sequence of doubles, log(8) = 3. Note that since we’re basing our logs on the number of doublings, we say that these are logs to base 2.*

    What about log(10)? Since 3 doublings will take us to 8, and 4 doublings will take us to 16, then log(10) is between 3 and 4. With calculus, we can generalize the concept of doubling; this allows us to determine log(10) ≈ 3.322.

    In the previous example, the word the appears in 2 out of 2 documents, so its idf would be log(2/2) = log(1) = 0 (because we need to double 1 zero times to get 1). Meanwhile, bear appears in only 1 of the 2 documents, so its idf would be log(2/1) = log(2) = 1.
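
    In code, the inverse document frequency is a one-liner with a base-2 logarithm (a sketch; math.log2 computes logs to base 2):

        # Inverse document frequency: log2(N / k), where N is the number of
        # documents in the library and k is the number containing the word.
        import math

        def idf(n_documents, n_containing):
            return math.log2(n_documents / n_containing)

        print(idf(40, 5))   # log2(8) = 3.0
        print(idf(2, 2))    # log2(1) = 0.0, the ubiquitous "the"
        print(idf(2, 1))    # log2(2) = 1.0, the word "bear"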

    Now we can produce a tf-idf vector for each document. Consider our first document, with document vector [420, 268, 241, 141, 255, 0]. The largest component is the first, at 420: no keyword appeared more than 420 times. The tf for the first keyword will then be 0.5 + 0.5 × (420/420) = 1. Meanwhile, its idf, since it appeared in both documents, will be log(2/2) = 0. Thus, the first component of the tf-idf vector will be 1 × 0 = 0. The tf for the second keyword, which appeared 268 times, will be 0.5 + 0.5 × (268/420) ≈ 0.82, but it also appeared in both documents, so its idf will also be 0. The second component of the tf-idf vector will be 0.82 × 0 = 0. Continuing in this fashion, we obtain the tf-idf vector for the first document: [0, 0, 0, 0, 0.80, 0]. If we perform a similar set of computations for the second document, we find the weighted document vector will be [0, 0, 0, 0, 0, 0.70]. Normalizing and taking the dot product, we find the cosine similarity of the two documents is 0: they are nothing alike. Using tf-idf helps to limit the impact of common words and makes the comparison of the documents more meaningful. This is the basis for US Patent 5,943,670, System and Method for Categorizing Objects in Combined Categories, invented by John Martin Prager of IBM and granted on August 24, 1999.
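
    A brief sketch (illustrative only, and following the worked example above, in which a keyword that is absent from a document contributes 0) reproduces the two tf-idf vectors and their cosine similarity:

        # tf-idf for the six keywords (the, and, of, to, bear, Paleolithic)
        # across a two-document library, following the chapter's worked example:
        # a keyword absent from a document is given weight 0.
        import math

        doc1 = [420, 268, 241, 141, 255, 0]
        doc2 = [528, 354, 362, 174, 0, 213]
        library = [doc1, doc2]

        def tf_idf(doc):
            m = max(doc)            # count of the most frequent keyword
            n = len(library)        # number of documents in the library
            vec = []
            for i, count in enumerate(doc):
                if count == 0:      # absent keyword, as in the worked example
                    vec.append(0.0)
                    continue
                tf = 0.5 + 0.5 * (count / m)                       # augmented frequency
                containing = sum(1 for d in library if d[i] > 0)   # documents with keyword
                vec.append(tf * math.log2(n / containing))
            return vec

        def cosine(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            return dot / (math.sqrt(sum(a * a for a in u)) *
                          math.sqrt(sum(b * b for b in v)))

        v1, v2 = tf_idf(doc1), tf_idf(doc2)
        print([round(x, 2) for x in v1])   # about [0, 0, 0, 0, 0.80, 0]
        print([round(x, 2) for x in v2])   # about [0, 0, 0, 0, 0, 0.70]
        print(cosine(v1, v2))              # 0.0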

    Latent Semantic Indexing

    However, there’s a problem. Suppose we have three documents in our library: first, an article on bears (the animal) that uses the common name bear throughout. The second article is identical to the first, but every occurrence of the word bear has been changed to ursidae, the scientific name. The third article is about the Chicago Bears (a sports team). Here we see two problems with natural languages:

    1.  The problem of synonymy, where the same concept is expressed using two different words.

    2.  The problem of polysemy, where the same word is used to express two different concepts.

    A giant leap forward in tackling the problems of synonymy and polysemy occurred in the late 1980s when Scott Deerwester of the University of Chicago, Susan Dumais of Bell Communications, and Richard Harshman of the University of Western Ontario in Canada developed latent semantic indexing (LSI; also known as latent semantic analysis, or LSA). Together with Bell scientists George Furnas and Thomas K. Landauer (both psychologists) and Karen E. Lochbaum and Lynn A. Streeter (both computer scientists), they received US Patent 4,839,853 for Computer Information Retrieval Using Latent Semantic Structure on June 13, 1989. Again, the patent is not on the process of LSI/LSA, which is a purely mathematical construct, but on a device that applies LSI/LSA to a set of documents.

    Consider the vectors for the documents in our library. We can compile all these document vectors into a term document matrix. We can view this as a table whose entries indicate the number of times a given keyword appears in a given document, where our rows
