🤹

sqanything (AlaSQL Chrome Extension)

Get it

What is this thing?

Data tables are all over the internet, especially Wikipedia. But HTML tables can be difficult to download or copy-and-paste cleanly. sqanything lets you query the HTML tables in all your open Chrome tabs with SQL. You can also query tables in Notion, CSV or Excel files, and tables in Google Sheets. You can send the results of your query to Google Sheets or just send all the tables on a web page to Google Sheets.

Blocks

Each individual query, including the web pages that it queries from and the destination it sends to (if applicable), is called a "Block". When you launch the extension, you can see your past Blocks by clicking "View History". When you click on one of your blocks, it will automatically launch the web pages used in that query (including Wikipedia pages, CSV's, Notion Pages, public Google Spreadsheets, etc.), so you can run the same query and get the same results. You can create a new Block by clicking "New SQL Block" after you launch the extension.

Schema

When you start or launch a Block, you will see a tab for "Schema" underneath the query text. This shows you the active HTML tables currently in your Chrome tabs. If you don't see any tables, you can click "Refresh Schema" at the bottom.

Each web page in your Chrome tabs is assigned an "alias" such as "851nb3". All the tables on that web page are given the same alias, plus the number that they appear on the web page (e.g. the 5th table on the page would be "851nb3-5"). When you click on a web page under your Schema, you can see all the tables on that page. When you click on a specific table, you can see the number of columns, an example query, and the first 5 rows of that data table. The example query will look something like this "SELECT * FROM HTML("#t851nb3-5", {headers: true}) LIMIT 5", and you can copy and paste that into your query text in order to start querying from that table.

If you wish to remove a web page or data table from your Schema for a certain block (so that web page will no longer automatically launch when you click on that block), you can click on the trash can icon to the right of the web page or table.

Destination

Under the query text of a Block there is also a tab for "Destination". This is where you can enter a Spreadsheet ID and Sheet name (and optionally cell location) to send your query result to. The Spreadsheet ID can be found in the Google Sheet URL. When you click "Run" after entering your destination, your query result will be sent to that Sheet.

If you want to simply send all the tables on a web page (or web pages) to Google Sheets, you can click "Extract HTML tables to Sheets" after launching the extension. Here you can enter the same information as above, and also a prepend to the Sheet name (e.g. "test" so each Sheet will be called "test_" + that table alias).

AlaSQL

This app uses AlaSQL under the hood. Their docs are great! Check them out for syntax questions.

Example queries


SELECT 
[ISO 3166 Country Code] as code,
Country as country,
Latitude as lat,
Longitude as long

FROM CSV("https://cdn.rawgit.com/albertyw/avenews/master/old/data/average-latitude-longitude-countries.csv",{headers:true})
WHERE Latitude BETWEEN 0 AND 25
ORDER BY Longitude DESC

SELECT * FROM CSV('https://cdn.rawgit.com/albertyw/avenews/master/old/data/average-latitude-longitude-countries.csv',{headers:true}) 
WHERE Latitude BETWEEN 0 AND 25

SELECT * FROM CSV('https://drive.google.com/u/0/uc?id=1F-B8Z3YUizsOaBcdSPk-0lWVrC0mxYYs&export=download',{headers:true})

SELECT * FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&sheet=Sheet1',{headers:true})

SELECT sum(Wins) FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&sheet=Sheet1',{headers:true})

SELECT * FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&gid=975578095',{headers:true})

Google Sheets

SELECT * 
FROM CSV(
	'https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&gid=975578095',
	{headers: true}
)

Tips, Tricks, and Workarounds

String to Number

REPLACE([2017], ",", "")::NUMBER as tomatoes,

Selecting columns with spaces

SELECT [Column Name With Space] AS new_col_name FROM table_name

Tables with unusual headers

When you have tables with misaligned headers and columns, try using {headers:false}, e.g.in this url https://en.wikipedia.org/wiki/Victory_Bowl

image

There are fewer headers than columns. Setting the headers = false, results in gathering the correct data

image

SELECT 
`0` as date,
`1` as winner,
`2` as wscore,
`3` as loser,
`4` as lscore,
`5` as stadium
FROM HTML("#REPLACE_WITH_YOUR_ID-1", {headers: false}) LIMIT 5

Things to Try

  • HTML tables
    • Wikipedia
    • Jupyter Notebook
  • Notion
  • Excel
  • Sheets
  • CSV

Examples

How many acquisitions do Google and Yahoo make per year?

If you wanted to compare the number of acquisitions Google and Yahoo have made over time, you could pull up the Wikipedia pages on Google's (or technically Alphabet's) acquisitions and Yahoo's acquisitions. Then, you could run the query below after replacing "#yahoo-table-alias" and "#google-table-alias" with their respective table aliases under "Schema" in the extension.

with google as (
SELECT 
case when [Acquisition date] like '%,%' then
substr(([Acquisition date]), instr(([Acquisition date]),',')+1, 
length(([Acquisition date])))
else substr(([Acquisition date]), instr(([Acquisition date]),' ')+1, 
length(([Acquisition date])))
end as year,'google' as acquirer, Company as company
 FROM HTML("#google-table-alias-0", {headers: true})
where [Acquisition date] <> "[to be determined]"
),
yahoo as (
SELECT
case when Date like '%,%' then
substr(( Date), instr((Date),',')+1, length(( Date)))
else substr(( Date), instr(( Date),' ')+1, length(( Date)))
end as year,'yahoo' as acquirer, Company as company
 FROM HTML("#yahoo-table-alias-0", {headers: true})
)
select cast(year as integer) as year, 
sum(case when acquirer = "yahoo" then 1 else 0 end) 
as  yahoo_acquisitions,
sum(case when acquirer = "google" then 1 else 0 end) 
as  google_acquisitions
from (
select year,acquirer, company  from yahoo
union all
select year,acquirer, company from google ) as a group by 1

You have to do a little string manipulation to extract the year from the acquisition date columns. If you run that after replacing the table aliases, and then send the data to Google Sheets to chart, you should get a chart that looks something like below. Clearly, Yahoo! has slowed down in the last few years (they were acquired themselves by Verizon in 2017). Alphabet completed a whopping 34 acquisitions in 2014, including DeepMind and Nest.

image

Querying Notion

You can query any table in Notion that is in an open tab and is publicly accessible. For example, say you had a table in Notion where you kept track of your favorite movies, like this one.

🎬
Favorite Movies

You could simply query it by having that Notion page in one of your tabs—it should show up under your Schema (if it doesn't, try clicking "Refresh Schema"). To query it, simply replace its table alias where it says "#replace-notion-table-alias" in the following query:

select *  FROM HTML("#replace-notion-table-alias", {headers: true})

Say you wanted to join that table with the list of Best Picture Nominated movies to see which of your favorite movies have been nominated for Best Picture. If you opened up the Wikipedia page for Best Picture Nominees, you could join them with your favorite movie table by writing the following query and replacing "#best-picture-noms-table-" with the table aliases for the Best Picture Wikipedia page that shows up under "Schema":

With bp_noms as (
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-2", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-3", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-4", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-5", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-6", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-7", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-8", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-10", {headers: true})
UNION ALL
SELECT Year as movie FROM 
HTML("#best-picture-noms-table-11", {headers: true})
),
my_favorites as (
select *  FROM HTML("#replace-notion-table-alias-0", {headers: true})
)

SELECT  case when b.movie is not null then 1 else 0 end as nominated,
a.*  from
my_favorites as a left join bp_noms as b on a.Title = b.movie

The tables in that page are separated by decade so you'll have to use "UNION ALL" to combine all of them into a single table. If you run that query with the example Favorite Movies notion page above, you should get that Citizen Kane, Moonlight, Casablanca, Apocalypse Now, and Lost in Translation were both in the favorite movies table and nominated for Best Picture.

Which cities had a World Series victory and Super Bowl victory in the same year?

If you wanted to quickly find out which cities had a World Series and a Super Bowl victory in the same year, you could open up the Wikipedia list of Super Bowl winners, the list of World Series winners, and the list of major North American cities (to help extract the city from the team names). Then simply replace the "#sb-table-alias","#ws-table-alias",and "#city-table-alias" in the query below with the corresponding table aliases that show up under "Schema" after you open those web pages. To clean up this query, it helps to replace "New York" with "New York City" and "New England" with "Boston" for consistency between the tables.

with sb_winners as 
(SELECT DateSeason as date_season, 
substr((DateSeason), instr((DateSeason),',')+1, 
instr((DateSeason),',')-6) as year,
case when [Winning team] like "New York%" then 
REPLACE([Winning team],'New York','New York City')
when [Winning team] like "New England%" then 
REPLACE([Winning team],'New England','Boston') 
else [Winning team] end as winner FROM 
HTML("#sb-table-alias-1", {headers: true})),

ws_winners as (

SELECT Year as year, case when [Winning team] like "New York%" 
then REPLACE([Winning team],'New York','New York City')
when [Winning team] like "New England%" 
then REPLACE([Winning team],'New England','Boston') 
else [Winning team] end as winner FROM 
HTML("#ws-table-alias-1", {headers: true})
),

cities as (
SELECT Case when City like '%]%' then substr((City), 1, instr((City),'[')-1)
else City end as city
FROM HTML("#city-table-alias-4", {headers: true})
)


select  c.city, ws.winner as ws_winner, 
sb.winner as sb_winner, ws.year as ws_year, sb.year as sb_year
from cities as c inner join
ws_winners as ws on ws.winner like c.city || '%' inner join 
sb_winners as sb on sb.winner like c.city || '%' 
where cast(ws.year as integer) = cast(sb.year as integer)

You should get the result of that happening to three different cities. New York (Mets and Jets) in 1969, Boston (Red Sox and Patriots) in 2004, and Pittsburgh (Pirates and Steelers) in 1979.

image

Which movies were Joaquin Phoenix and Amy Adams in together?

Say you had a desire to find out which movies Joaquin Phoenix and Amy Adams were both in. You could open up the Wikipedia pages for Joaquin Phoenix filmography and the list of Amy Adams performances. Then replace "#jp-table-alias" and "#aa-table-alias" in the query below with the corresponding table aliases that show up under "Schema" after you open those Wikipedia pages.

with jp_movies as (
SELECT case when a.Year GLOB  '*[^0-9]*' 
THEN a.Year else a.Title end as title
 FROM HTML("#jp-table-alias-0", {headers: true}) as a
),
aa_movies as (
SELECT case when a.Year GLOB  '*[^0-9]*' 
THEN a.Year else a.Title end as title
 FROM HTML("#aa-table-alias-0", {headers: true}) as a
)
select jp.title from jp_movies as jp INNER JOIN
aa_movies as aa on jp.title = aa.title
WHERE jp.title <> "TBA"

Since those tables have offset columns for Year and Title, a solution is to check the "Year" column for digits, and if it doesn't have any then use "Year" instead of "Title" as the title (the reason this happens is the "Year" column can span multiple rows, so the first row for each year moves all the columns over one). If you run that query, you should get the result that they were both in Her and The Master.

image

Are Category 4 Hurricanes getting more common?

You can also easily manipulate or aggregate data in Wikipedia tables and then send the data to Google Sheets to chart. An example of this is aggregating the number of category 4 hurricanes per year from the Category 4 Hurricane Pacific Wikipedia page and sending it to Google Sheets to analyze to see if they are getting more or less common. In the query below, replace "#cat-4-hurricane-table" with the corresponding table alias under "Schema" after you open that page.

SELECT COUNT(*) as cat_4_hurricanes, Season  
FROM HTML("#cat-4-hurricane-table-1", {headers: true})
group by Season

You can send it to Sheets by entering in the Spreadsheet ID and Sheet name under "Destination". After you send that query to Sheets and chart the data, you should get something that looks like this:

image