This article is focused on ADF or ASA lookup activity filter modified date, type, is Current version or not etc. query for SharePoint Online List.
Scenario:
Many organizations like to save data on SharePoint Online site, especially metadata. To incrementally extract the latest or certain date ranges modified data from SharePoint Online (SPO) we need to filter the modified date and inspect whether it is the latest version or not.
For example, there are items (documents, folders, ……) reside on SharePoint Online, items property looks like:
{
"count": 110,
"value": [
……
{ "ContentTypeID": "0x010100EE….B186B23",
"Name": "Test Customized reports_SQL Joins.xlsx",
"ComplianceAssetId": null,
"Title": null,
"Description": null,
"ColorTag": null,
"Id": 9,
"ContentType": "Document",
"Created": "2023-04-25T10:53:24Z",
"CreatedById": 61,
"Modified": "2023-08-23T15:13:56Z",
"ModifiedById": 61,
"CopySource": null,
"ApprovalStatus": "0",
"Path": "/sites/mysite/.../Customized Reports SQL joins",
"CheckedOutToId": null,
"VirusStatus": "73382",
"IsCurrentVersion": true,
"Owshiddenversion": 19,
"Version": "9.0"
},
…..
}
We want to know whether they are modified after a certain date, the latest version?, is it a document or folder etc. we need to check when we retrieve it from SharePoint Online we will get json response.
Let’s begin.
Solution:
In this article, we focus on the Lookup Activity only, especially on lookup query content. Not only I will ignore lookup’s other configurations, but also skip other activities steps from the pipeline. Such as how to access SPO, how to extract data SPO how to sink to destination ….
If you are interested in those and want to know more in detail, please review my previous articles:
To implement the filter out items properties from SPO’s json response, we need build dynamic content for lookup’s query.
1) Check list status: active or not.
Copy Activity: lkp metadata of Source to Landing from SPO
Get metadata from SPO
@concat(
'$filter=SystemName eq '''
, pipeline().parameters.System
, ''' and StatusValue eq ''Active'''
)
2) Check items on SPO modified “DATE” and type is “document”
Copy Activity: Lookup_DnA_spo_Sources_array
This lookup activity filter items that save in SharePoint Library:
ContentTyep = Document;
FIle Saving Path = /sites/AnalyticsandDataGovernance/Shared Documents/DA27-PanCanada Major Projects Data Automation/04 – Raw Data
that means, I look up the files save at this path only.
file’s Modified >= pre-set offset day
@concat(
'$filter=ContentType eq ','''Document'''
, ' and Path eq ','''/sites/AnalyticsandDataGovernance/Shared Documents/DA27-PanCanada Major Projects Data Automation/04 - Raw Data'''
, ' and '
,'Modified ge datetime'''
,formatDateTime(addDays(utcNow(),json(activity('lkp metadata of Source to Landing from SPO').output.value[0].SourceParameterJSON).pl_Inspecting_Offset_Day),'yyyy-MM-dd')
,'''')
Here, I use “offset” conception, it is a poperty I save on SPO list. Of course, you can provide this offset value in many ways, such as pipeline parameter, save in SQL table, save in a file ….etc. wherever you like.
For example, you incrementally ingest data on daily basis,
the offset = -1
weekly basis, offset = -7
Ten days, customized period, offset = -10
………
etc.
one more example.
if you want to check items saved in SPO “isCurrentVersion” or not and type is “document”
@{concat(
'$filter=ContentType eq '
,'''Document'''
, ' and IsCurrentVersion eq '
, 'true'
)}
That’s all.
if you have any questions please do not hesitate to contact me at william. chen @mainri.ca (remove all space from the email account 😊)